Lesson 5.3: Research Frontiers & Open Problems¶

Estimated time: 45 minutes

Learning Objectives¶

By the end of this lesson, you will be able to:

Identify current research directions in Vector Symbolic Architectures
Understand open problems and challenges in the field
Explore connections between VSA and modern deep learning
Recognize opportunities for neuromorphic hardware implementations
Find ways to contribute to VSAX and the VSA research community
Formulate your own research questions in VSA

Prerequisites¶

Completion of Modules 1-4 (foundational VSA knowledge)
Lessons 5.1-5.2 (VFA and custom encoders)
Interest in research and pushing boundaries

The State of VSA Research (2025)¶

Vector Symbolic Architectures have evolved from a niche area to a vibrant research field with applications across:

Cognitive Science: Modeling human memory and reasoning
Robotics: Efficient control and perception
Neuroscience: Understanding neural computation
Edge Computing: Lightweight AI for resource-constrained devices
Neuromorphic Hardware: Brain-inspired computing architectures

Recent momentum:

- 📈 Papers at major conferences (NeurIPS, ICML, ICLR, CVPR)
- 🏢 Industrial adoption (Intel, IBM, startups)
- 🔬 Cross-disciplinary collaborations (neuroscience + ML + hardware)

Research Direction 1: Learning-Based VSA¶

The Problem¶

Traditional VSA uses random basis vectors:

memory.add("cat")  # Samples random vector

Limitations:

- No semantic structure (random vectors don't capture meaning)
- Cannot leverage pre-trained knowledge
- Similarity depends on encoding, not inherent meaning
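The first limitation is easy to check numerically. The following is a minimal NumPy sketch, not VSAX code: the two random bipolar vectors merely stand in for what `memory.add` would sample, and the dimension of 10,000 is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10_000  # assumed dimensionality

# Random bipolar hypervectors stand in for memory.add("cat") / memory.add("dog").
cat = rng.choice([-1.0, 1.0], size=dim)
dog = rng.choice([-1.0, 1.0], size=dim)

def cosine(a, b):
    """Cosine similarity between two real-valued vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_cat_dog = cosine(cat, dog)  # close to 0: no semantic relation is encoded
sim_cat_cat = cosine(cat, cat)  # 1: a vector is only similar to itself

print(f"cat vs dog: {sim_cat_dog:+.3f}, cat vs cat: {sim_cat_cat:+.3f}")
```

Two independently sampled symbols come out nearly orthogonal regardless of what they mean, which is exactly the gap that learned bases try to close.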

Research Question¶

Can we learn basis vectors that capture semantic structure?

Approaches¶

1. Pre-training from Data

Train basis vectors to preserve semantic relationships:

# Hypothetical learned basis
memory.add_learned("cat", embedding_from_bert("cat"))
memory.add_learned("dog", embedding_from_bert("dog"))

# Now cat and dog have inherent similarity!
sim = cosine_similarity(memory["cat"].vec, memory["dog"].vec)  # High!

Current work:

- Mikolić et al. (2024): "Learning Hyperdimensional Representations from Data"
- VSA + Word2Vec/BERT hybrid models

2. Meta-Learning VSA Operations

Learn optimal binding/bundling strategies for specific domains:

# Learn domain-specific binding operator
learned_bind = LearnedBindingOperator(domain="vision")
scene = learned_bind(object1, object2)  # Better than random binding?

Open problems:

- How to integrate gradient-based learning with discrete VSA ops?
- Can we learn in high dimensions (10,000+) efficiently?
- What inductive biases preserve VSA properties?

Your Research Opportunity¶

Project idea: Train VSA basis vectors using contrastive learning
Questions: Do learned bases retain VSA's symbolic properties? How much data is needed?

Research Direction 2: VSA + Large Language Models¶

The Problem¶

LLMs are powerful but:

- ❌ Opaque reasoning (black box)
- ❌ No explicit, exact symbolic operations (binding, unbinding)
- ❌ Huge computational cost

VSA is efficient but:

- ❌ Requires manual feature engineering
- ❌ Cannot handle raw text well

Research Question¶

How can we combine LLM representations with VSA symbolic reasoning?

Approaches¶

1. LLM Embeddings → VSA Encoding

from transformers import AutoModel, AutoTokenizer

# Get LLM embedding
model_llm = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "The cat sat on the mat"
tokens = tokenizer(text, return_tensors="pt")
embedding = model_llm(**tokens).last_hidden_state.mean(dim=1)  # mean pooling over all tokens (not [CLS] pooling)

# Project to VSA space
vsa_hv = project_to_vsa(embedding, dim=2048)

# Now can perform symbolic operations
memory.add("sentence")
memory["sentence"] = model.rep_cls(vsa_hv)
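One possible reading of `project_to_vsa` is a fixed random projection followed by a sign function. The sketch below assumes exactly that; the helper's signature and the synthetic 768-dimensional "embeddings" are illustrative stand-ins, not the VSAX API or real BERT outputs.

```python
import numpy as np

def project_to_vsa(embedding, dim=2048, seed=0):
    """Hypothetical helper: random Gaussian projection followed by sign().

    The same seed must be reused for every input so that all embeddings
    share one projection matrix.
    """
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((embedding.shape[-1], dim))
    return np.sign(embedding @ proj)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
base = rng.standard_normal(768)               # stand-in for a sentence embedding
near = base + 0.3 * rng.standard_normal(768)  # a slightly perturbed, "similar" input
far = rng.standard_normal(768)                # an unrelated input

hv_base, hv_near, hv_far = (project_to_vsa(e) for e in (base, near, far))

sim_near = cosine(hv_base, hv_near)  # stays high
sim_far = cosine(hv_base, hv_far)    # stays near zero
print(f"similar input: {sim_near:.2f}, unrelated input: {sim_far:.2f}")
```

Under these assumptions the bipolar hypervectors inherit the similarity structure of the embeddings, so inputs that are close in embedding space stay close after projection.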

2. VSA-Augmented LLM Reasoning

Use VSA as external symbolic memory for LLMs:

LLM generates text → Extract facts → Encode in VSA → Query VSA → Feed back to LLM

Current work:

- Nunes et al. (2024): "Neuro-Symbolic Question Answering with HDC"
- VSA for factual knowledge grounding in LLMs
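The "Encode in VSA → Query VSA" steps of the loop can be sketched with plain NumPy. This is an illustration under assumptions: facts are already extracted as (key, value) pairs, vectors are bipolar, binding is element-wise multiplication, and bundling is addition. None of the names below come from VSAX.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4096  # assumed dimensionality

symbols = ["France", "Japan", "Kenya", "Paris", "Tokyo", "Nairobi"]
codebook = {name: rng.choice([-1.0, 1.0], size=dim) for name in symbols}

# Facts as an LLM might extract them: (country, capital) pairs.
facts = [("France", "Paris"), ("Japan", "Tokyo"), ("Kenya", "Nairobi")]

# Bind each key to its value, then bundle (sum) all pairs into one vector.
knowledge = sum(codebook[k] * codebook[v] for k, v in facts)

def query(key):
    """Unbind the key and return the best-matching symbol in the codebook."""
    noisy = knowledge * codebook[key]  # bipolar vectors are their own inverse
    scores = {name: float(noisy @ vec) / dim for name, vec in codebook.items()}
    return max(scores, key=scores.get)

answer = query("France")
print(answer)
```

The returned symbol could then be fed back into the LLM prompt as a grounded fact.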

3. Hybrid Architecture

Input → LLM encoder → VSA layer → Reasoning → VSA decoder → LLM decoder → Output
         (perception)   (symbolic)  (inference)  (symbolic)    (generation)

Open problems:

- How to backpropagate through VSA operations?
- Can VSA help with LLM hallucinations (via symbolic grounding)?
- What's the optimal dimensionality for LLM-VSA hybrids?

Your Research Opportunity¶

Project idea: Build a VSA-augmented chatbot that can reason symbolically
Questions: Does VSA improve factual accuracy? Can it explain reasoning steps?

Research Direction 3: Neuromorphic Hardware¶

The Problem¶

Traditional von Neumann architectures are inefficient for VSA:

- High-dimensional vectors (10,000 dims) → memory bottleneck
- Binding/bundling are embarrassingly parallel → largely sequential processor cores leave that parallelism underutilized

Research Question¶

Can we build specialized hardware for VSA that's orders of magnitude more efficient?

Approaches¶

1. In-Memory Computing

Store hypervectors directly in analog memory:

Hypervector [0.3, -0.7, 0.5, ...] → Stored as resistances in ReRAM
Bundling → Parallel current summation (analog addition)
Binding → Element-wise multiply in crossbar array

Advantages:

- ⚡ Extremely fast (nanosecond operations)
- 🔋 Low power (no data movement)
- 📦 Compact (analog storage)

Current work:

- Karunaratne et al. (2020): "In-memory hyperdimensional computing"
- IBM neuromorphic chips with HDC
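To get a feel for why analog storage can tolerate device imperfections, here is a small NumPy simulation of an associative-memory lookup on an idealized crossbar. Everything in it is an assumption made for illustration: the differential (G+, G-) conductance pair used to represent signed weights, the 10% multiplicative device noise, and the 20% corruption of the query.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_items = 2048, 16

# Stored prototypes, one hypervector per crossbar row.
prototypes = rng.choice([-1.0, 1.0], size=(n_items, dim))

# Conductances cannot be negative, so split each weight into a (G+, G-) pair.
g_pos = np.clip(prototypes, 0, None)
g_neg = np.clip(-prototypes, 0, None)

# Assumed device non-ideality: 10% multiplicative Gaussian noise per cell.
noise = 0.10
g_pos_noisy = g_pos * (1 + noise * rng.standard_normal(g_pos.shape))
g_neg_noisy = g_neg * (1 + noise * rng.standard_normal(g_neg.shape))

# Query: prototype 5 with 20% of its components flipped.
query = prototypes[5].copy()
flipped = rng.choice(dim, size=dim // 5, replace=False)
query[flipped] *= -1

# Each row's output current is a dot product accumulated along the row.
currents = g_pos_noisy @ query - g_neg_noisy @ query
best = int(np.argmax(currents))
print(f"best match: row {best}")
```

Despite noisy cells and a corrupted query, the correct row still produces by far the largest current, because the signal is spread over thousands of components.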

2. Spiking Neural Networks (SNNs) + VSA

Encode hypervectors as spike patterns:

Hypervector component 0.7 → High firing rate neuron
Hypervector component -0.3 → Low firing rate neuron

Advantages:

- 🧠 Biologically plausible
- ⚡ Event-driven (sparse activity)
- 🔋 Ultra-low power

Current work:

- Mitrokhin et al. (2023): "Learning sensorimotor representations with spiking HDC"
- Intel Loihi neuromorphic chip
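The rate-coding idea can be prototyped without a spiking simulator. The sketch below assumes a linear map from components in [-1, 1] to firing rates, Poisson spike generation, a maximum rate of 100 Hz, and a one-second observation window. These are illustrative choices, not properties of any particular chip.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 1000
max_rate_hz = 100.0  # assumed maximum firing rate
window_s = 1.0       # assumed observation window

hv = rng.uniform(-1.0, 1.0, size=dim)

# Map components in [-1, 1] linearly to firing rates in [0, max_rate_hz].
rates = (hv + 1.0) / 2.0 * max_rate_hz

# Poisson spike counts observed during the window, one neuron per component.
spike_counts = rng.poisson(rates * window_s)

# Decode by inverting the linear map.
decoded = spike_counts / (max_rate_hz * window_s) * 2.0 - 1.0

cos = float(hv @ decoded / (np.linalg.norm(hv) * np.linalg.norm(decoded)))
print(f"cosine(original, decoded) = {cos:.3f}")
```

Shortening the window or lowering the maximum rate reduces the spike budget and therefore the fidelity of the decoded hypervector, which is one way to frame the energy versus accuracy trade-off.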

3. Photonic Computing

Use light for hypervector operations:

Hypervector → Optical signal (wavelength/phase encoded)
Binding → Optical interference
Bundling → Beam combining

Advantages:

- 🚀 Speed of light computation
- ❄️ Minimal heat generation
- 🔧 Massively parallel

Open problems:

- How to handle negative values in analog hardware?
- Can we train VSA models directly on neuromorphic chips?
- What's the optimal hypervector representation for SNNs?

Your Research Opportunity¶

Project idea: Simulate VSA on spiking neural network simulator (e.g., Brian2, NEST)
Questions: What's the energy efficiency gain? Can SNNs learn VSA operations?

Research Direction 4: Dimensionality & Capacity¶

The Problem¶

Current VSA uses fixed dimensionality (typically 512-10,000):

- Too low → poor capacity, high noise
- Too high → wasteful computation and memory

Research Questions¶

What is the theoretical minimum dimensionality for a given task?
Can we adaptively adjust dimensionality during computation?
Are there compressive encodings that preserve VSA properties at lower dimensions?

Approaches¶

1. Dimensionality Theory

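The Johnson-Lindenstrauss lemma tells us that a random projection of N points into d = O(log(N) / ε²) dimensions preserves all pairwise distances to within a factor of (1 ± ε).

Can we derive similar guarantees for VSA operations?

Conjecture: For N symbols with ε error tolerance, the minimum dimension scales as:

d_min = Ω(log(N) / ε²)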

Open problems:

- Exact bounds for binding/bundling capacity
- Trade-offs between dimensionality, accuracy, and computational cost

2. Dynamic Dimensionality

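class AdaptiveDimensionalityVSA:
    def __init__(self, dim_min=512, dim_max=10000, threshold=0.1):
        self.current_dim = dim_min
        self.dim_max = dim_max
        self.threshold = threshold  # maximum tolerated error (illustrative default)

    def bind(self, a, b):
        # Monitor reconstruction error
        # (fhrr_bind and estimate_error are hypothetical helpers)
        result = fhrr_bind(a, b, dim=self.current_dim)
        error = self.estimate_error(result)

        if error > self.threshold:
            # Double the dimensionality for subsequent operations, capped at dim_max
            self.current_dim = min(self.current_dim * 2, self.dim_max)

        return result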

3. Compression Techniques

Apply dimensionality reduction while preserving structure:

# Full VSA at 10,000 dims
hv_full = encode_full(data)

# Compress to 512 dims using learned projection
hv_compressed = learned_projection(hv_full, target_dim=512)

# Can we still unbind/bundle?
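A quick experiment gives a partial answer to that closing question. In the sketch below a fixed random Gaussian matrix stands in for `learned_projection` (an assumption: a trained projection may behave differently), and binding is taken to be element-wise multiplication of bipolar vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, target_dim = 8192, 512

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Assumption: a fixed random Gaussian matrix stands in for learned_projection.
projection = rng.standard_normal((dim, target_dim)) / np.sqrt(target_dim)

def compress(hv):
    return hv @ projection

a = rng.choice([-1.0, 1.0], size=dim)
b = rng.choice([-1.0, 1.0], size=dim)
a_noisy = a.copy()
a_noisy[: dim // 10] *= -1  # flip 10% of the components

# Similarity (and bundling, which is linear) survives compression.
sim_full = cosine(a, a_noisy)                            # about 0.8
sim_compressed = cosine(compress(a), compress(a_noisy))  # close to sim_full

# Element-wise binding does not commute with the projection.
bound_then_compressed = compress(a * b)
compressed_then_bound = compress(a) * compress(b)
binding_agreement = cosine(bound_then_compressed, compressed_then_bound)  # near 0

print(f"similarity: {sim_full:.2f} -> {sim_compressed:.2f}")
print(f"binding agreement after compression: {binding_agreement:+.2f}")
```

With a generic linear projection, similarity and bundling carry over to the compressed space, but binding performed there no longer matches binding performed at full dimension. Finding a projection that keeps both is the open part of the question.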

Your Research Opportunity¶

Project idea: Empirically measure capacity vs dimensionality for different VSA models
Questions: Is there a phase transition in performance? Can we predict required dimensionality from task complexity?
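A starting point for this project might look like the following NumPy sketch. It assumes bipolar vectors, bundling by plain summation, a codebook of 200 symbols, and 30 bundled items. All of these are arbitrary choices meant to be varied, and `retrieval_accuracy` is a hypothetical helper name.

```python
import numpy as np

def retrieval_accuracy(dim, n_bundled, codebook_size=200, trials=10, seed=0):
    """Fraction of bundled items recovered by a top-k similarity search."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        codebook = rng.choice([-1.0, 1.0], size=(codebook_size, dim))
        members = rng.choice(codebook_size, size=n_bundled, replace=False)
        bundle = codebook[members].sum(axis=0)
        # The n_bundled most similar codebook entries should be the members.
        scores = codebook @ bundle
        top = np.argsort(scores)[-n_bundled:]
        hits += len(set(top) & set(members))
    return hits / (trials * n_bundled)

results = {dim: retrieval_accuracy(dim, n_bundled=30) for dim in (64, 256, 1024, 4096)}
for dim, acc in results.items():
    print(f"dim={dim:5d}  accuracy={acc:.2f}")
```

Sweeping `n_bundled` as well as `dim` yields the two-dimensional accuracy surface in which a phase transition, if there is one, would show up.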

Research Direction 5: Continual Learning¶

The Problem¶

Traditional ML suffers from catastrophic forgetting:

- Train on task A → good performance
- Train on task B → task A performance collapses

VSA has natural advantages:

- Bundling is order-invariant (can add facts any time)
- High-dimensional space has room for many concepts
- Updates are additive rather than overwriting, although interference grows as more facts are bundled

Research Question¶

Can VSA enable perfect continual learning?

Approaches¶

1. Incremental Bundling

class ContinualMemory:
    def __init__(self, model):
        self.model = model
        self.knowledge = jnp.zeros(model.dim, dtype=jnp.complex64)

    def learn_fact(self, fact_hv):
        # Bundle the new fact into the running superposition
        self.knowledge = self.model.opset.bundle(self.knowledge, fact_hv)
        # Nothing is overwritten, but crosstalk grows with every bundled fact

    def recall(self, query_hv):
        return self.model.opset.bind(self.knowledge,
                                      self.model.opset.inverse(query_hv))
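How gracefully does incremental bundling degrade? The NumPy sketch below mimics the class above with FHRR-style phasor vectors, using element-wise multiplication for binding, the complex conjugate as the inverse, and unnormalized addition for bundling. These are assumptions made for the illustration rather than the behavior of any specific VSAX opset.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4096

def random_phasor():
    """Random unit-magnitude complex vector (FHRR-style)."""
    return np.exp(1j * rng.uniform(-np.pi, np.pi, size=dim))

def similarity(a, b):
    # Real part of the normalized complex inner product.
    return float(np.real(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b)))

knowledge = np.zeros(dim, dtype=np.complex128)
first_key, first_value = random_phasor(), random_phasor()
knowledge += first_key * first_value  # bind, then bundle by addition

recall_quality = {}
for n_facts in range(1, 201):
    if n_facts > 1:
        knowledge += random_phasor() * random_phasor()  # learn one more fact
    if n_facts in (1, 10, 50, 200):
        recalled = knowledge * np.conj(first_key)       # unbind the first key
        recall_quality[n_facts] = similarity(recalled, first_value)

for n, q in recall_quality.items():
    print(f"{n:3d} facts stored -> similarity to first value: {q:.3f}")
```

The first fact is never erased, but its recall similarity shrinks as more facts share the same vector, so a cleanup step against known values is needed long before the raw signal disappears.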

2. Hierarchical Consolidation

Organize memories in tree structure:

       Root (general knowledge)
      /    |    \
   Math  Science  History  ← Domains
    /\     /\      /\
   ...    ...     ...      ← Specific facts

3. Selective Forgetting

Unbundle old/irrelevant facts to make room:

# Remove outdated fact
outdated_fact_hv = memory["2020_covid_stats"]
knowledge = opset.unbundle(knowledge, outdated_fact_hv)
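If the bundle is kept as an unnormalized sum, one simple way to realize unbundling is subtraction. The sketch below assumes exactly that; it does not describe how `opset.unbundle` is implemented, and the fact names are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4096
facts = {name: rng.choice([-1.0, 1.0], size=dim) for name in ("a", "b", "outdated")}

# Keep the bundle as an unnormalized sum so that removal is exact.
knowledge = sum(facts.values())

def score(name, memory):
    """Normalized dot product between a stored fact and the memory vector."""
    return float(memory @ facts[name]) / dim

before = score("outdated", knowledge)      # close to 1: the fact is present
knowledge = knowledge - facts["outdated"]  # "unbundle" by subtraction
after = score("outdated", knowledge)       # close to 0: the fact is gone
kept = score("a", knowledge)               # still close to 1

print(f"outdated: {before:.2f} -> {after:.2f}, kept fact: {kept:.2f}")
```

Once the bundle has been normalized or thresholded (for example, with a sign function), subtraction is only approximate, which is part of why selective forgetting remains an open problem.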

Open problems:

- What's the capacity limit for bundled knowledge?
- How to detect and correct interference?
- Can we compress old knowledge (lossy but space-efficient)?

Your Research Opportunity¶

Project idea: Build continual learning benchmark for VSA
Questions: At what point does bundling degrade? Can we quantify forgetting rate?

Research Direction 6: Interpretability¶

The Problem¶

Deep learning is often a black box. VSA promises interpretability:

- Unbind to extract components
- Similarity to known concepts
- Symbolic structure

But challenges remain:

- High dimensions are hard to visualize
- Approximate unbinding introduces noise
- Resonator convergence may fail

Research Question¶

How can we make VSA reasoning fully transparent and interpretable?

Approaches¶

1. Visualization Tools

Dimensionality reduction for plotting:

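from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np

# Encode many concepts
concepts = ["cat", "dog", "car", "tree", "mountain"]
hvs = np.stack([np.asarray(memory[c].vec) for c in concepts])

# t-SNE needs real-valued input: split complex (FHRR) vectors into real and imaginary parts
if np.iscomplexobj(hvs):
    hvs = np.concatenate([hvs.real, hvs.imag], axis=1)

# Reduce to 2D for plotting; perplexity must be smaller than the number of samples
hvs_2d = TSNE(n_components=2, perplexity=2).fit_transform(hvs)

plt.scatter(hvs_2d[:, 0], hvs_2d[:, 1])
for i, concept in enumerate(concepts):
    plt.annotate(concept, (hvs_2d[i, 0], hvs_2d[i, 1]))
plt.show()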

2. Explanation Generation

def explain_hypervector(hv, memory, top_k=5):
    """
    Explain what a hypervector represents by finding
    most similar known concepts.
    """
    similarities = []
    for name, stored_hv in memory.items():
        sim = cosine_similarity(hv, stored_hv.vec)
        similarities.append((name, sim))

    similarities.sort(key=lambda x: x[1], reverse=True)

    print("This hypervector is most similar to:")
    for name, sim in similarities[:top_k]:
        print(f"  - {name}: {sim:.3f}")

3. Causal Tracing

Track which components contribute to final answer:

# Query: "What is the capital of France?"
query_hv = encode_query("capital of France")

# Unbind to get answer
answer_hv = opset.bind(knowledge_base, opset.inverse(query_hv))

# Which facts in knowledge_base contributed most?
for fact_name, fact_hv in facts.items():
    contribution = cosine_similarity(answer_hv, fact_hv)
    print(f"Fact '{fact_name}' contribution: {contribution:.3f}")

Open problems:

- Can we visualize high-dimensional binding structures?
- How to present VSA reasoning to non-experts?
- Quantifying uncertainty in VSA predictions?

Your Research Opportunity¶

Project idea: Build interactive VSA debugger/visualizer
Questions: Can users understand VSA reasoning? What visualizations are most helpful?

Open Problems Summary¶

Theoretical¶

Capacity bounds: Exact theoretical limits for binding/bundling
Optimal dimensionality: Principled way to choose dimension for a task
Error propagation: How does noise accumulate in multi-hop reasoning?
Universality: What computations can/cannot be expressed in VSA?

Algorithmic¶

Learning: How to integrate gradient-based learning with VSA?
Scaling: Efficient VSA for millions of symbols?
Streaming data: Online learning with unbounded data?
Multi-modality: Principled fusion of heterogeneous data?

Applied¶

Benchmarks: Standard datasets for VSA evaluation?
Real-world deployment: Production-ready VSA systems?
Hybrid architectures: Best way to combine VSA + DNNs?
Energy efficiency: How much better is VSA than DNNs on edge devices?

Contributing to VSAX¶

How to Get Involved¶

1. Report Issues

Found a bug or have a feature request?

- GitHub Issues: https://github.com/anthropics/vsax/issues
- Provide minimal reproducible example
- Describe expected vs actual behavior

2. Contribute Code

Fork the repository
Create feature branch: git checkout -b feature/my-encoder
Write tests for your code
Submit pull request with clear description

Example contributions:

- New encoder for your domain
- Performance optimization
- Additional examples/tutorials
- Bug fixes

3. Share Your Research

Publish papers using VSAX
Share your custom encoders on GitHub
Write blog posts about applications
Present at conferences

4. Improve Documentation

Fix typos or unclear explanations
Add examples
Expand API documentation
Translate to other languages

Formulating Your Research Question¶

Template¶

Given [DOMAIN/PROBLEM],
can we [VSA TECHNIQUE/APPROACH]
to achieve [GOAL/METRIC]
better than [BASELINE]?

Examples¶

Example 1: Robotics

Given multi-modal sensor streams from a robot,
can we use SensorFusionEncoder + SSP
to achieve real-time localization and mapping
better than traditional SLAM algorithms?

Example 2: NLP

Given large-scale knowledge graphs,
can we use hierarchical VSA encoding + resonators
to achieve one-shot question answering
better than fine-tuned LLMs?

Example 3: Neuroscience

Given fMRI brain activity data,
can we decode mental states using VSA encoders
to achieve interpretable cognitive state classification
better than black-box DNNs?

Your Turn¶

What research question excites YOU?

Write it down using the template:

Given _______________________,
can we _______________________
to achieve __________________
better than _________________?

Resources for Further Learning¶

Key Papers¶

Foundational:

1. Plate (1995): "Holographic Reduced Representations"
2. Kanerva (2009): "Hyperdimensional Computing"
3. Gayler (2003): "Vector Symbolic Architectures"

Recent Surveys:

1. Kleyko et al. (2022): "Vector Symbolic Architectures as a Computing Framework" (Proceedings of the IEEE)
2. Schlegel et al. (2022): "A Comparison of Vector Symbolic Architectures"

Applications:

1. Imani et al. (2019): "A framework for collaborative learning in secure high-dimensional space" (Edge ML)
2. Ge & Parhi (2021): "Classification using hyperdimensional computing" (Efficient AI)

Conferences & Workshops¶

NeurIPS: Neuro-symbolic AI workshops
ICML: Efficient ML track
NICE: Neuro-Inspired Computational Elements workshop
CoRL: Conference on Robot Learning (VSA for control)

Online Communities¶

VSAX GitHub Discussions: Ask questions, share projects
Reddit r/MachineLearning: HDC/VSA threads
Twitter #VectorSymbolicArchitectures

Self-Assessment¶

Before concluding, ensure you can:

[ ] Identify at least 3 current research directions in VSA
[ ] Explain challenges in learning-based VSA
[ ] Describe opportunities for VSA + LLMs
[ ] Understand neuromorphic hardware potential
[ ] Recognize open theoretical and applied problems
[ ] Formulate your own research question
[ ] Know how to contribute to VSAX

Final Quiz¶

Question 1: What is the main advantage of learned basis vectors over random ones?

a) Faster computation
b) Semantic structure in the representation space
c) Lower memory usage
d) Easier to implement

Answer

**b) Semantic structure in the representation space** Learned basis vectors can capture inherent semantic relationships (e.g., "cat" and "dog" are similar), whereas random vectors are nearly orthogonal to one another. This could improve generalization and reasoning, although whether learned bases retain VSA's compositional properties is still an open question.

Question 2: Why is neuromorphic hardware promising for VSA?

a) VSA operations are naturally parallel and efficient in analog compute
b) Neuromorphic chips are cheaper
c) VSA doesn't work on traditional CPUs
d) It's just a trend

Answer

**a) VSA operations are naturally parallel and efficient in analog compute** Binding and bundling are embarrassingly parallel element-wise operations. In-memory analog computing can perform these operations extremely fast and energy-efficiently by avoiding data movement and leveraging physical properties (current summation via Kirchhoff's law, multiplication via Ohm's law).

Question 3: What is continual learning's main challenge that VSA might address?

a) Training speed
b) Catastrophic forgetting
c) Model size
d) Data collection

Answer

**b) Catastrophic forgetting** Traditional neural networks overwrite old knowledge when learning new tasks. VSA's bundling operation is additive and non-destructive: new facts can be added without erasing old ones. Recall still degrades gradually as the bundle approaches its capacity, so how close VSA can get to perfect continual learning remains an open question.

Key Takeaways¶

✓ VSA research is vibrant and growing - opportunities across theory, algorithms, and applications
✓ Learning-based VSA - integrating gradient learning with symbolic operations
✓ VSA + LLMs - combining neural perception with symbolic reasoning
✓ Neuromorphic hardware - orders of magnitude efficiency gains possible
✓ Open problems abound - dimensionality, capacity, interpretability, continual learning
✓ You can contribute! - VSAX welcomes encoders, optimizations, and applications

Course Complete!¶

Congratulations! You've completed the VSAX course covering:

✅ Module 1: Foundational concepts (high dimensions, binding/bundling, three models)
✅ Module 2: Core operations (FHRR, MAP, Binary, similarity, model selection)
✅ Module 3: Encoders & applications (scalars, sequences, images, knowledge graphs, analogies)
✅ Module 4: Advanced techniques (operators, SSP, hierarchical, multi-modal)
✅ Module 5: Research & extensions (VFA, custom encoders, frontiers)

You are now equipped to:

- Build VSA-powered applications
- Design custom encoders for your domain
- Contribute to VSA research
- Push the boundaries of hyperdimensional computing

Where to Go From Here¶

Immediate Next Steps¶

Build something! Apply VSAX to your research or project
Share your work - Publish code, write blog posts, present at conferences
Join the community - GitHub Discussions, contribute to VSAX
Read papers - Dive deeper into topics that interest you

Advanced Topics (Beyond This Course)¶

Quantum VSA: Using quantum superposition for hypervectors
Biological VSA: Modeling neural circuits with VSA
VSA for Causality: Encoding causal relationships
Federated VSA: Distributed learning with privacy

Research Opportunities¶

Pick a research direction from this lesson and:

1. Read 3-5 key papers
2. Implement a proof-of-concept
3. Run experiments and analyze results
4. Write it up and share with the community!

Final Words¶

Vector Symbolic Architectures represent a paradigm shift in how we think about computation:

Not neural networks (though compatible with them)
Not classical symbolic AI (though shares symbolic properties)
A unique fusion of continuous + discrete, distributed + compositional, learned + structured

The field is young, the problems are hard, and the opportunities are immense.

Welcome to the VSA research community!

We can't wait to see what you build.

Acknowledgments¶

This course was built on decades of research by pioneers:

- Tony Plate (Holographic Reduced Representations)
- Pentti Kanerva (Hyperdimensional Computing)
- Ross Gayler (Vector Symbolic Architectures)
- Many others who advanced the field

Thank you for learning with us.

Now go forth and compute in high dimensions!

References¶

Key Surveys:

- Kleyko, D., et al. (2022). "Vector Symbolic Architectures as a Computing Framework for Emerging Hardware." Proceedings of the IEEE, 110(10), 1538-1571.
- Schlegel, K., Neubert, P., & Protzel, P. (2022). "A Comparison of Vector Symbolic Architectures." Artificial Intelligence Review, 55, 4523-4555.

Foundational Works:

- Plate, T. A. (1995). "Holographic Reduced Representations." IEEE Transactions on Neural Networks, 6(3), 623-641.
- Kanerva, P. (2009). "Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors." Cognitive Computation, 1, 139-159.

Recent Applications:

- Imani, M., et al. (2019). "A Framework for Collaborative Learning in Secure High-Dimensional Space."
- Neubert, P., Schubert, S., & Protzel, P. (2019). "Learning Vector Symbolic Architectures for Reactive Robot Behaviours."
- Mitrokhin, A., et al. (2023). "Learning sensorimotor representations with spiking HDC."

Neuromorphic Hardware:

- Karunaratne, G., et al. (2020). "In-memory hyperdimensional computing." Nature Electronics, 3, 327-337.
- Poduval, P., et al. (2021). "HDnn: Hyperdimensional Inference with Spiking Neural Networks."