Lesson 3.3: Application - Image Classification¶
Duration: ~60 minutes (30 min theory + 30 min tutorial)
Learning Objectives:
- Understand how to encode images as hypervectors
- Learn the prototype-based classification approach
- Apply bundling to create class prototypes
- Use similarity search for classification
- Complete a full MNIST classification project
- Compare FHRR, MAP, and Binary models on real data
Introduction¶
You've learned how to encode structured data (scalars, sequences, dictionaries). Now let's apply these concepts to real-world image classification!
In this lesson, we'll classify handwritten digits (MNIST) using VSA. This demonstrates how VSA's compositional operations naturally handle high-dimensional data like images.
What makes VSA good for classification? - ✅ Interpretable: Class prototypes are explicit hypervectors - ✅ Few-shot learning: Learn from few examples per class - ✅ Compositional: Combine pixel features naturally - ✅ Efficient: GPU-accelerated, no gradient descent needed - ✅ Robust: Gracefully handles noise and occlusion
The VSA Classification Pipeline¶
VSA classification follows a simple 3-step process:
1. ENCODE
Images → Hypervectors
[Use bundling to combine pixel features]
2. LEARN
Create class prototypes by bundling training examples
Prototype[i] = Bundle(all images of class i)
3. CLASSIFY
Find prototype with highest similarity to test image
Prediction = argmax(similarity(test, prototype))
Key insight: No gradient descent, no backpropagation—just encoding and similarity!
Step 1: Encoding Images¶
The Challenge: High-Dimensional Pixel Data¶
An 8×8 grayscale image has 64 pixels, each with intensity in [0, 1].
How do we encode this as a hypervector?
Approach: Spatial Bundling¶
Each pixel position gets a random basis vector, and we bundle pixel values weighted by intensity:
Example:
from vsax import create_fhrr_model, VSAMemory
model = create_fhrr_model(dim=1024)
memory = VSAMemory(model)
# Create basis vectors for each pixel position
for i in range(64): # 8x8 = 64 pixels
memory.add(f"pixel_{i}")
# Encode image: bundle weighted by intensity
def encode_image(image, model, memory):
"""
Encode 64-pixel image as hypervector.
Args:
image: 1D array of 64 pixel intensities [0, 1]
model: VSA model
memory: VSA memory with pixel basis vectors
Returns:
Encoded hypervector
"""
encoded = None
for i, intensity in enumerate(image):
pixel_vec = memory[f"pixel_{i}"].vec
# Weight pixel vector by intensity
weighted = intensity * pixel_vec
# Bundle (accumulate)
if encoded is None:
encoded = weighted
else:
encoded = encoded + weighted
# Normalize
import jax.numpy as jnp
encoded = encoded / jnp.linalg.norm(encoded)
return encoded
Why it works: - Pixels with high intensity contribute more to the final hypervector - Pixel position information is preserved (different basis for each position) - Similar images → similar hypervectors
Efficient Batch Encoding with vmap¶
For real applications, use JAX's vmap to encode batches efficiently:
import jax.numpy as jnp
from jax import vmap
def encode_batch(images, basis_vectors):
"""
Encode batch of images efficiently.
Args:
images: (batch_size, 64) array of images
basis_vectors: (64, dim) array of pixel basis vectors
Returns:
(batch_size, dim) encoded hypervectors
"""
# Matrix multiplication: (batch, 64) @ (64, dim) → (batch, dim)
encoded = images @ basis_vectors
# Normalize each row
norms = jnp.linalg.norm(encoded, axis=1, keepdims=True)
encoded = encoded / norms
return encoded
# Usage
basis = jnp.stack([memory[f"pixel_{i}"].vec for i in range(64)])
encoded_batch = encode_batch(X_train, basis)
Performance: ~100× faster than looping!
Step 2: Creating Class Prototypes¶
Once we have encoded images, we create prototypes for each class by bundling all training examples:
Code:
import jax.numpy as jnp
def create_prototypes(encoded_images, labels, num_classes=10):
"""
Create class prototypes by bundling training examples.
Args:
encoded_images: (num_samples, dim) encoded training images
labels: (num_samples,) class labels
num_classes: Number of classes
Returns:
(num_classes, dim) prototype vectors
"""
dim = encoded_images.shape[1]
prototypes = jnp.zeros((num_classes, dim))
for class_id in range(num_classes):
# Get all images of this class
class_mask = labels == class_id
class_images = encoded_images[class_mask]
# Bundle (sum and normalize)
prototype = jnp.sum(class_images, axis=0)
prototype = prototype / jnp.linalg.norm(prototype)
prototypes = prototypes.at[class_id].set(prototype)
return prototypes
# Create prototypes
prototypes = create_prototypes(encoded_train, y_train, num_classes=10)
print(f"Prototype shape: {prototypes.shape}") # (10, 1024)
Interpretation: - Each prototype is the average encoded representation of a class - Captures common features across all training examples - New examples are classified by similarity to prototypes
Step 3: Classification by Similarity¶
To classify a test image:
- Encode test image as hypervector
- Compute similarity to all class prototypes
- Predict class with highest similarity
from vsax.similarity import cosine_similarity
from vsax.utils import vmap_similarity
def classify(test_image_encoded, prototypes):
"""
Classify test image by finding nearest prototype.
Args:
test_image_encoded: (dim,) encoded test image
prototypes: (num_classes, dim) class prototypes
Returns:
Predicted class (0-9)
"""
# Compute similarities to all prototypes
similarities = vmap_similarity(None, test_image_encoded, prototypes)
# Return class with highest similarity
predicted_class = jnp.argmax(similarities)
return int(predicted_class)
# Classify single test image
prediction = classify(encoded_test[0], prototypes)
print(f"Predicted: {prediction}, True: {y_test[0]}")
Batch Classification¶
For efficient classification of many test images:
def classify_batch(test_images_encoded, prototypes):
"""
Classify batch of test images.
Args:
test_images_encoded: (num_test, dim) encoded test images
prototypes: (num_classes, dim) class prototypes
Returns:
(num_test,) predicted classes
"""
# Compute similarity matrix: (num_test, dim) @ (dim, num_classes)
# → (num_test, num_classes)
similarities = test_images_encoded @ prototypes.T
# Argmax over classes
predictions = jnp.argmax(similarities, axis=1)
return predictions
# Classify all test images
predictions = classify_batch(encoded_test, prototypes)
accuracy = jnp.mean(predictions == y_test)
print(f"Test accuracy: {accuracy:.4f}")
Why This Approach Works¶
1. Distributed Representations¶
Each pixel contributes to the overall hypervector representation. Similar images have similar patterns of pixel activations → similar hypervectors.
2. Noise Robustness¶
Graceful degradation: Even if some pixels are corrupted, the overall representation remains similar.
# Original image
clean_image = X_test[0]
# Add noise
noisy_image = clean_image + 0.1 * np.random.randn(64)
noisy_image = np.clip(noisy_image, 0, 1)
# Encode both
clean_encoded = encode_image(clean_image, model, memory)
noisy_encoded = encode_image(noisy_image, model, memory)
# Still similar!
sim = cosine_similarity(clean_encoded, noisy_encoded)
print(f"Similarity: {sim:.4f}") # ~0.95
3. Compositional Features¶
Bundling naturally combines pixel features without explicit feature engineering.
4. Few-Shot Learning¶
Prototypes can be learned from very few examples:
# Use only 5 examples per class
few_shot_mask = []
for class_id in range(10):
class_indices = np.where(y_train == class_id)[0][:5]
few_shot_mask.extend(class_indices)
few_shot_train = encoded_train[few_shot_mask]
few_shot_labels = y_train[few_shot_mask]
# Create prototypes from 50 total examples
prototypes_few = create_prototypes(few_shot_train, few_shot_labels)
# Still reasonable accuracy!
predictions = classify_batch(encoded_test, prototypes_few)
accuracy = np.mean(predictions == y_test)
print(f"Few-shot accuracy (5 per class): {accuracy:.4f}")
Comparing VSA Models: FHRR vs MAP vs Binary¶
Different VSA models offer different trade-offs for image classification:
| Model | Accuracy | Speed | Memory | Best For |
|---|---|---|---|---|
| FHRR | Highest (~95%) | Moderate | High (8 bytes/elem) | Maximum accuracy |
| MAP | High (~93%) | Fast | Medium (4 bytes/elem) | Balanced |
| Binary | Good (~88%) | Fastest | Low (1 bit/elem) | Edge devices, embedded |
Recommendation: Start with FHRR for best accuracy, switch to Binary for deployment on constrained hardware.
Hands-On: Complete MNIST Classification Tutorial¶
Now that you understand the foundations, complete the full tutorial to implement and experiment with VSA-based image classification.
📓 Tutorial 1: MNIST Digit Classification
What you'll do in the tutorial:
- Setup: Load MNIST digits dataset (8×8 images)
- Encoding: Implement spatial bundling for images
- Training: Create class prototypes by bundling examples
- Evaluation: Classify test images and measure accuracy
- Comparison: Benchmark FHRR, MAP, and Binary models
- Visualization: Plot confusion matrices and analyze errors
- Experimentation: Try different dimensions, few-shot learning, noise robustness
Time estimate: 30-45 minutes
Prerequisites:
- Completed Lessons 3.1 and 3.2 (encoders)
- Basic Python and NumPy knowledge
- scikit-learn installed (pip install scikit-learn)
Key Concepts from the Tutorial¶
1. Pixel Basis Encoding¶
Each pixel position gets its own random basis vector. Images are encoded as weighted bundles:
# Basis for 64 pixels
pixel_basis = [memory.add(f"pixel_{i}") for i in range(64)]
# Encode image
image_hv = sum(intensity[i] * pixel_basis[i] for i in range(64))
2. Prototype Learning¶
Class prototypes are learned by averaging (bundling) encoded training examples:
# Prototype for digit "3"
digit_3_examples = encoded_train[y_train == 3]
prototype_3 = np.mean(digit_3_examples, axis=0)
prototype_3 = prototype_3 / np.linalg.norm(prototype_3)
3. Nearest Prototype Classification¶
Classification is similarity search to prototypes:
similarities = [cosine_similarity(test_hv, proto) for proto in prototypes]
predicted_digit = np.argmax(similarities)
4. Confusion Matrix Analysis¶
Analyze which digits are confused:
from sklearn.metrics import confusion_matrix
import seaborn as sns
cm = confusion_matrix(y_test, predictions)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
Common confusions: 3 ↔ 8, 4 ↔ 9, 5 ↔ 6 (similar shapes)
Extensions and Experiments¶
After completing the tutorial, try these experiments:
1. Dimension Scaling¶
How does accuracy change with dimension?
dimensions = [256, 512, 1024, 2048, 4096]
accuracies = []
for dim in dimensions:
model = create_fhrr_model(dim=dim)
# ... encode, train, test
accuracies.append(test_accuracy)
plt.plot(dimensions, accuracies)
plt.xlabel('Dimension')
plt.ylabel('Accuracy')
Expected: Accuracy increases with dimension, plateaus around d=2048.
2. Few-Shot Learning Curve¶
How many examples per class are needed?
examples_per_class = [1, 3, 5, 10, 20, 50, 100]
accuracies = []
for n in examples_per_class:
# Sample n examples per class
# Create prototypes
# Test
accuracies.append(test_accuracy)
plt.plot(examples_per_class, accuracies)
plt.xlabel('Examples per Class')
plt.ylabel('Accuracy')
3. Noise Robustness¶
Add Gaussian noise to test images:
noise_levels = [0.0, 0.05, 0.1, 0.2, 0.5]
accuracies = []
for noise in noise_levels:
noisy_test = X_test + noise * np.random.randn(*X_test.shape)
noisy_test = np.clip(noisy_test, 0, 1)
# Encode and classify
accuracies.append(test_accuracy)
plt.plot(noise_levels, accuracies)
plt.xlabel('Noise Level (σ)')
plt.ylabel('Accuracy')
Expected: VSA should degrade gracefully (accuracy drops slowly).
4. Occlusion Robustness¶
Mask out random pixels:
def occlude_image(image, occlusion_fraction=0.3):
"""Randomly set occlusion_fraction of pixels to 0."""
mask = np.random.rand(64) > occlusion_fraction
return image * mask
# Test with occlusion
occluded_test = [occlude_image(img, 0.3) for img in X_test]
# Encode and classify
Comparison with Traditional ML¶
| Approach | Accuracy | Training Time | Interpretability | Few-Shot |
|---|---|---|---|---|
| VSA (FHRR) | ~95% | <1 sec | High (prototypes) | Excellent |
| SVM | ~97% | ~5 sec | Low (hyperplane) | Poor |
| Random Forest | ~96% | ~10 sec | Medium (trees) | Poor |
| Neural Network | ~98% | ~60 sec | Low (weights) | Poor |
VSA advantages: - ✅ Extremely fast training (no gradient descent) - ✅ Excellent few-shot learning - ✅ Interpretable prototypes - ✅ Robust to noise
VSA disadvantages: - ❌ Slightly lower peak accuracy than deep learning - ❌ Requires careful dimension selection
Real-World Applications¶
VSA-based classification extends beyond MNIST:
1. Medical Imaging - X-ray classification (pneumonia detection) - MRI scan analysis - Skin lesion classification - Advantage: Few-shot learning (limited medical data)
2. Manufacturing Quality Control - Defect detection in products - Surface inspection - Advantage: Fast inference on edge devices (Binary model)
3. Biometric Recognition - Face recognition - Fingerprint matching - Advantage: Template protection (hypervector representations)
4. Remote Sensing - Satellite image classification - Land cover mapping - Advantage: Handles noise and missing data
Self-Assessment¶
Before moving to the next lesson, ensure you can:
- [ ] Explain how images are encoded using spatial bundling
- [ ] Create class prototypes by bundling training examples
- [ ] Classify new images using similarity to prototypes
- [ ] Implement efficient batch encoding with vmap
- [ ] Complete the MNIST classification tutorial
- [ ] Compare FHRR, MAP, and Binary models on real data
- [ ] Analyze confusion matrices and error patterns
- [ ] Extend the approach to custom datasets
Quick Quiz¶
Q1: Why do we create separate basis vectors for each pixel position?
a) To reduce memory usage b) To preserve spatial information (pixel location matters) c) To make encoding faster d) Required by VSAX API
Answer
**b) To preserve spatial information** - Each pixel position needs its own basis vector so the encoding knows "this is pixel 0" vs "this is pixel 63". Without position-specific bases, we'd lose where each intensity value came from.Q2: How are class prototypes created in VSA classification?
a) Gradient descent optimization b) K-means clustering c) Bundling (averaging) all training examples of that class d) Random sampling
Answer
**c) Bundling all training examples** - Prototypes are created by summing (bundling) the encoded hypervectors of all training images for a class, then normalizing. This creates an "average" representation.Q3: Why is VSA good for few-shot learning?
a) Uses less memory than neural networks b) Prototypes can be learned from very few examples via bundling c) Doesn't require labeled data d) Runs faster on GPUs
Answer
**b) Prototypes from few examples** - Bundling allows us to create meaningful prototypes even with 1-5 examples per class. Each new example refines the prototype through averaging.Q4: Which VSA model should you use for embedded edge device deployment?
a) FHRR (highest accuracy) b) MAP (balanced) c) Binary (minimal memory, fast XOR operations) d) All models work equally well
Answer
**c) Binary** - Binary model uses only 1 bit per element (8× less memory than FHRR), and XOR operations are extremely fast on embedded hardware. Accuracy is slightly lower (~88% vs 95%) but acceptable for many applications.Key Takeaways¶
- VSA enables interpretable classification - Class prototypes are explicit hypervectors
- Spatial bundling encodes images - Weight pixel basis vectors by intensity
- Prototype learning is simple - Bundle training examples, no gradient descent
- Classification is similarity search - argmax(similarity to prototypes)
- Few-shot learning works naturally - Prototypes from as few as 1-5 examples
- Robust to noise - Distributed representations degrade gracefully
- Model trade-offs - FHRR (accuracy), MAP (balanced), Binary (efficiency)
Next: Lesson 3.4: Application - Knowledge Graph Reasoning
Apply DictEncoder and GraphEncoder to build queryable knowledge bases.
Previous: Lesson 3.2: Structured Data - Dictionaries and Sets