Persistence: Saving and Loading Basis Vectors¶

VSAX provides simple JSON-based persistence for saving and loading basis vectors. This enables you to:

Preserve semantic spaces across sessions
Share vocabularies between projects
Version control your basis vectors
Reproduce experiments with exactly the same vectors

Quick Start¶

from vsax import create_fhrr_model, VSAMemory, save_basis, load_basis

# Create and populate memory
model = create_fhrr_model(dim=512)
memory = VSAMemory(model)
memory.add_many(["dog", "cat", "animal", "pet"])

# Save to JSON
save_basis(memory, "animals.json")

# Later: Load into new memory
new_memory = VSAMemory(model)
load_basis(new_memory, "animals.json")

# The saved symbols are available in the new memory
assert "dog" in new_memory

Saving Basis Vectors¶

Basic Usage¶

from pathlib import Path
from vsax import save_basis

# Save with Path object
save_basis(memory, Path("my_basis.json"))

# Or with string path
save_basis(memory, "my_basis.json")

What Gets Saved?¶

The JSON file contains:

Metadata: Dimension, representation type, vector count
Vectors: All named vectors in the memory

Example JSON structure for FHRR (complex) vectors:

{
  "metadata": {
    "dim": 512,
    "rep_type": "complex",
    "num_vectors": 2
  },
  "vectors": {
    "dog": {
      "real": [0.12, -0.34, ...],
      "imag": [0.56, 0.78, ...]
    },
    "cat": {
      "real": [-0.45, 0.23, ...],
      "imag": [0.11, -0.67, ...]
    }
  }
}
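
The real/imag split shown above can be reproduced with the standard library alone. The following is a minimal sketch of that layout, not VSAX's actual serializer; `encode_complex` and `decode_complex` are hypothetical helper names introduced for illustration.

```python
import json


def encode_complex(vec):
    """Split complex numbers into separate real and imaginary lists."""
    return {"real": [z.real for z in vec], "imag": [z.imag for z in vec]}


def decode_complex(entry):
    """Rebuild complex numbers from a {"real": [...], "imag": [...]} entry."""
    return [complex(r, i) for r, i in zip(entry["real"], entry["imag"])]


# Two-component toy vector; real basis vectors have `dim` components.
dog = [complex(0.12, 0.56), complex(-0.34, 0.78)]

# Serialize to JSON text and read it back.
text = json.dumps({"vectors": {"dog": encode_complex(dog)}})
restored = decode_complex(json.loads(text)["vectors"]["dog"])
```

Python's `json` module writes floats in a form that parses back to the same value, so the restored list compares equal to the original here.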

All Three Models Supported¶

FHRR (Complex Vectors):

- Stored as separate real and imaginary parts
- JSON keys: "real" and "imag"

MAP (Real Vectors):

- Stored as simple float arrays
- Direct list representation

Binary (Bipolar Vectors):

- Stored as integer arrays (-1, +1 or 0, 1)
- Compact representation

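from vsax import (
    VSAMemory,
    create_binary_model,
    create_fhrr_model,
    create_map_model,
    save_basis,
)

# Each model saves differently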
fhrr_model = create_fhrr_model(dim=512)
map_model = create_map_model(dim=512)
binary_model = create_binary_model(dim=10000, bipolar=True)

memory_fhrr = VSAMemory(fhrr_model)
memory_map = VSAMemory(map_model)
memory_binary = VSAMemory(binary_model)

# All use same API
save_basis(memory_fhrr, "fhrr.json")
save_basis(memory_map, "map.json")
save_basis(memory_binary, "binary.json")
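
When inspecting a saved file by hand, the three layouts described above can be told apart by the JSON shape of each vector entry. This is a rough sketch based only on that description; `entry_layout` is a hypothetical helper, not part of VSAX, and the labels it returns are illustrative rather than the `rep_type` strings the library writes.

```python
def entry_layout(entry):
    """Classify a stored vector entry by its JSON shape (illustrative only)."""
    # Complex vectors: an object with separate "real" and "imag" lists.
    if isinstance(entry, dict) and {"real", "imag"} <= entry.keys():
        return "complex"
    # Binary/bipolar vectors: a flat list of integers.
    if isinstance(entry, list) and entry and all(isinstance(x, int) for x in entry):
        return "integer"
    # Real-valued vectors: a flat list of floats.
    if isinstance(entry, list) and entry:
        return "float"
    raise ValueError("unrecognized vector entry")


layouts = [
    entry_layout({"real": [0.12], "imag": [0.56]}),
    entry_layout([0.5, -0.25]),
    entry_layout([1, -1, 1]),
]
```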

Loading Basis Vectors¶

Basic Usage¶

from vsax import VSAMemory, create_fhrr_model, load_basis

# Create empty memory with correct model
model = create_fhrr_model(dim=512)
memory = VSAMemory(model)

# Load from file
load_basis(memory, "my_basis.json")

# Memory is now populated
print(f"Loaded {len(memory._vectors)} vectors")

Requirements¶

Empty Memory: Memory must be empty before loading
Matching Dimension: File dimension must match memory's model dimension
Matching Type: File rep_type must match memory's model type
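
The dimension and type requirements can be checked before calling `load_basis` by reading the file's metadata directly. The sketch below assumes the `metadata` layout shown under "What Gets Saved?"; `check_basis_file` is a hypothetical helper written for this example, not a VSAX function.

```python
import json
from pathlib import Path


def check_basis_file(path, expected_dim, expected_rep_type):
    """Return a list of metadata mismatches (empty if the file looks compatible)."""
    metadata = json.loads(Path(path).read_text())["metadata"]
    problems = []
    if metadata["dim"] != expected_dim:
        problems.append(f"dim is {metadata['dim']}, expected {expected_dim}")
    if metadata["rep_type"] != expected_rep_type:
        problems.append(
            f"rep_type is {metadata['rep_type']!r}, expected {expected_rep_type!r}"
        )
    return problems
```

For example, `check_basis_file("my_basis.json", 512, "complex")` would return an empty list for a 512-dimensional FHRR basis and one message per mismatch otherwise.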

Error Handling¶

# Dimension mismatch
model_128 = create_fhrr_model(dim=128)
model_256 = create_fhrr_model(dim=256)

memory_128 = VSAMemory(model_128)
memory_128.add("test")
save_basis(memory_128, "test.json")

memory_256 = VSAMemory(model_256)
try:
    load_basis(memory_256, "test.json")  # ❌ Dimension mismatch!
except ValueError as e:
    print(f"Error: {e}")

# Representation type mismatch
fhrr_memory = VSAMemory(create_fhrr_model(dim=128))
fhrr_memory.add("test")
save_basis(fhrr_memory, "test.json")

map_memory = VSAMemory(create_map_model(dim=128))
try:
    load_basis(map_memory, "test.json")  # ❌ Type mismatch!
except ValueError as e:
    print(f"Error: {e}")

# Non-empty memory
memory = VSAMemory(create_fhrr_model(dim=128))
memory.add("existing")
try:
    load_basis(memory, "test.json")  # ❌ Memory not empty!
except ValueError as e:
    print(f"Error: {e}")

Common Use Cases¶

1. Persistent Semantic Spaces¶

Build a knowledge base once, reuse it across sessions:

# Session 1: Build semantic space
model = create_fhrr_model(dim=1024)
memory = VSAMemory(model)

# Add domain vocabulary
memory.add_many([
    "entity1", "entity2", "relation1", "relation2",
    "attribute1", "attribute2", ...
])

# Create complex structures
entity_with_attr = model.opset.bind(
    memory["entity1"].vec,
    memory["attribute1"].vec
)

# Save for later
save_basis(memory, "knowledge_base.json")

# Session 2: Load and use
memory_new = VSAMemory(model)
load_basis(memory_new, "knowledge_base.json")

# All symbols available immediately
entity = memory_new["entity1"]

2. Sharing Vocabularies¶

Share exact basis vectors between projects or team members:

# Project A: Create shared vocabulary
model = create_map_model(dim=512)
memory = VSAMemory(model)
memory.add_many(["term1", "term2", "term3", ...])
save_basis(memory, "shared_vocab.json")

# Project B: Use same vocabulary
model_b = create_map_model(dim=512)  # Same dim!
memory_b = VSAMemory(model_b)
load_basis(memory_b, "shared_vocab.json")

# Projects now use identical basis

3. Reproducible Research¶

Version control your basis vectors for reproducible experiments:

# Save basis with experiment
git add experiment_basis.json
git commit -m "Add basis for experiment 1"

# Others can reproduce exact results
git clone repo
python experiment.py  # Loads experiment_basis.json
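
One optional way to confirm that a collaborator is using the same basis file is to compare a checksum of its bytes. This is a general-purpose sketch using only the standard library; `file_sha256` is a hypothetical helper and not something VSAX provides.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```

Recording the digest of `experiment_basis.json` next to the experiment's results lets others verify the file they loaded byte for byte.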

4. Incremental Development¶

Save progress and resume later:

# Day 1: Initial setup
memory = VSAMemory(create_fhrr_model(dim=512))
memory.add_many(["concept1", "concept2", ...])
save_basis(memory, "progress.json")

# Day 2: Resume and extend
memory = VSAMemory(create_fhrr_model(dim=512))
load_basis(memory, "progress.json")
memory.add_many(["concept3", "concept4", ...])  # Add more
save_basis(memory, "progress.json")  # Overwrite

Best Practices¶

File Organization¶

project/
├── basis/
│   ├── entities.json      # Entity vectors
│   ├── relations.json     # Relation vectors
│   └── attributes.json    # Attribute vectors
├── experiments/
│   ├── exp1_basis.json
│   └── exp2_basis.json
└── shared/
    └── common_vocab.json

Naming Conventions¶

# Descriptive filenames
save_basis(memory, "medical_terms_512d_fhrr.json")
save_basis(memory, "colors_256d_map.json")
save_basis(memory, "code_symbols_10k_binary.json")

Version Control¶

# Include dimension and date in filename
from datetime import datetime

date_str = datetime.now().strftime("%Y%m%d")
filename = f"basis_{model.dim}d_{date_str}.json"
save_basis(memory, filename)

Testing¶

import jax.numpy as jnp

# Always verify round-trip
# memory_loaded must be an empty VSAMemory built with the same model
save_basis(memory_original, "test.json")
load_basis(memory_loaded, "test.json")

for name in memory_original._vectors:
    vec1 = memory_original[name].vec
    vec2 = memory_loaded[name].vec
    assert jnp.allclose(vec1, vec2, atol=1e-6)

Performance Considerations¶

File Size¶

FHRR: 2 × dim numbers per vector (real and imaginary parts)
MAP: 1 × dim numbers per vector (real values)
Binary: 1 × dim numbers per vector (integers)

Approximate sizes for 100 vectors:

Model	Dim	File Size
FHRR	512	~500 KB
MAP	512	~250 KB
Binary	10,000	~2 MB
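
The multipliers above translate into a simple count of stored numbers. The sketch below only counts values; the resulting file size also depends on how many characters each number occupies in the JSON text, which is not specified here. `stored_numbers` is a hypothetical helper for illustration.

```python
def stored_numbers(num_vectors, dim, is_complex=False):
    """Count the numeric values a basis file holds, ignoring names and metadata."""
    parts = 2 if is_complex else 1  # complex vectors store real and imag lists
    return num_vectors * dim * parts


# The three configurations from the table, 100 vectors each.
fhrr_count = stored_numbers(100, 512, is_complex=True)  # 102400
map_count = stored_numbers(100, 512)                    # 51200
binary_count = stored_numbers(100, 10_000)              # 1000000
```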

Load Time¶

Loading is fast, typically under 100 ms for common basis sizes. To measure it on your own files:

import time

# perf_counter is a monotonic, high-resolution clock suited to timing
start = time.perf_counter()
load_basis(memory, "large_basis.json")
elapsed = time.perf_counter() - start
print(f"Loaded in {elapsed*1000:.1f}ms")

Large Vocabularies¶

For very large vocabularies (thousands of vectors):

# Consider splitting into multiple files
save_basis(entities_memory, "entities.json")
save_basis(relations_memory, "relations.json")
save_basis(attributes_memory, "attributes.json")

# Load only what you need
memory = VSAMemory(model)
load_basis(memory, "entities.json")  # Load just entities

Troubleshooting¶

File not found?

from pathlib import Path

path = Path("my_basis.json")
if not path.exists():
    print(f"File not found: {path.absolute()}")

Wrong dimension?

# Check file metadata first
import json
with open("basis.json") as f:
    data = json.load(f)
    print(f"File dimension: {data['metadata']['dim']}")
    print(f"File type: {data['metadata']['rep_type']}")

Corrupted JSON?

import json

try:
    load_basis(memory, "basis.json")
except json.JSONDecodeError:
    print("JSON file is corrupted")
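
To locate where a damaged file stops being valid JSON, the decode error's position attributes can be reported. This sketch uses only the standard `json` module and does not involve VSAX; `find_json_error` is a hypothetical helper name.

```python
import json
from pathlib import Path


def find_json_error(path):
    """Return None if the file parses as JSON, else a short error description."""
    try:
        json.loads(Path(path).read_text())
    except json.JSONDecodeError as err:
        # msg, lineno and colno are standard attributes of JSONDecodeError.
        return f"{err.msg} at line {err.lineno}, column {err.colno}"
    return None
```

A truncated file, for instance one cut off partway through a vector list, yields a message pointing at the line and column where parsing failed.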
