Memory System (RAG) - Technical Documentation
Overview
Hephaestus implements a bidirectional memory system that enables agents to both write and read shared knowledge. This creates a collective intelligence where all agents benefit from discoveries, solutions, and learnings across the entire system.
Architecture
Hybrid Memory Approach
Two-Tier System
Tier 1: Pre-loaded Context (80% of needs)
- Happens at agent spawn time
- RAG retrieves top 20 most relevant memories
- Based on task description similarity
- Embedded in agent's initial system prompt
- Fast, no API calls during execution
- Covers most common scenarios
Tier 2: Dynamic Search (20% of needs)
- Available during agent execution
- Agent calls the qdrant-find tool via MCP
- Semantic search on demand
- For specific errors, edge cases, deep dives
- Real-time knowledge retrieval (see the sketch below)
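A minimal sketch of Tier 1 pre-loading under these assumptions: the openai and qdrant-client libraries, the collection and model names from this document, and an OPENAI_API_KEY in the environment. This is illustrative, not the shipped implementation.
from openai import OpenAI
from qdrant_client import QdrantClient

oai = OpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(url="http://localhost:6333")

def embed(text: str) -> list[float]:
    resp = oai.embeddings.create(model="text-embedding-3-large", input=text)
    return resp.data[0].embedding

def preload_context(task_description: str, top_k: int = 20) -> list[dict]:
    """Tier 1: fetch the most relevant memories at agent spawn time."""
    hits = qdrant.search(
        collection_name="hephaestus_agent_memories",
        query_vector=embed(task_description),
        limit=top_k,
    )
    # Payloads are embedded into the agent's initial system prompt
    return [hit.payload for hit in hits]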
Storage Architecture
Dual Storage System
Every memory is persisted twice: structured metadata and relations in SQLite, and the content's embedding vector in Qdrant (linked via embedding_id).
SQLite Schema
Table: memories
CREATE TABLE memories (
id TEXT PRIMARY KEY, -- UUID
created_at TIMESTAMP,
agent_id TEXT, -- Creating agent
content TEXT, -- Memory content
memory_type TEXT, -- error_fix, discovery, etc.
embedding_id TEXT, -- Qdrant vector ID
related_task_id TEXT, -- Associated task
tags JSON, -- Searchable tags
related_files JSON, -- File paths
extra_data JSON, -- Additional metadata
FOREIGN KEY (agent_id) REFERENCES agents(id),
FOREIGN KEY (related_task_id) REFERENCES tasks(id)
);
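For illustration, inserting a row into this table might look like the following; the database path and all values are hypothetical:
import json
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect("hephaestus.db")  # database path is an assumption
conn.execute(
    """INSERT INTO memories
       (id, created_at, agent_id, content, memory_type, embedding_id,
        related_task_id, tags, related_files, extra_data)
       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
    (
        str(uuid.uuid4()),
        datetime.now(timezone.utc).isoformat(),
        "agent-123",                        # creating agent
        "Fixed CORS by adding allow_origins=['*'] to FastAPI middleware",
        "error_fix",
        "qdrant-point-id",                  # ID of the paired vector in Qdrant (hypothetical)
        None,                               # no associated task in this example
        json.dumps(["cors", "fastapi"]),
        json.dumps(["src/main.py"]),
        json.dumps({}),
    ),
)
conn.commit()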
Qdrant Collections
Primary Collection: hephaestus_agent_memories
{
"name": "hephaestus_agent_memories",
"vectors": {
"size": 3072, # OpenAI text-embedding-3-large
"distance": "Cosine"
},
"payload_schema": {
"content": "text",
"memory_type": "keyword",
"agent_id": "keyword",
"task_id": "keyword",
"timestamp": "datetime",
"tags": "keyword[]",
"related_files": "text[]"
}
}
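A minimal sketch of creating this collection with the qdrant-client library, which is presumably what scripts/init_qdrant.py does:
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="hephaestus_agent_memories",
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),  # matches text-embedding-3-large
)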
Other Collections:
- hephaestus_static_docs - Documentation files
- hephaestus_task_completions - Historical task data
- hephaestus_error_solutions - Known error fixes
- hephaestus_domain_knowledge - CVEs, CWEs, standards
- hephaestus_project_context - Current project state
Memory Types
Taxonomy
| Type | Description | When to Use | Example |
|---|---|---|---|
| error_fix | Solutions to errors | After fixing a bug | "Fixed PostgreSQL timeout by increasing pool_size to 20" |
| discovery | Important findings | New insights about code | "Authentication uses JWT with 24h expiry" |
| decision | Key decisions & rationale | After making design choice | "Chose Redis over Memcached for pub/sub support" |
| learning | Lessons learned | After completing task | "Always validate input before SQL queries" |
| warning | Gotchas to avoid | Encountered edge case | "Don't use os.fork() with SQLite connections" |
| codebase_knowledge | Code structure insights | Understanding architecture | "API routes are defined in src/api/routes/" |
Type Usage Guidelines
error_fix:
await save_memory(
memory_type="error_fix",
content="ModuleNotFoundError: Fixed by adding src/ to PYTHONPATH in pytest.ini",
tags=["pytest", "imports", "python"],
related_files=["pytest.ini", "tests/conftest.py"]
)
discovery:
await save_memory(
memory_type="discovery",
content="Database migrations run automatically on server start via Alembic",
tags=["database", "migrations", "alembic"],
related_files=["src/db/migrations/", "alembic.ini"]
)
decision:
await save_memory(
memory_type="decision",
content="Using FastAPI over Flask for async support and auto-generated OpenAPI docs",
tags=["framework", "fastapi", "architecture"]
)
Embedding Model
OpenAI text-embedding-3-large
Specifications:
- Dimensions: 3072
- Context window: 8,191 tokens
- Cost: $0.00013 per 1K tokens
- Quality: State-of-the-art semantic understanding
Why this model:
- High dimensionality - Better semantic capture
- Superior quality - Outperforms smaller models
- Consistent usage - Same model across all memories
- Future-proof - Latest OpenAI technology
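A quick way to confirm the model's dimensionality with the official openai Python client (requires OPENAI_API_KEY; the input string is arbitrary):
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="PostgreSQL connection pooling",
)
print(len(resp.data[0].embedding))  # 3072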
Alternative considered:
sentence-transformers/all-MiniLM-L6-v2 (384-dim)
- Rejected: Too low dimensionality
- Rejected: Quality issues with technical content
Deduplication System
Similarity Threshold: 0.95
Rationale:
- Prevents redundant knowledge storage
- Reduces vector storage costs
- Maintains uniqueness of insights
- Threshold chosen empirically (scores above 0.95 indicate nearly identical content)
Edge cases:
- Similar but distinct memories (score 0.85-0.94) → Stored separately
- Exact duplicates (score >0.99) → Flagged in SQLite
- Different perspectives on same topic → Welcome diversity
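A minimal sketch of the duplicate check, assuming the new memory's embedding has already been generated (uses qdrant-client's search API; not the shipped implementation):
def is_duplicate(client, new_vector, threshold=0.95):
    # Compare the new memory's embedding against its nearest existing neighbor
    hits = client.search(
        collection_name="hephaestus_agent_memories",
        query_vector=new_vector,
        limit=1,
    )
    return bool(hits) and hits[0].score >= threshold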
RAG Retrieval Flow
At Task Creation
When a task is created, its description is embedded and the top 20 most similar memories are retrieved from Qdrant and injected into the spawning agent's system prompt (Tier 1).
During Agent Execution
The agent can call qdrant-find at any point to run an on-demand semantic search against the memory collection (Tier 2).
MCP Integration
Hephaestus MCP Tools
1. save_memory
{
"tool": "mcp__hephaestus__save_memory",
"parameters": {
"content": str, # Memory content (required)
"agent_id": str, # Your agent ID (required)
"memory_type": str, # Type from taxonomy (required)
"tags": List[str], # Optional tags
"related_files": List[str], # Optional file paths
"extra_data": Dict # Optional metadata
}
}
Example:
save_memory(
content="Fixed CORS by adding allow_origins=['*'] to FastAPI middleware",
agent_id="agent-123",
memory_type="error_fix",
tags=["cors", "fastapi", "middleware"],
related_files=["src/main.py"]
)
Qdrant MCP Tools (Custom)
2. qdrant_find
{
"tool": "qdrant_find",
"parameters": {
"query": str, # Natural language search query
"limit": int # Max results (default: 5)
}
}
Example:
qdrant_find(
query="How to handle PostgreSQL connection pooling",
limit=3
)
# Returns:
# [1] Score: 0.847 | Type: discovery
# PostgreSQL connection pool configured with max_connections=20...
# [2] Score: 0.782 | Type: error_fix
# Fixed connection timeout by setting pool_timeout=30...
# [3] Score: 0.735 | Type: decision
# Chose asyncpg over psycopg2 for async support...
Custom Qdrant MCP Server
Why custom:
- The default mcp-server-qdrant uses FastEmbed (max 1024-dim)
- Hephaestus uses OpenAI embeddings (3072-dim)
- Dimension mismatch → errors
Solution:
- Created qdrant_mcp_openai.py
- Uses OpenAI embeddings directly
- Matches existing Qdrant collections
Configuration:
claude mcp add -s user qdrant python /path/to/qdrant_mcp_openai.py \
-e QDRANT_URL=http://localhost:6333 \
-e COLLECTION_NAME=hephaestus_agent_memories \
-e OPENAI_API_KEY=$OPENAI_API_KEY \
-e EMBEDDING_MODEL=text-embedding-3-large
Agent Prompts
System Prompt Template
Agents receive this guidance:
═══ PRE-LOADED CONTEXT ═══
Top 20 relevant memories (use qdrant-find for more):
- Memory 1: [content preview]
- Memory 2: [content preview]
...
═══ AVAILABLE TOOLS ═══
Hephaestus MCP (task management):
• save_memory - Save discoveries for other agents
Qdrant MCP (memory search):
• qdrant-find - Search agent memories semantically
Use when: encountering errors, needing implementation details
Example: "qdrant-find 'PostgreSQL connection timeout solutions'"
Note: Pre-loaded context covers most needs; search for specifics
═══ WORKFLOW ═══
1. Work on your task using pre-loaded context
2. Use qdrant-find if you need specific information
3. Save important discoveries via save_memory
When Agents Use qdrant-find
Common scenarios:
- Encountering errors not in pre-loaded context:
  "qdrant-find 'ModuleNotFoundError when importing src modules'"
- Needing implementation details:
  "qdrant-find 'how authentication middleware is configured'"
- Finding related work:
  "qdrant-find 'previous API rate limiting implementations'"
- Exploring patterns:
  "qdrant-find 'database migration strategies used in this project'"
Performance Characteristics
Pre-loaded Context (Tier 1)
| Metric | Value |
|---|---|
| Retrieval time | ~2-3 seconds |
| API calls | 1 embedding generation |
| Cost per task | ~$0.00003 |
| Coverage | 80% of agent needs |
| Context size | Top 20 memories (~4KB) |
Dynamic Search (Tier 2)
| Metric | Value |
|---|---|
| Query time | ~1-2 seconds |
| API calls | 1 embedding per search |
| Cost per search | ~$0.000003 |
| Usage | 20% of agent needs |
| Results | Top 5 by default |
Storage
| Metric | Value |
|---|---|
| Current memories | 1,085+ |
| Vector storage | ~12MB (3072-dim × 1085) |
| SQLite size | ~500KB |
| Growth rate | ~50-100 memories/day |
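The vector storage figure follows from the dimensionality: 3072 floats × 4 bytes ≈ 12 KB per vector, so ~1,085 vectors occupy roughly 13 MB raw, on the order of the ~12 MB reported.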
Best Practices
For Agents
✅ DO:
- Use pre-loaded context first
- Save memories after significant discoveries
- Use specific queries for qdrant-find
- Include relevant tags and file paths
- Choose appropriate memory types
❌ DON'T:
- Search for info already in pre-loaded context
- Save trivial or obvious information
- Use vague queries like "help me"
- Duplicate existing memories
- Omit context in memory content
For System Operators
Maintenance:
# Check collection health
curl http://localhost:6333/collections/hephaestus_agent_memories
# View recent memories
python scripts/view_memories.py --limit 10
# Clean stale memories (when starting new project)
python scripts/clean_qdrant.py
# Reinitialize collections
python scripts/init_qdrant.py
Monitoring:
- Track memory growth rate
- Monitor search quality (relevance scores)
- Identify duplicate patterns
- Audit memory type distribution (see the query sketch below)
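A sketch of auditing the type distribution directly from SQLite (the database path is an assumption):
import sqlite3

conn = sqlite3.connect("hephaestus.db")  # path is an assumption
for memory_type, count in conn.execute(
    "SELECT memory_type, COUNT(*) FROM memories GROUP BY memory_type ORDER BY 2 DESC"
):
    print(f"{memory_type:20} {count}")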
Troubleshooting
Common Issues
1. No search results
# Check Qdrant is running
curl http://localhost:6333/collections
# Verify collection exists
python -c "from qdrant_client import QdrantClient; \
client = QdrantClient('http://localhost:6333'); \
print(client.count('hephaestus_agent_memories'))"
# Check for data
curl http://localhost:6333/collections/hephaestus_agent_memories/points/count
2. Dimension mismatch errors
Error: Wrong input: Vector dimension error: expected dim: 3072, got 1536
Solution: Ensure using custom Qdrant MCP with OpenAI embeddings, not default FastEmbed.
3. High API costs
# Monitor embedding generation
grep "generate_embedding" logs/server.log | wc -l
# If too high:
# - Increase pre-loaded context (top 30 instead of 20)
# - Cache common queries
# - Batch embedding generation
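One way to cache common queries, as suggested above, is to memoize the query-to-embedding step so repeated searches reuse a vector. A sketch, not the shipped implementation:
from functools import lru_cache
from openai import OpenAI

client = OpenAI()

@lru_cache(maxsize=1024)
def embed_query(text: str) -> tuple:
    # Repeated identical queries hit the cache instead of the API
    resp = client.embeddings.create(model="text-embedding-3-large", input=text)
    return tuple(resp.data[0].embedding)  # tuple so the result is hashable and cacheable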
4. Poor search quality
# Symptoms: Irrelevant results, low scores (<0.3)
# Solutions:
# - Improve query specificity
# - Add more diverse memories
# - Check for stale/outdated memories
# - Consider reindexing with better metadata
Advanced Topics
Multi-Collection Search
Future enhancement to search across all collections:
results = await vector_store.search_all_collections(
query_vector=embedding,
limit_per_collection=5,
total_limit=20
)
# Returns merged results from all 6 collections
Memory Lifecycle
A memory is created by an agent via save_memory, embedded, checked for duplicates, stored in SQLite and Qdrant, and then served to future agents through pre-loading or dynamic search.
Semantic Search Internals
Cosine Similarity:
import numpy as np

def cosine_similarity(query_vec: np.ndarray, memory_vec: np.ndarray) -> float:
    # Range: [-1, 1]; typical scores for relevant memories fall between 0.3 and 0.9
    return float(np.dot(query_vec, memory_vec) / (np.linalg.norm(query_vec) * np.linalg.norm(memory_vec)))
Score interpretation:
- 0.8-1.0: Highly relevant, nearly exact match
- 0.6-0.8: Very relevant, good semantic match
- 0.4-0.6: Moderately relevant, related concept
- 0.2-0.4: Loosely relevant, tangential
- Below 0.2: Not relevant, different topic
Future Enhancements
Planned Features
- Memory Consolidation
  - Merge similar memories periodically
  - LLM-powered summarization
  - Keep best versions
- Memory Expiration
  - Auto-archive old memories
  - Configurable TTL per type
  - Reactivation on demand
- Smart Tagging
  - Auto-generate tags from content
  - Hierarchical tag structure
  - Tag-based filtering
- Quality Scoring
  - Track memory usefulness
  - Upvote/downvote by agents
  - Promote high-quality memories
- Cross-Project Sharing
  - Export/import memory sets
  - Public memory marketplace
  - Template memory packs
Last Updated: 2025-09-30 | Version: 1.0