Memory System (RAG) - Technical Documentation
Overview
Hephaestus implements a bidirectional memory system that enables agents to both write and read shared knowledge. This creates a collective intelligence where all agents benefit from discoveries, solutions, and learnings across the entire system.
Architecture
Hybrid Memory Approach
Two-Tier System
Tier 1: Pre-loaded Context (80% of needs)
- Happens at agent spawn time
- RAG retrieves top 20 most relevant memories
- Based on task description similarity
- Embedded in agent's initial system prompt
- Fast, no API calls during execution
- Covers most common scenarios
Tier 2: Dynamic Search (20% of needs)
- Available during agent execution
- Agent calls the qdrant-find tool via MCP
- Semantic search on demand
- For specific errors, edge cases, deep dives
- Real-time knowledge retrieval (see the sketch below)
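A minimal sketch of Tier 1 pre-loading under these assumptions: the openai and qdrant-client libraries, the collection and model names from this document, and an OPENAI_API_KEY in the environment. This is illustrative, not the shipped implementation.
from openai import OpenAI
from qdrant_client import QdrantClient

oai = OpenAI()  # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(url="http://localhost:6333")

def embed(text: str) -> list[float]:
    resp = oai.embeddings.create(model="text-embedding-3-large", input=text)
    return resp.data[0].embedding

def preload_context(task_description: str, top_k: int = 20) -> list[dict]:
    """Tier 1: fetch the most relevant memories at agent spawn time."""
    hits = qdrant.search(
        collection_name="hephaestus_agent_memories",
        query_vector=embed(task_description),
        limit=top_k,
    )
    # Payloads are embedded into the agent's initial system prompt
    return [hit.payload for hit in hits]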
Storage Architecture
Dual Storage System
Every memory is persisted twice: structured metadata and relations in SQLite, and the content's embedding vector in Qdrant (linked via embedding_id).
SQLite Schema
Table: memories
CREATE TABLE memories (
id TEXT PRIMARY KEY, -- UUID
created_at TIMESTAMP,
agent_id TEXT, -- Creating agent
content TEXT, -- Memory content
memory_type TEXT, -- error_fix, discovery, etc.
embedding_id TEXT, -- Qdrant vector ID
related_task_id TEXT, -- Associated task
tags JSON, -- Searchable tags
related_files JSON, -- File paths
extra_data JSON, -- Additional metadata
FOREIGN KEY (agent_id) REFERENCES agents(id),
FOREIGN KEY (related_task_id) REFERENCES tasks(id)
);
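For illustration, inserting a row into this table might look like the following; the database path and all values are hypothetical:
import json
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect("hephaestus.db")  # database path is an assumption
conn.execute(
    """INSERT INTO memories
       (id, created_at, agent_id, content, memory_type, embedding_id,
        related_task_id, tags, related_files, extra_data)
       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
    (
        str(uuid.uuid4()),
        datetime.now(timezone.utc).isoformat(),
        "agent-123",                        # creating agent
        "Fixed CORS by adding allow_origins=['*'] to FastAPI middleware",
        "error_fix",
        "qdrant-point-id",                  # ID of the paired vector in Qdrant (hypothetical)
        None,                               # no associated task in this example
        json.dumps(["cors", "fastapi"]),
        json.dumps(["src/main.py"]),
        json.dumps({}),
    ),
)
conn.commit()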
Qdrant Collections
Primary Collection: hephaestus_agent_memories
{
"name": "hephaestus_agent_memories",
"vectors": {
"size": 3072, # OpenAI text-embedding-3-large
"distance": "Cosine"
},
"payload_schema": {
"content": "text",
"memory_type": "keyword",
"agent_id": "keyword",
"task_id": "keyword",
"timestamp": "datetime",
"tags": "keyword[]",
"related_files": "text[]"
}
}
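A minimal sketch of creating this collection with the qdrant-client library, which is presumably what scripts/init_qdrant.py does:
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="hephaestus_agent_memories",
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),  # matches text-embedding-3-large
)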
Other Collections:
- hephaestus_static_docs - Documentation files
- hephaestus_task_completions - Historical task data
- hephaestus_error_solutions - Known error fixes
- hephaestus_domain_knowledge - CVEs, CWEs, standards
- hephaestus_project_context - Current project state
Memory Types
Taxonomy
| Type | Description | When to Use | Example |
|---|---|---|---|
| error_fix | Solutions to errors | After fixing a bug | "Fixed PostgreSQL timeout by increasing pool_size to 20" |
| discovery | Important findings | New insights about code | "Authentication uses JWT with 24h expiry" |
| decision | Key decisions & rationale | After making design choice | "Chose Redis over Memcached for pub/sub support" |
| learning | Lessons learned | After completing task | "Always validate input before SQL queries" |
| warning | Gotchas to avoid | Encountered edge case | "Don't use os.fork() with SQLite connections" |
| codebase_knowledge | Code structure insights | Understanding architecture | "API routes are defined in src/api/routes/" |
Type Usage Guidelines
error_fix:
await save_memory(
memory_type="error_fix",
content="ModuleNotFoundError: Fixed by adding src/ to PYTHONPATH in pytest.ini",
tags=["pytest", "imports", "python"],
related_files=["pytest.ini", "tests/conftest.py"]
)
discovery:
await save_memory(
memory_type="discovery",
content="Database migrations run automatically on server start via Alembic",
tags=["database", "migrations", "alembic"],
related_files=["src/db/migrations/", "alembic.ini"]
)
decision:
await save_memory(
memory_type="decision",
content="Using FastAPI over Flask for async support and auto-generated OpenAPI docs",
tags=["framework", "fastapi", "architecture"]
)
Embedding Model
OpenAI text-embedding-3-large
Specifications:
- Dimensions: 3072
- Context window: 8,191 tokens
- Cost: $0.00013 per 1K tokens
- Quality: State-of-the-art semantic understanding
Why this model:
- High dimensionality - Better semantic capture
- Superior quality - Outperforms smaller models
- Consistent usage - Same model across all memories
- Future-proof - Latest OpenAI technology
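A quick way to confirm the model's dimensionality with the official openai Python client (requires OPENAI_API_KEY; the input string is arbitrary):
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="PostgreSQL connection pooling",
)
print(len(resp.data[0].embedding))  # 3072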
Alternative considered:
sentence-transformers/all-MiniLM-L6-v2 (384-dim)
- Rejected: Too low dimensionality
- Rejected: Quality issues with technical content
Deduplication System
Similarity Threshold: 0.95
Rationale:
- Prevents redundant knowledge storage
- Reduces vector storage costs
- Maintains uniqueness of insights
- Threshold chosen empirically (scores above 0.95 indicate nearly identical content)
Edge cases:
- Similar but distinct memories (score 0.85-0.94) → Stored separately
- Exact duplicates (score >0.99) → Flagged in SQLite
- Different perspectives on same topic → Welcome diversity
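A minimal sketch of the duplicate check, assuming the new memory's embedding has already been generated (uses qdrant-client's search API; not the shipped implementation):
def is_duplicate(client, new_vector, threshold=0.95):
    # Compare the new memory's embedding against its nearest existing neighbor
    hits = client.search(
        collection_name="hephaestus_agent_memories",
        query_vector=new_vector,
        limit=1,
    )
    return bool(hits) and hits[0].score >= threshold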
RAG Retrieval Flow
At Task Creation
When a task is created, its description is embedded and the top 20 most similar memories are retrieved from Qdrant and injected into the spawning agent's system prompt (Tier 1).
During Agent Execution
The agent can call qdrant-find at any point to run an on-demand semantic search against the memory collection (Tier 2).
MCP Integration
Hephaestus MCP Tools
1. save_memory
{
"tool": "mcp__hephaestus__save_memory",
"parameters": {
"content": str, # Memory content (required)
"agent_id": str, # Your agent ID (required)
"memory_type": str, # Type from taxonomy (required)
"tags": List[str], # Optional tags
"related_files": List[str], # Optional file paths
"extra_data": Dict # Optional metadata
}
}
Example:
save_memory(
content="Fixed CORS by adding allow_origins=['*'] to FastAPI middleware",
agent_id="agent-123",
memory_type="error_fix",
tags=["cors", "fastapi", "middleware"],
related_files=["src/main.py"]
)
Qdrant MCP Tools (Custom)
2. qdrant_find
{
"tool": "qdrant_find",
"parameters": {
"query": str, # Natural language search query
"limit": int # Max results (default: 5)
}
}
Example:
qdrant_find(
query="How to handle PostgreSQL connection pooling",
limit=3
)
# Returns:
# [1] Score: 0.847 | Type: discovery
# PostgreSQL connection pool configured with max_connections=20...
# [2] Score: 0.782 | Type: error_fix
# Fixed connection timeout by setting pool_timeout=30...
# [3] Score: 0.735 | Type: decision
# Chose asyncpg over psycopg2 for async support...
Custom Qdrant MCP Server
Why custom:
- The default mcp-server-qdrant uses FastEmbed (max 1024-dim)
- Hephaestus uses OpenAI embeddings (3072-dim)
- Dimension mismatch → errors
Solution:
- Created qdrant_mcp_openai.py
- Uses OpenAI embeddings directly
- Matches existing Qdrant collections
Configuration:
claude mcp add -s user qdrant python /path/to/qdrant_mcp_openai.py \
-e QDRANT_URL=http://localhost:6333 \
-e COLLECTION_NAME=hephaestus_agent_memories \
-e OPENAI_API_KEY=$OPENAI_API_KEY \
-e EMBEDDING_MODEL=text-embedding-3-large
Agent Prompts
System Prompt Template
Agents receive this guidance:
═══ PRE-LOADED CONTEXT ═══
Top 20 relevant memories (use qdrant-find for more):
- Memory 1: [content preview]
- Memory 2: [content preview]
...
═══ AVAILABLE TOOLS ═══
Hephaestus MCP (task management):
• save_memory - Save discoveries for other agents
Qdrant MCP (memory search):
• qdrant-find - Search agent memories semantically
Use when: encountering errors, needing implementation details
Example: "qdrant-find 'PostgreSQL connection timeout solutions'"
Note: Pre-loaded context covers most needs; search for specifics
═══ WORKFLOW ═══
1. Work on your task using pre-loaded context
2. Use qdrant-find if you need specific information
3. Save important discoveries via save_memory
When Agents Use qdrant-find
Common scenarios:
- Encountering errors not in pre-loaded context:
  "qdrant-find 'ModuleNotFoundError when importing src modules'"
- Needing implementation details:
  "qdrant-find 'how authentication middleware is configured'"
- Finding related work:
  "qdrant-find 'previous API rate limiting implementations'"
- Exploring patterns:
  "qdrant-find 'database migration strategies used in this project'"
Performance Characteristics
Pre-loaded Context (Tier 1)
| Metric | Value |
|---|---|
| Retrieval time | ~2-3 seconds |
| API calls | 1 embedding generation |
| Cost per task | ~$0.00003 |
| Coverage | 80% of agent needs |
| Context size | Top 20 memories (~4KB) |
Dynamic Search (Tier 2)
| Metric | Value |
|---|---|
| Query time | ~1-2 seconds |
| API calls | 1 embedding per search |
| Cost per search | ~$0.000003 |
| Usage | 20% of agent needs |
| Results | Top 5 by default |
Storage
| Metric | Value |
|---|---|
| Current memories | 1,085+ |
| Vector storage | ~12MB (3072-dim × 1085) |
| SQLite size | ~500KB |
| Growth rate | ~50-100 memories/day |
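The vector storage figure follows from the dimensionality: 3072 floats × 4 bytes ≈ 12 KB per vector, so ~1,085 vectors occupy roughly 13 MB raw, on the order of the ~12 MB reported.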
Best Practices
For Agents
✅ DO:
- Use pre-loaded context first
- Save memories after significant discoveries
- Use specific queries for qdrant-find
- Include relevant tags and file paths
- Choose appropriate memory types
❌ DON'T:
- Search for info already in pre-loaded context
- Save trivial or obvious information
- Use vague queries like "help me"
- Duplicate existing memories
- Omit context in memory content
For System Operators
Maintenance:
# Check collection health
curl http://localhost:6333/collections/hephaestus_agent_memories
# View recent memories
python scripts/view_memories.py --limit 10
# Clean stale memories (when starting new project)
python scripts/clean_qdrant.py
# Reinitialize collections
python scripts/init_qdrant.py
Monitoring:
- Track memory growth rate
- Monitor search quality (relevance scores)
- Identify duplicate patterns
- Audit memory type distribution (see the query sketch below)
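A sketch of auditing the type distribution directly from SQLite (the database path is an assumption):
import sqlite3

conn = sqlite3.connect("hephaestus.db")  # path is an assumption
for memory_type, count in conn.execute(
    "SELECT memory_type, COUNT(*) FROM memories GROUP BY memory_type ORDER BY 2 DESC"
):
    print(f"{memory_type:20} {count}")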
Troubleshooting
Common Issues
1. No search results
# Check Qdrant is running
curl http://localhost:6333/collections
# Verify collection exists
python -c "from qdrant_client import QdrantClient; \
client = QdrantClient('http://localhost:6333'); \
print(client.count('hephaestus_agent_memories'))"
# Check for data
curl http://localhost:6333/collections/hephaestus_agent_memories/points/count
2. Dimension mismatch errors
Error: Wrong input: Vector dimension error: expected dim: 3072, got 1536
Solution: Ensure using custom Qdrant MCP with OpenAI embeddings, not default FastEmbed.
3. High API costs
# Monitor embedding generation
grep "generate_embedding" logs/server.log | wc -l
# If too high:
# - Increase pre-loaded context (top 30 instead of 20)
# - Cache common queries
# - Batch embedding generation
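One way to cache common queries, as suggested above, is to memoize the query-to-embedding step so repeated searches reuse a vector. A sketch, not the shipped implementation:
from functools import lru_cache
from openai import OpenAI

client = OpenAI()

@lru_cache(maxsize=1024)
def embed_query(text: str) -> tuple:
    # Repeated identical queries hit the cache instead of the API
    resp = client.embeddings.create(model="text-embedding-3-large", input=text)
    return tuple(resp.data[0].embedding)  # tuple so the result is hashable and cacheable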
4. Poor search quality
# Symptoms: Irrelevant results, low scores (<0.3)
# Solutions:
# - Improve query specificity
# - Add more diverse memories
# - Check for stale/outdated memories
# - Consider reindexing with better metadata
Advanced Topics
Multi-Collection Search
Future enhancement to search across all collections:
results = await vector_store.search_all_collections(
query_vector=embedding,
limit_per_collection=5,
total_limit=20
)
# Returns merged results from all 6 collections
Memory Lifecycle
A memory is created by an agent via save_memory, embedded, checked for duplicates, stored in SQLite and Qdrant, and then served to future agents through pre-loading or dynamic search.
Semantic Search Internals
Cosine Similarity:
import numpy as np

def cosine_similarity(query_vec: np.ndarray, memory_vec: np.ndarray) -> float:
    # Range: [-1, 1]; typical scores for relevant memories fall between 0.3 and 0.9
    return float(np.dot(query_vec, memory_vec) / (np.linalg.norm(query_vec) * np.linalg.norm(memory_vec)))
Score interpretation:
- 0.8-1.0: Highly relevant, nearly exact match
- 0.6-0.8: Very relevant, good semantic match
- 0.4-0.6: Moderately relevant, related concept
- 0.2-0.4: Loosely relevant, tangential
- Below 0.2: Not relevant, different topic
Future Enhancements
Planned Features
- Memory Consolidation
  - Merge similar memories periodically
  - LLM-powered summarization
  - Keep best versions
- Memory Expiration
  - Auto-archive old memories
  - Configurable TTL per type
  - Reactivation on demand
- Smart Tagging
  - Auto-generate tags from content
  - Hierarchical tag structure
  - Tag-based filtering
- Quality Scoring
  - Track memory usefulness
  - Upvote/downvote by agents
  - Promote high-quality memories
- Cross-Project Sharing
  - Export/import memory sets
  - Public memory marketplace
  - Template memory packs
Last Updated: 2025-09-30 | Version: 1.0