Architecture
Overview
Loomem is a Rust workspace with four crates:
| Crate | Role |
|---|---|
loomem-core |
Core library — storage, search, consolidation, graph, LLM, embeddings |
loomem-server |
HTTP + MCP server (Axum), auth, handlers, routing |
loomem-cli |
Command-line interface for direct interaction |
loomem-migrate |
Data migration utilities |
All business logic lives in loomem-core. The server is a thin HTTP layer that maps routes to core functions.
Data model
Chunk
The fundamental unit of memory. Every piece of knowledge — raw event, compressed summary, semantic group — is a Chunk.
chunk:L{level}:{uuid} → Chunk (JSON in RocksDB)
Core fields:
| Field | Type | Description |
|---|---|---|
id |
UUID | Unique identifier |
content |
String | Memory text |
stream |
String | Namespace / isolation boundary (default: __user_default__) |
level |
0, 1 | Memory tier (raw, compressed) |
score |
f64 | Decay-adjusted relevance (0.0 – 1.0) |
timestamp |
u64 | Ingestion time (Unix seconds) |
importance |
f64 | Surprise-based boost (0.7 – 1.5) |
access_count |
u32 | Search hit counter (for adaptive decay) |
persistent |
bool | Exempt from decay |
is_latest |
bool | Head of supersede chain |
Consolidation fields:
| Field | Type | Description |
|---|---|---|
consolidated |
bool | L0: has been compressed to L1 |
source_ids |
Vec\<String> | L1: which L0 chunks were merged |
prompt_version |
u32 | Which consolidation prompt was used |
Contradiction / versioning fields:
| Field | Type | Description |
|---|---|---|
superseded_by |
Option\<String> | Points to newer version |
supersedes_id |
Option\<String> | What this chunk replaced |
root_memory_id |
Option\<String> | Root of the version chain |
version |
u32 | Semantic version counter |
Knowledge extraction metadata:
| Field | Type | Description |
|---|---|---|
extraction_meta.fact_type |
Enum | PreferenceOrDecision, ProjectState, Fact |
extraction_meta.subject |
String | Entity name (person, project) |
extraction_meta.event_date |
String | ISO date of the event |
extraction_meta.confidence |
f64 | LLM extraction confidence (0.0 – 1.0) |
Memory tiers
L0 (Raw) L1 (Compressed)
───────── ─────────────
Verbatim input LLM-summarized
Fast decay Medium decay
Promotion flow:
Ingest → L0 (raw event)
│
▼ [Consolidation worker, every 5 min]
L1 (compressed observations)
A separate clustering worker periodically groups L1 chunks by embedding similarity; the clusters feed the associator (serendipity engine), not a separate storage tier.
Entity graph
Entities and their relationships form a knowledge graph stored in RocksDB:
EntityNode {
canonical_name: "Cursor"
entity_type: "TECHNOLOGY"
aliases: ["cursor", "Cursor IDE"]
chunk_ids: ["abc-123", "def-456"] // evidence
}
Edge {
source: "Alice"
target: "Cursor"
relation_type: "uses"
chunk_ids: ["abc-123"] // evidence
}
Per-stream isolation: Each entity and edge is scoped to a stream_id. The same entity name in two streams gets separate entity nodes. Name/alias indexes are prefixed by stream.
RocksDB key scheme for graph:
graph:entity:{id} → EntityNode (JSON, includes stream_id)
graph:s:{stream_id}:name:{lower()} → entity_id
graph:s:{stream_id}:alias:{lower()} → entity_id
graph:adj:{src_id}:{edge_id} → target_id
graph:radj:{tgt_id}:{edge_id} → source_id
graph:chunk:{chunk_id} → [entity_ids] (reverse index)
Storage layer
RocksDB
Primary persistent store. Holds chunks, embeddings, graph, users, cost tracking.
Column families:
| Family | Contents |
|---|---|
| default | Chunks (chunk:L{n}:{id}), graph |
| embeddings | Vectors (emb:{id} → f32 array) |
| cost | Daily cost counters (cost:{date}) |
| keys | Wrapped per-stream data-encryption keys (at-rest encryption) |
Configuration: - Compression: LZ4 - Write buffer: 64 MB x 3 buffers - Max open files: 1000
Tantivy
Full-text search index. Mirrors chunk content with additional indexed fields.
Schema:
| Field | Type | Boost | Purpose |
|---|---|---|---|
id |
String | — | Chunk reference |
content |
Text | 1.0 | Primary search target |
entities |
Text | 0.2 | Entity mentions (comma-separated) |
relations |
Text | 0.2 | Relation triples |
stream |
String | — | Filtering |
timestamp |
i64 | — | Date filtering |
event_date |
i64 | — | Temporal queries |
level |
i64 | — | Tier filtering |
Polish stemming enabled for content field.
Vector store
Embeddings stored in RocksDB's embeddings column family.
Providers:
- Local (default) — multilingual-e5-small (384 dimensions) via the tract ONNX runtime (pure Rust, no native dependencies, runs offline)
- OpenAI — text-embedding-3-small (1536 dimensions), opt-in via embedding_provider = "openai"
Embedding queue: batch processing (50 items, 5s flush interval) to amortize API latency.
Intent log (WAL)
Write-ahead log ensures cross-store consistency between RocksDB and Tantivy.
1. Append: PENDING(op_type, chunk_id)
2. Write to RocksDB
3. Write to Tantivy
4. Append: COMMITTED(op_type, chunk_id)
On crash recovery: - PENDING without COMMITTED → rollback partial writes - COMMITTED → verify both stores have the data
Ingest pipeline
Content is sanitized before any storage:
Input content
│
▼ [sanitizer.rs]
1. HTML tag stripping + entity decoding
2. Instruction injection detection (18 patterns — logs warning, does not block)
│
▼ [pii_filter.rs]
3. PII redaction (phones, emails, PESEL, blocklist words)
│
▼ [persist_chunk]
4. RocksDB store (sanitized content)
5. Entity extraction + graph population (stream-scoped)
6. Tantivy indexing
7. Embedding queue
8. Contradiction detection
The sanitizer detects but does not block injection attempts — content is stored after stripping. PII redaction replaces sensitive data with [PHONE], [EMAIL], [ID], [REDACTED] tokens before storage.
Search pipeline
A query flows through multiple stages:
1. Query classification
"what do you know about me?" → Profile
"how many projects do I run?" → Aggregation (top_k boosted to 30)
"when did I change my IDE?" → Temporal (date filtering)
"why did I choose Rust?" → Complex (top_k = 20)
"my dog's name" → Simple (top_k = 3, BM25 only)
2. Date filter extraction
Parses relative dates from query text:
- "last week" → date_from: now - 7d
- "in March" → date_from: 2026-03-01, date_to: 2026-03-31
- Explicit date_from/date_to params override
3. Parallel search
BM25 and vector search run concurrently:
BM25 (Tantivy):
QueryParser(content^1.0, entities^0.2, relations^0.2)
+ stream filter
+ date range filter
→ ranked results
Vector (cosine similarity):
query_embedding = embed(query_text)
for each stored embedding:
score = cosine(query_embedding, chunk_embedding)
→ top-K by similarity
4. Fusion
normalized_bm25 = bm25_score / max_bm25
normalized_vector = vector_score / max_vector
fusion_score = 0.6 * normalized_vector + 0.4 * normalized_bm25
5. Time decay
age_days = (now - chunk.timestamp) / 86400
lambda = { L0: search.decay.l0_lambda, L1: search.decay.l1_lambda }
decay = e^(-lambda * age_days)
score = fusion_score * decay
6. Graph enhancement
For top results, find related entities and add their connected chunks:
result "Cursor" → entity "Cursor" → edges → related entities
→ neighbor chunks added with score * boost_factor (0.3)
7. Deduplication
Collapse near-identical results (high cosine similarity between result contents).
8. Optional reranking
If enabled, top-20 candidates are re-scored by: - ONNX cross-encoder (local, ~97ms/pair) — or - Async LLM reranking with speculative cache
9. Implicit boost
Non-dry-run searches increment access_count and boost importance of returned chunks (capped at 1.5, 1-hour cooldown).
10. Response
{
"results": [
{
"chunk_id": "abc-123",
"content": "User prefers Cursor over VSCode",
"score_final": 0.87,
"trace_info": {
"level": "L1",
"source": "consolidation",
"is_latest": true,
"access_count": 5
}
}
],
"trace_metadata": {
"total_results_before_topk": 42,
"search_latency_us": 1200
}
}
Background workers
The scheduler orchestrates background jobs. All workers can be paused/resumed at runtime via POST /admin/workers/pause|resume (useful for eval runs). Pause state is an AtomicBool shared between AppState and Scheduler — does not survive restart.
Consolidation (L0 → L1)
| Setting | Default |
|---|---|
| Interval | 5 minutes |
| Batch size | 200 chunks |
| Min age | 60 seconds |
| Min chunks | 3 per stream |
| Style | observation (granular facts) |
| Similarity threshold | 0.3 (cosine, for topic grouping) |
Process:
1. Scan unconsolidated L0 chunks
2. Group by stream (user isolation)
3. Sub-group by topic similarity — greedy clustering on embeddings (cosine threshold). Prevents unrelated facts from merging into one L1 chunk.
4. PII redaction (per sub-group)
5. LLM compression (gpt-4.1-mini) — one call per sub-group
6. Create L1 chunk with source_ids linking back to L0
7. Index in Tantivy + embedding queue
8. Mark L0 as consolidated: true
Chunks without embeddings fall back to a single group (pre-clustering behavior). Single-topic streams produce one sub-group — zero regression risk.
Decay
| Setting | Default |
|---|---|
| Interval | 1 hour |
| L0 factor | 0.990 per hour |
| L1 factor | 0.995 per hour |
| Dormant threshold | 0.01 |
| Adaptive | enabled (ACT-R) |
Adaptive decay: chunks with high access_count decay slower (adaptive_dampening / adaptive_cap in [worker.decay_worker]).
Clustering
| Setting | Default |
|---|---|
| Interval | 6 hours |
| Algorithm | k-means on L1 embeddings |
| Max iterations | 1000 |
Cluster output feeds the associator (below); there is no separate storage tier for clusters.
Entity extraction queue
| Setting | Default |
|---|---|
| Flush interval | 3 seconds |
| Queue capacity | 200 |
| Confidence threshold | 0.7 |
| Model | gpt-4.1-mini |
Async LLM NER runs in background for entities not in the dictionary.
Embedding queue
| Setting | Default |
|---|---|
| Batch size | 50 |
| Flush interval | 5 seconds |
Associator (ECA — serendipity engine)
| Setting | Default |
|---|---|
| Interval | 6 hours (with clustering) |
| Min serendipity | 0.1 |
| Max associations | 3 per query |
| Mechanisms | graph walk, temporal, adjacent |
Components: - Clustering — k-means on chunk embeddings, per-stream - Graph walk — random walk with weak-tie preference (fewer shared chunks = more novel) - Temporal — find chunks near the same time period - Serendipity scoring — relevance × (1 - obviousness) × cluster distance
Hard-purge (retention worker)
| Setting | Default |
|---|---|
| Interval | 24 hours (retention.hard_purge_interval_secs) |
| Retention window | 30 days (retention.soft_delete_days) |
Scans for soft-deleted chunks past their recovery window. Purge pipeline: graph references → Tantivy → RocksDB hard delete (chunk + embedding + entities + relations).
Dream (auto-consolidation)
| Setting | Default |
|---|---|
| Auto-trigger | 30 min idle |
| Batch size | 50 chunks |
| Min group size | 2 |
| Cost cap | $0.10 / run |
Authentication
Loomem is single-user: one API key controls access to the whole instance.
- The key is read from the env var named by
server.auth_token_envinconfig.toml(defaultLOOMEM_AUTH_TOKEN). - All requests require
Authorization: Bearer <key>;/healthremains open. - If no key is configured, the server runs in local passthrough mode — every request is accepted with admin privileges. Use only for local development.
Data written without an explicit stream lands in the default stream __user_default__. Additional streams (from [streams] / [namespaces] in config) partition data within the same instance — they are an organizational boundary, not separate identities.
OAuth 2.0
For MCP Remote Connector (claude.ai): - Dynamic Client Registration (RFC 7591) - Authorization Code flow with PKCE - User enters API key during authorization - Access token = API key (no extra token layer)
MCP integration
Loomem implements the Model Context Protocol (MCP) as a JSON-RPC 2.0 endpoint at POST /mcp.
Request flow:
Claude (MCP client)
│
▼
POST /mcp (JSON-RPC)
│
▼
mcp::handler → parse request → extract tool + args
│
▼
mcp::dispatcher → match tool name → call internal handler
│
▼
loomem-core / handlers → execute → return ToolResult
│
▼
JSON-RPC response → Claude
Session management: OAuth tokens map to sessions, sessions map to stream_id for data isolation.
Crash recovery
- Intent log replay — on startup, scan WAL for uncommitted operations
- Partial write detection — check RocksDB and Tantivy for consistency
- Orphan cleanup — remove chunks marked
in_progressfrom failed consolidation - Tantivy rebuild — if schema version mismatch, rebuild index from RocksDB source of truth
Cost tracking
Every LLM call (consolidation, extraction, embedding) is tracked:
[cost]
daily_cap_usd = 15.00 # Hard stop
alert_threshold_usd = 10.00 # Warning
anomaly_multiplier = 3.0 # 3x typical = anomaly alert
Costs persisted in RocksDB column family. Workers check budget before each LLM call.