# Architecture

## Overview

Loomem is a Rust workspace with four crates:

| Crate | Role |
|-------|------|
| `loomem-core` | Core library — storage, search, consolidation, graph, LLM, embeddings |
| `loomem-server` | HTTP + MCP server (Axum), auth, handlers, routing |
| `loomem-cli` | Command-line interface for direct interaction |
| `loomem-migrate` | Data migration utilities |

All business logic lives in `loomem-core`. The server is a thin HTTP layer that maps routes to core functions.

---

## Data model

### Chunk

The fundamental unit of memory. Every piece of knowledge — raw event, compressed summary, semantic group — is a Chunk.

```
chunk:L{level}:{uuid}  →  Chunk (JSON in RocksDB)
```
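A minimal sketch of how such keys could be built and parsed. The helper names `chunk_key` and `parse_chunk_key` are hypothetical and are not taken from `loomem-core`; only the `chunk:L{level}:{uuid}` layout comes from the scheme above.

```rust
/// Builds the RocksDB key for a chunk: `chunk:L{level}:{uuid}`.
/// Hypothetical helper; only the key layout comes from the document.
fn chunk_key(level: u8, uuid: &str) -> String {
    format!("chunk:L{}:{}", level, uuid)
}

/// Splits a key back into (level, uuid).
/// Returns None for keys that do not follow the chunk scheme.
fn parse_chunk_key(key: &str) -> Option<(u8, &str)> {
    let rest = key.strip_prefix("chunk:L")?;
    let (level, uuid) = rest.split_once(':')?;
    Some((level.parse::<u8>().ok()?, uuid))
}
```

Because the level is part of the key prefix, all chunks of one tier sort together in an ordered key-value store, which would make a per-tier prefix scan straightforward.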

**Core fields:**

| Field | Type | Description |
|-------|------|-------------|
| `id` | UUID | Unique identifier |
| `content` | String | Memory text |
| `stream` | String | Namespace / isolation boundary (default: `__user_default__`) |
| `level` | 0, 1 | Memory tier (raw, compressed) |
| `score` | f64 | Decay-adjusted relevance (0.0 – 1.0) |
| `timestamp` | u64 | Ingestion time (Unix seconds) |
| `importance` | f64 | Surprise-based boost (0.7 – 1.5) |
| `access_count` | u32 | Search hit counter (for adaptive decay) |
| `persistent` | bool | Exempt from decay |
| `is_latest` | bool | Head of supersede chain |

**Consolidation fields:**

| Field | Type | Description |
|-------|------|-------------|
| `consolidated` | bool | L0: has been compressed to L1 |
| `source_ids` | Vec\<String\> | L1: which L0 chunks were merged |
| `prompt_version` | u32 | Which consolidation prompt was used |

**Contradiction / versioning fields:**

| Field | Type | Description |
|-------|------|-------------|
| `superseded_by` | Option\<String\> | Points to newer version |
| `supersedes_id` | Option\<String\> | What this chunk replaced |
| `root_memory_id` | Option\<String\> | Root of the version chain |
| `version` | u32 | Semantic version counter |

**Knowledge extraction metadata:**

| Field | Type | Description |
|-------|------|-------------|
| `extraction_meta.fact_type` | Enum | `PreferenceOrDecision`, `ProjectState`, `Fact` |
| `extraction_meta.subject` | String | Entity name (person, project) |
| `extraction_meta.event_date` | String | ISO date of the event |
| `extraction_meta.confidence` | f64 | LLM extraction confidence (0.0 – 1.0) |

### Memory tiers

```
L0 (Raw)          L1 (Compressed)
─────────         ─────────────
Verbatim input    LLM-summarized
Fast decay        Medium decay
```

**Promotion flow:**

```
Ingest → L0 (raw event)
            │
            ▼  [Consolidation worker, every 5 min]
         L1 (compressed observations)
```

A separate clustering worker periodically groups L1 chunks by embedding similarity; the clusters feed the associator (serendipity engine), not a separate storage tier.

### Entity graph

Entities and their relationships form a knowledge graph stored in RocksDB:

```
EntityNode {
    canonical_name: "Cursor"
    entity_type: "TECHNOLOGY"
    aliases: ["cursor", "Cursor IDE"]
    chunk_ids: ["abc-123", "def-456"]   // evidence
}

Edge {
    source: "Alice"
    target: "Cursor"
    relation_type: "uses"
    chunk_ids: ["abc-123"]              // evidence
}
```

**Per-stream isolation:** Each entity and edge is scoped to a `stream_id`. The same entity name in two streams gets separate entity nodes. Name/alias indexes are prefixed by stream.

**RocksDB key scheme for graph:**

```
graph:entity:{id}                    → EntityNode (JSON, includes stream_id)
graph:s:{stream_id}:name:{lower()}   → entity_id
graph:s:{stream_id}:alias:{lower()}  → entity_id
graph:adj:{src_id}:{edge_id}         → target_id
graph:radj:{tgt_id}:{edge_id}        → source_id
graph:chunk:{chunk_id}               → [entity_ids]  (reverse index)
```
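As an illustration of the per-stream prefixing, the index keys could be derived as follows. The function names are hypothetical; the key layout and the lowercasing follow the scheme above.

```rust
/// Key for the per-stream name index: `graph:s:{stream_id}:name:{lower()}`.
fn name_index_key(stream_id: &str, name: &str) -> String {
    format!("graph:s:{}:name:{}", stream_id, name.to_lowercase())
}

/// Key for the per-stream alias index: `graph:s:{stream_id}:alias:{lower()}`.
fn alias_index_key(stream_id: &str, alias: &str) -> String {
    format!("graph:s:{}:alias:{}", stream_id, alias.to_lowercase())
}
```

The same entity name in two streams produces two different keys, so a lookup in one stream cannot resolve to an entity in another.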

---

## Storage layer

### RocksDB

RocksDB is the primary persistent store. It holds chunks, embeddings, the entity graph, users, and cost counters.

**Column families:**

| Family | Contents |
|--------|----------|
| default | Chunks (`chunk:L{n}:{id}`), graph |
| embeddings | Vectors (`emb:{id}` → f32 array) |
| cost | Daily cost counters (`cost:{date}`) |
| keys | Wrapped per-stream data-encryption keys (at-rest encryption) |

**Configuration:**
- Compression: LZ4
- Write buffer: 64 MB × 3 buffers
- Max open files: 1000

### Tantivy

Full-text search index. Mirrors chunk content with additional indexed fields.

**Schema:**

| Field | Type | Boost | Purpose |
|-------|------|-------|---------|
| `id` | String | — | Chunk reference |
| `content` | Text | 1.0 | Primary search target |
| `entities` | Text | 0.2 | Entity mentions (comma-separated) |
| `relations` | Text | 0.2 | Relation triples |
| `stream` | String | — | Filtering |
| `timestamp` | i64 | — | Date filtering |
| `event_date` | i64 | — | Temporal queries |
| `level` | i64 | — | Tier filtering |

**Polish stemming** is enabled for the `content` field.

### Vector store

Embeddings are stored in the `embeddings` column family of RocksDB.

**Providers:**
- **Local (default)** — `multilingual-e5-small` (384 dimensions) via the tract ONNX runtime (pure Rust, no native dependencies, runs offline)
- **OpenAI** — `text-embedding-3-small` (1536 dimensions), opt-in via `embedding_provider = "openai"`

**Embedding queue:** embedding requests are batched (50 items per batch, 5-second flush interval) to amortize API latency.

### Intent log (WAL)

A write-ahead log keeps RocksDB and Tantivy consistent with each other.

```
1. Append: PENDING(op_type, chunk_id)
2. Write to RocksDB
3. Write to Tantivy
4. Append: COMMITTED(op_type, chunk_id)
```

On crash recovery:
- PENDING without COMMITTED → rollback partial writes
- COMMITTED → verify both stores have the data
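
The recovery scan can be sketched as a single pass over the log that pairs each `PENDING` record with its `COMMITTED` record. The `Record` type and the `uncommitted` function are hypothetical; the real log format is not described here.

```rust
/// One intent-log entry. The shape is assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Record<'a> {
    Pending { op_type: &'a str, chunk_id: &'a str },
    Committed { op_type: &'a str, chunk_id: &'a str },
}

/// Returns the (op_type, chunk_id) pairs that were started but never
/// committed. These are the candidates for rollback on startup.
fn uncommitted<'a>(log: &[Record<'a>]) -> Vec<(&'a str, &'a str)> {
    let mut open: Vec<(&'a str, &'a str)> = Vec::new();
    for rec in log {
        match *rec {
            Record::Pending { op_type, chunk_id } => open.push((op_type, chunk_id)),
            Record::Committed { op_type, chunk_id } => {
                // Close the earliest matching PENDING entry, if any.
                if let Some(pos) = open.iter().position(|&e| e == (op_type, chunk_id)) {
                    open.remove(pos);
                }
            }
        }
    }
    open
}
```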

---

## Ingest pipeline

Content is sanitized before any storage:

```
Input content
    │
    ▼  [sanitizer.rs]
 1. HTML tag stripping + entity decoding
 2. Instruction injection detection (18 patterns — logs warning, does not block)
    │
    ▼  [pii_filter.rs]
 3. PII redaction (phones, emails, PESEL, blocklist words)
    │
    ▼  [persist_chunk]
 4. RocksDB store (sanitized content)
 5. Entity extraction + graph population (stream-scoped)
 6. Tantivy indexing
 7. Embedding queue
 8. Contradiction detection
```

The sanitizer detects but does not block injection attempts — content is stored after stripping. PII redaction replaces sensitive data with `[PHONE]`, `[EMAIL]`, `[ID]`, `[REDACTED]` tokens before storage.
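
To make the redaction step concrete, here is a deliberately simplified sketch of e-mail redaction using only the standard library. It is not the logic in `pii_filter.rs`: the token-based heuristic and both function names are assumptions, and a real filter would likely use proper pattern matching and cover phones, PESEL numbers, and blocklist words as well.

```rust
/// Rough heuristic: something@domain.tld with a non-empty local part.
fn looks_like_email(token: &str) -> bool {
    match token.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && domain.contains('.')
                && !domain.starts_with('.')
                && !domain.ends_with('.')
                && !domain.contains('@')
        }
        None => false,
    }
}

/// Replaces e-mail-like tokens with `[EMAIL]`, keeping trailing punctuation.
fn redact_emails(text: &str) -> String {
    text.split(' ')
        .map(|token| {
            // Ignore sentence punctuation glued to the end of the token.
            let core = token.trim_end_matches(|c: char| ".,;:!?".contains(c));
            if looks_like_email(core) {
                format!("[EMAIL]{}", &token[core.len()..])
            } else {
                token.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```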

---

## Search pipeline

A query flows through multiple stages:

### 1. Query classification

```
"what do you know about me?"   → Profile
"how many projects do I run?"  → Aggregation (top_k boosted to 30)
"when did I change my IDE?"    → Temporal (date filtering)
"why did I choose Rust?"       → Complex (top_k = 20)
"my dog's name"                → Simple (top_k = 3, BM25 only)
```

### 2. Date filter extraction

Parses relative dates from query text:
- "last week" → `date_from: now - 7d`
- "in March" → `date_from: 2026-03-01, date_to: 2026-03-31`
- Explicit `date_from`/`date_to` parameters override any dates parsed from the query text
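
A minimal sketch of the "last week" rule and the override behavior. The `DateFilter` struct and `extract_date_filter` function are hypothetical, timestamps are Unix seconds, and only one relative phrase is handled.

```rust
const DAY_SECS: u64 = 86_400;

/// Date range in Unix seconds; either bound may be open.
#[derive(Debug, PartialEq)]
struct DateFilter {
    date_from: Option<u64>,
    date_to: Option<u64>,
}

/// Explicit parameters win; otherwise look for a relative phrase in the query.
fn extract_date_filter(query: &str, now: u64, explicit: Option<DateFilter>) -> Option<DateFilter> {
    if explicit.is_some() {
        return explicit;
    }
    if query.to_lowercase().contains("last week") {
        return Some(DateFilter {
            date_from: Some(now.saturating_sub(7 * DAY_SECS)),
            date_to: None,
        });
    }
    None
}
```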

### 3. Parallel search

BM25 and vector search run concurrently:

**BM25 (Tantivy):**
```
QueryParser(content^1.0, entities^0.2, relations^0.2)
  + stream filter
  + date range filter
  → ranked results
```

**Vector (cosine similarity):**
```
query_embedding = embed(query_text)
for each stored embedding:
    score = cosine(query_embedding, chunk_embedding)
→ top-K by similarity
```
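The brute-force scan above could look like this in Rust. This is a sketch under the assumption that embeddings are plain `f32` slices held in memory; `cosine` and `top_k` are illustrative names.

```rust
/// Cosine similarity of two vectors; returns 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Scores every stored embedding against the query and keeps the best `k`.
fn top_k<'a>(query: &[f32], stored: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = stored
        .iter()
        .map(|(id, emb)| (*id, cosine(query, emb)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```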

### 4. Fusion

```
normalized_bm25   = bm25_score / max_bm25
normalized_vector  = vector_score / max_vector
fusion_score       = 0.6 * normalized_vector + 0.4 * normalized_bm25
```
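As a sketch, the fusion formula translates directly into a small function. The zero guard on the maxima is an added assumption, used here to avoid dividing by zero when one of the two searches returns nothing.

```rust
/// Weighted fusion of max-normalized scores: 0.6 vector + 0.4 BM25.
fn fuse(bm25: f64, max_bm25: f64, vector: f64, max_vector: f64) -> f64 {
    // Assumption: a result set with max score 0 contributes nothing.
    let norm = |score: f64, max: f64| if max > 0.0 { score / max } else { 0.0 };
    0.6 * norm(vector, max_vector) + 0.4 * norm(bm25, max_bm25)
}
```

For example, a chunk with the best vector score and half of the best BM25 score gets `0.6 * 1.0 + 0.4 * 0.5 = 0.8`.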

### 5. Time decay

```
age_days = (now - chunk.timestamp) / 86400
lambda   = { L0: search.decay.l0_lambda, L1: search.decay.l1_lambda }
decay    = e^(-lambda * age_days)
score    = fusion_score * decay
```
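A short sketch of the same computation. The function name is hypothetical and `lambda` is passed in, since the defaults for `search.decay.l0_lambda` and `search.decay.l1_lambda` are not stated here.

```rust
/// Applies exponential time decay to a fusion score.
/// `now` and `timestamp` are Unix seconds; `lambda` is per day.
fn decayed_score(fusion_score: f64, now: u64, timestamp: u64, lambda: f64) -> f64 {
    let age_days = now.saturating_sub(timestamp) as f64 / 86_400.0;
    fusion_score * (-lambda * age_days).exp()
}
```

With an illustrative `lambda` of 0.1, a 10-day-old chunk keeps `e^-1`, roughly 0.368, of its fusion score.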

### 6. Graph enhancement

For top results, find related entities and add their connected chunks:

```
result "Cursor" → entity "Cursor" → edges → related entities
  → neighbor chunks added with score * boost_factor (0.3)
```

### 7. Deduplication

Collapse near-identical results (high cosine similarity between result contents).

### 8. Optional reranking

If enabled, top-20 candidates are re-scored by:
- ONNX cross-encoder (local, ~97ms/pair) — or
- Async LLM reranking with speculative cache

### 9. Implicit boost

Non-dry-run searches increment `access_count` and boost `importance` of returned chunks (capped at 1.5, 1-hour cooldown).
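
One way to read this rule, as a sketch only: every non-dry-run hit increments the counter, while the importance boost is applied at most once per cooldown window. The additive `boost_step`, the `last_boost` field, and the choice to exempt `access_count` from the cooldown are all assumptions; only the 1.5 cap and the 1-hour cooldown come from the text.

```rust
const IMPORTANCE_CAP: f64 = 1.5;
const COOLDOWN_SECS: u64 = 3_600;

/// Subset of chunk state touched by the implicit boost (shape assumed).
struct ChunkStats {
    access_count: u32,
    importance: f64,
    last_boost: Option<u64>, // Unix seconds of the last applied boost
}

/// Records a search hit. `boost_step` is a hypothetical additive increment.
fn record_hit(stats: &mut ChunkStats, now: u64, dry_run: bool, boost_step: f64) {
    if dry_run {
        return; // dry runs leave the chunk untouched
    }
    stats.access_count += 1;
    let cooled_down = stats
        .last_boost
        .map_or(true, |t| now.saturating_sub(t) >= COOLDOWN_SECS);
    if cooled_down {
        stats.importance = (stats.importance + boost_step).min(IMPORTANCE_CAP);
        stats.last_boost = Some(now);
    }
}
```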

### 10. Response

```json
{
  "results": [
    {
      "chunk_id": "abc-123",
      "content": "User prefers Cursor over VSCode",
      "score_final": 0.87,
      "trace_info": {
        "level": "L1",
        "source": "consolidation",
        "is_latest": true,
        "access_count": 5
      }
    }
  ],
  "trace_metadata": {
    "total_results_before_topk": 42,
    "search_latency_us": 1200
  }
}
```

---

## Background workers

The scheduler orchestrates background jobs. All workers can be paused and resumed at runtime via `POST /admin/workers/pause` and `POST /admin/workers/resume` (useful for eval runs). The pause state is an `AtomicBool` shared between `AppState` and `Scheduler`; it is held in memory only and does not survive a restart.
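
A minimal sketch of such a shared flag, assuming it is wrapped in an `Arc`. The field and method names are hypothetical; only the `AppState` and `Scheduler` type names and the use of `AtomicBool` come from the text.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// HTTP-side state: the admin handlers flip the flag.
struct AppState {
    workers_paused: Arc<AtomicBool>,
}

/// Worker-side state: each job checks the flag before running.
struct Scheduler {
    workers_paused: Arc<AtomicBool>,
}

impl AppState {
    fn pause(&self) {
        self.workers_paused.store(true, Ordering::SeqCst);
    }
    fn resume(&self) {
        self.workers_paused.store(false, Ordering::SeqCst);
    }
}

impl Scheduler {
    fn should_run(&self) -> bool {
        !self.workers_paused.load(Ordering::SeqCst)
    }
}

/// Both sides hold a clone of the same Arc, so they observe the same flag.
fn wire() -> (AppState, Scheduler) {
    let flag = Arc::new(AtomicBool::new(false));
    (
        AppState { workers_paused: Arc::clone(&flag) },
        Scheduler { workers_paused: flag },
    )
}
```

Because the flag lives only in process memory, a fresh process starts with workers running again.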

### Consolidation (L0 → L1)

| Setting | Default |
|---------|---------|
| Interval | 5 minutes |
| Batch size | 200 chunks |
| Min age | 60 seconds |
| Min chunks | 3 per stream |
| Style | `observation` (granular facts) |
| Similarity threshold | 0.3 (cosine, for topic grouping) |

**Process:**
1. Scan unconsolidated L0 chunks
2. Group by stream (user isolation)
3. **Sub-group by topic similarity** — greedy clustering on embeddings (cosine threshold). Prevents unrelated facts from merging into one L1 chunk.
4. PII redaction (per sub-group)
5. LLM compression (gpt-4.1-mini) — one call per sub-group
6. Create L1 chunk with `source_ids` linking back to L0
7. Index in Tantivy + embedding queue
8. Mark L0 as `consolidated: true`

Chunks without embeddings fall back to a single group (the pre-clustering behavior). A single-topic stream produces one sub-group, so its output should match the pre-clustering behavior.
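
The topic sub-grouping in step 3 can be sketched as a single greedy pass. This is an assumed variant in which each chunk is compared to the first member of every existing group; the actual grouping rule in `loomem-core` may differ, and both function names are illustrative.

```rust
/// Cosine similarity of two vectors; returns 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Greedy grouping: a chunk joins the first group whose seed (first member)
/// is at least `threshold` similar, otherwise it starts a new group.
/// Returns groups of indices into `embeddings`.
fn greedy_groups(embeddings: &[Vec<f32>], threshold: f32) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    for (i, emb) in embeddings.iter().enumerate() {
        let home = groups
            .iter()
            .position(|g| cosine(&embeddings[g[0]], emb) >= threshold);
        match home {
            Some(idx) => groups[idx].push(i),
            None => groups.push(vec![i]),
        }
    }
    groups
}
```

With the default threshold of 0.3, two nearly parallel embeddings end up in one group while an orthogonal one starts its own, so each LLM call sees only related facts.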

### Decay

| Setting | Default |
|---------|---------|
| Interval | 1 hour |
| L0 factor | 0.990 per hour |
| L1 factor | 0.995 per hour |
| Dormant threshold | 0.01 |
| Adaptive | enabled (ACT-R) |

**Adaptive decay:** chunks with high `access_count` decay slower (`adaptive_dampening` / `adaptive_cap` in `[worker.decay_worker]`).
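
To get a feel for the defaults, the sketch below applies the hourly factor and estimates how long a chunk takes to become dormant. It ignores adaptive decay, assumes "dormant" means the score has dropped below the threshold, and uses hypothetical function names.

```rust
/// One hourly decay step. Persistent chunks are exempt from decay.
fn decay_tick(score: f64, hourly_factor: f64, persistent: bool) -> f64 {
    if persistent { score } else { score * hourly_factor }
}

/// Smallest number of hourly ticks after which
/// `start_score * hourly_factor^n` is at or below `dormant_threshold`.
fn hours_until_dormant(start_score: f64, hourly_factor: f64, dormant_threshold: f64) -> u32 {
    if start_score < dormant_threshold {
        return 0;
    }
    ((dormant_threshold / start_score).ln() / hourly_factor.ln()).ceil() as u32
}
```

Starting from a score of 1.0 with no accesses, the defaults give 459 hours (about 19 days) for L0 and 919 hours (about 38 days) for L1.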

### Clustering

| Setting | Default |
|---------|---------|
| Interval | 6 hours |
| Algorithm | k-means on L1 embeddings |
| Max iterations | 1000 |

Cluster output feeds the associator (below); there is no separate storage tier for clusters.

### Entity extraction queue

| Setting | Default |
|---------|---------|
| Flush interval | 3 seconds |
| Queue capacity | 200 |
| Confidence threshold | 0.7 |
| Model | gpt-4.1-mini |

Asynchronous LLM-based NER runs in the background for entities that are not in the dictionary.

### Embedding queue

| Setting | Default |
|---------|---------|
| Batch size | 50 |
| Flush interval | 5 seconds |

### Associator (ECA — serendipity engine)

| Setting | Default |
|---------|---------|
| Interval | 6 hours (with clustering) |
| Min serendipity | 0.1 |
| Max associations | 3 per query |
| Mechanisms | graph walk, temporal, adjacent |

**Components:**
- **Clustering** — k-means on chunk embeddings, per-stream
- **Graph walk** — random walk with weak-tie preference (fewer shared chunks = more novel)
- **Temporal** — find chunks near the same time period
- **Serendipity scoring** — relevance × (1 - obviousness) × cluster distance
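
The scoring formula and the two limits from the settings table can be sketched as follows. It assumes all three inputs are already scaled to the 0.0 to 1.0 range, which the text does not state, and the function names are illustrative.

```rust
/// Serendipity = relevance × (1 - obviousness) × cluster distance.
fn serendipity(relevance: f64, obviousness: f64, cluster_distance: f64) -> f64 {
    relevance * (1.0 - obviousness) * cluster_distance
}

/// Drops candidates below `min_serendipity` and keeps the best
/// `max_associations`, highest score first.
fn pick_associations<'a>(
    mut scored: Vec<(&'a str, f64)>,
    min_serendipity: f64,
    max_associations: usize,
) -> Vec<(&'a str, f64)> {
    scored.retain(|(_, s)| *s >= min_serendipity);
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(max_associations);
    scored
}
```

A highly relevant but obvious candidate scores low, which is the point of the `(1 - obviousness)` term.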

### Hard-purge (retention worker)

| Setting | Default |
|---------|---------|
| Interval | 24 hours (`retention.hard_purge_interval_secs`) |
| Retention window | 30 days (`retention.soft_delete_days`) |

Scans for soft-deleted chunks past their recovery window. Purge pipeline: graph references → Tantivy → RocksDB hard delete (chunk + embedding + entities + relations).

### Dream (auto-consolidation)

| Setting | Default |
|---------|---------|
| Auto-trigger | 30 min idle |
| Batch size | 50 chunks |
| Min group size | 2 |
| Cost cap | $0.10 / run |

---

## Authentication

Loomem is single-user: one API key controls access to the whole instance.

- The key is read from the env var named by `server.auth_token_env` in `config.toml` (default `LOOMEM_AUTH_TOKEN`).
- All requests require `Authorization: Bearer <key>`; `/health` remains open.
- If no key is configured, the server runs in **local passthrough mode** — every request is accepted with admin privileges. Use only for local development.
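
The three rules above can be summarized in a small decision function. This is a sketch, not the server's middleware: the `Access` enum and `check_access` are hypothetical, and production code would likely compare keys in constant time rather than with `==`.

```rust
#[derive(Debug, PartialEq)]
enum Access {
    Open,          // endpoint that never requires a key
    Passthrough,   // no key configured: accepted with admin privileges
    Authenticated, // valid bearer key
    Rejected,      // missing or wrong key
}

/// `authorization` is the raw `Authorization` header value, if present.
/// `configured_key` is the key read from the configured env var, if set.
fn check_access(path: &str, authorization: Option<&str>, configured_key: Option<&str>) -> Access {
    if path == "/health" {
        return Access::Open;
    }
    let key = match configured_key {
        Some(k) => k,
        None => return Access::Passthrough,
    };
    match authorization.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(token) if token == key => Access::Authenticated,
        _ => Access::Rejected,
    }
}
```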

Data written without an explicit stream lands in the default stream `__user_default__`. Additional streams (from `[streams]` / `[namespaces]` in config) partition data within the same instance — they are an organizational boundary, not separate identities.

### OAuth 2.0

For MCP Remote Connector (claude.ai):
- Dynamic Client Registration (RFC 7591)
- Authorization Code flow with PKCE
- User enters API key during authorization
- Access token = API key (no extra token layer)

---

## MCP integration

Loomem implements the Model Context Protocol (MCP) as a JSON-RPC 2.0 endpoint at `POST /mcp`.

**Request flow:**

```
Claude (MCP client)
    │
    ▼
POST /mcp (JSON-RPC)
    │
    ▼
mcp::handler → parse request → extract tool + args
    │
    ▼
mcp::dispatcher → match tool name → call internal handler
    │
    ▼
loomem-core / handlers → execute → return ToolResult
    │
    ▼
JSON-RPC response → Claude
```

**Session management:** OAuth tokens map to sessions, and each session maps to a `stream_id` for data isolation.

---

## Crash recovery

1. **Intent log replay** — on startup, scan WAL for uncommitted operations
2. **Partial write detection** — check RocksDB and Tantivy for consistency
3. **Orphan cleanup** — remove chunks marked `in_progress` from failed consolidation
4. **Tantivy rebuild** — if schema version mismatch, rebuild index from RocksDB source of truth

---

## Cost tracking

Every LLM call (consolidation, extraction, embedding) is tracked:

```
[cost]
daily_cap_usd = 15.00           # Hard stop
alert_threshold_usd = 10.00     # Warning
anomaly_multiplier = 3.0        # 3x typical = anomaly alert
```

Costs are persisted in the RocksDB `cost` column family. Workers check the remaining budget before each LLM call.
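
A sketch of how a worker might evaluate these three settings before an LLM call. The types and function names are hypothetical, and how the "typical" daily spend is computed is an assumption left to the caller.

```rust
#[derive(Debug, PartialEq)]
enum BudgetStatus {
    WithinBudget,
    Alert,   // alert_threshold_usd reached: warn, but keep going
    Blocked, // daily_cap_usd reached: hard stop
}

/// Mirrors the `[cost]` section of the config.
struct CostConfig {
    daily_cap_usd: f64,
    alert_threshold_usd: f64,
    anomaly_multiplier: f64,
}

/// Classifies today's spend against the alert threshold and the daily cap.
fn check_budget(cfg: &CostConfig, spent_today_usd: f64) -> BudgetStatus {
    if spent_today_usd >= cfg.daily_cap_usd {
        BudgetStatus::Blocked
    } else if spent_today_usd >= cfg.alert_threshold_usd {
        BudgetStatus::Alert
    } else {
        BudgetStatus::WithinBudget
    }
}

/// Flags spend that exceeds `anomaly_multiplier` times the typical daily spend.
fn is_anomaly(cfg: &CostConfig, spent_today_usd: f64, typical_daily_usd: f64) -> bool {
    typical_daily_usd > 0.0 && spent_today_usd > cfg.anomaly_multiplier * typical_daily_usd
}
```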
