Loomem Docs

API Reference

Base URL: http://localhost:3030 (or your deployment URL)

Authentication: send an Authorization: Bearer <token> header on every endpoint except /health. The token is the single API key read from the environment variable named by server.auth_token_env (default LOOMEM_AUTH_TOKEN). If no key is configured, the server runs in local passthrough mode and accepts all requests.
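The header scheme above can be sketched with the Python standard library. `build_request` is a hypothetical helper, not part of Loomem, and it assumes the client machine exports the same LOOMEM_AUTH_TOKEN variable the server reads. It only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:3030"  # default from this page; use your deployment URL


def build_request(path, payload=None, token=None):
    """Build (but do not send) a request with the Bearer header attached.

    Hypothetical helper. The token falls back to the LOOMEM_AUTH_TOKEN
    environment variable, which is an assumption about the client setup.
    """
    if token is None:
        token = os.environ.get("LOOMEM_AUTH_TOKEN")
    headers = {}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    # /health is the only endpoint documented as unauthenticated.
    if token and path != "/health":
        headers["Authorization"] = f"Bearer {token}"
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )
```

For example, `urllib.request.urlopen(build_request("/v1/status"))` would issue an authenticated GET against a running server.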

When no stream is specified, data is read from and written to the default stream __user_default__.


Storage

POST /v1/store

Store a memory chunk.

Request:

{
  "content": "User prefers dark mode in all tools",
  "stream": "100",
  "level": 0,
  "metadata": { "source": "user-stated" },
  "user_id": "alice",
  "app_id": "claude"
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| content | string | yes | | Memory text (1 – 1M chars) |
| stream | string | no | __user_default__ | Namespace ID |
| level | int | no | 0 | Memory tier (0 = raw, 1 = consolidated) |
| metadata | object | no | null | Custom JSON metadata |
| user_id | string | no | null | Creator identifier |
| app_id | string | no | null | Application identifier |

Response:

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "stored"
}

Side effects:
- Pre-ingestion sanitization (HTML stripping, instruction injection detection)
- PII redaction (phones, emails, PESEL, blocklist words)
- Entity extraction (dictionary + LLM queue)
- Embedding generation (async queue)
- Tantivy indexing
- Graph population (stream-scoped)
- Surprise scoring (importance adjustment)
- Contradiction detection against existing memories

LLM-based knowledge extraction from full conversation transcripts is available via the memory_ingest MCP tool — see MCP Tools Reference.
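As a rough illustration of the /v1/store request body, the sketch below assembles a payload and checks the documented content length on the client side. `build_store_payload` is a hypothetical helper. Treating "1M" as 1,000,000 and counting characters with Python's `len` are both assumptions, and the server's own validation remains authoritative.

```python
MAX_CONTENT_CHARS = 1_000_000  # assumed reading of the "1M chars" upper bound


def build_store_payload(content, stream=None, level=0, metadata=None,
                        user_id=None, app_id=None):
    """Assemble a /v1/store body, omitting optional fields left unset.

    Hypothetical client-side helper; omitted fields fall back to the
    server defaults listed in the field table.
    """
    if not 1 <= len(content) <= MAX_CONTENT_CHARS:
        raise ValueError("content must be between 1 and 1,000,000 characters")
    payload = {"content": content, "level": level}
    optional = {
        "stream": stream,
        "metadata": metadata,
        "user_id": user_id,
        "app_id": app_id,
    }
    # Leave unset fields out so the server applies its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Omitting `stream` means the chunk would be written to the default stream __user_default__.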


POST /v1/search

Hybrid search across all memory tiers.

Request:

{
  "query": "What IDE does the user prefer?",
  "top_k": 5,
  "stream": "100",
  "date_from": "2026-01-01",
  "date_to": "2026-04-03",
  "trace": true,
  "dry_run": false
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | yes | | Search query |
| top_k | int | no | 10 | Max results |
| stream | string | no | __user_default__ | Namespace filter |
| streams | string[] | no | null | Multi-namespace search |
| date_from | string | no | null | ISO date lower bound |
| date_to | string | no | null | ISO date upper bound |
| entity | string | no | null | Filter by entity name |
| trace | bool | no | false | Include debug trace info |
| dry_run | bool | no | false | Skip implicit boost / access tracking |
| valid_at | int (unix sec) | no | null | Bitemporal time-travel: return only chunks whose [valid_from, valid_until] covers this timestamp. Open intervals (None) are unbounded on that side. |
| include_superseded | bool | no | false | Include old versions |
| fact_type | string | no | null | Filter: preference, project, fact |
| subject | string | no | null | Filter by subject entity |
| min_confidence | f64 | no | null | Minimum extraction confidence |
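The valid_at rule can be read as an interval test. The sketch below is one way to express it, not the server's implementation. Bounds are treated as inclusive because the description uses closed brackets, but the exact boundary handling is not specified on this page. The function name `covers` is hypothetical.

```python
from typing import Optional


def covers(valid_from: Optional[int], valid_until: Optional[int],
           valid_at: int) -> bool:
    """Return True if [valid_from, valid_until] contains valid_at.

    None means the interval is unbounded on that side. Inclusive bounds
    are an assumption based on the closed-bracket notation.
    """
    after_start = valid_from is None or valid_from <= valid_at
    before_end = valid_until is None or valid_at <= valid_until
    return after_start and before_end
```

Under this reading, a chunk with both bounds unset matches every valid_at value.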

Response:

{
  "results": [
    {
      "chunk_id": "abc-123",
      "content": "User switched from VSCode to Cursor (reason: speed)",
      "score_final": 0.87,
      "trace_info": {
        "level": "L1",
        "source": "consolidation",
        "is_latest": true,
        "created_at": 1743638400,
        "memory_type": "static",
        "importance": 1.2,
        "access_count": 3,
        "version": 2,
        "superseded_by": null
      }
    }
  ],
  "trace_metadata": {
    "total_results_before_topk": 42,
    "dedup_removed": 3,
    "search_latency_us": 1200,
    "query_complexity": "medium"
  }
}
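A client will typically reduce this response to the fields it needs. The following sketch assumes the response shape shown above and that trace_info may be absent when trace is false, which the field table suggests but does not state outright. `summarize_results` and its `min_score` parameter are hypothetical.

```python
def summarize_results(response, min_score=0.0):
    """Return one dict per result, highest score_final first.

    trace_info is read defensively because it is assumed to be present
    only when the request set trace to true.
    """
    rows = []
    for item in response.get("results", []):
        if item["score_final"] < min_score:
            continue
        trace = item.get("trace_info") or {}
        rows.append({
            "chunk_id": item["chunk_id"],
            "score": item["score_final"],
            "content": item["content"],
            # A non-null superseded_by marks an older version of a chunk.
            "superseded": trace.get("superseded_by") is not None,
        })
    return sorted(rows, key=lambda r: r["score"], reverse=True)
```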

POST /v1/context-pack

Smart context packing for system prompt injection. Assembles a token-budgeted context window from profile, relevant memories, and recent activity.

Request:

{
  "query": "working on dashboard project",
  "stream": "100",
  "budget_tokens": 2000,
  "sections": ["profile", "relevant", "recent"]
}
| Field | Type | Default | Description |
|---|---|---|---|
| query | string | null | Topic focus |
| stream | string | __user_default__ | Namespace |
| budget_tokens | int | 4000 | Max tokens in response |
| sections | string[] | all | Which sections to include |

Section token allocation:
- profile: 20% of budget
- relevant: 50% of budget
- recent: 30% of budget

Response:

{
  "context": "## Profile\nUser is a software engineer...\n\n## Relevant\n...\n\n## Recent\n...",
  "sources": [
    { "chunk_id": "abc-123", "score": 0.87, "section": "relevant" }
  ],
  "total_tokens": 1856,
  "sections_included": ["profile", "relevant", "recent"],
  "sections_truncated": []
}
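The 20/50/30 split can be worked through as simple arithmetic. This sketch applies the stated percentages to whichever sections are requested. Whether the server redistributes the unused share when sections are omitted, and how it rounds, is not documented here, so treat those details as assumptions. `section_budgets` is a hypothetical name.

```python
SECTION_SHARE_PERCENT = {"profile": 20, "relevant": 50, "recent": 30}


def section_budgets(budget_tokens=4000, sections=None):
    """Map each requested section to its share of the token budget.

    Uses integer division (rounds down) and does not redistribute the
    share of omitted sections; both choices are assumptions.
    """
    if sections is None:
        sections = list(SECTION_SHARE_PERCENT)
    return {
        name: budget_tokens * SECTION_SHARE_PERCENT[name] // 100
        for name in sections
    }
```

With the example request's budget of 2000 tokens, this gives 400 for profile, 1000 for relevant, and 600 for recent.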

Memory management

POST /v1/delete

Delete a single chunk by ID.

{ "id": "abc-123" }

Response: { "status": "deleted", "id": "abc-123" }

DELETE /api/memories/{id}

REST-compliant delete. Optional ?ns=<namespace> query parameter.

Response: { "deleted": true, "id": "abc-123" }


POST /v1/purge-namespace

Delete all chunks in a stream.

{
  "stream": "100",
  "dry_run": true,
  "confirmed": false
}

Response:

{
  "status": "ok",
  "stream": "100",
  "dry_run": true,
  "deleted_count": 42,
  "deleted_ids": ["abc-123", "def-456", "..."]
}

Set dry_run: false and confirmed: true to actually delete.
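Because a purge is destructive, a cautious client can run the dry run first and only then confirm. The sketch below shows that two-step flow. `post` is a hypothetical transport callable of the form `post(path, payload) -> dict`, and `max_expected` is a safety limit invented for this example, not a server feature.

```python
def purge_stream(post, stream, max_expected=None):
    """Preview a purge with dry_run, then execute it.

    Raises RuntimeError before deleting anything if the preview reports
    more chunks than max_expected (a client-side guard, hypothetical).
    """
    preview = post(
        "/v1/purge-namespace",
        {"stream": stream, "dry_run": True, "confirmed": False},
    )
    count = preview["deleted_count"]
    if max_expected is not None and count > max_expected:
        raise RuntimeError(
            f"dry run would delete {count} chunks, limit is {max_expected}"
        )
    # Second call performs the actual deletion.
    return post(
        "/v1/purge-namespace",
        {"stream": stream, "dry_run": False, "confirmed": True},
    )
```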


Graph

GET /v1/graph/entity/{name}?stream={stream_id}

Get an entity node with its edges and chunk references. The stream parameter is required because the graph is isolated per stream.

{
  "entity": {
    "id": "ent-123",
    "name": "Cursor",
    "type": "TECHNOLOGY",
    "aliases": ["cursor IDE"],
    "chunk_count": 2
  },
  "neighbors": [
    { "entity": "Alice", "entity_type": "PERSON", "relation": "uses", "direction": "incoming" }
  ],
  "chunk_ids": ["abc-123", "def-456"]
}
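Entity names can contain spaces or other characters that are not URL-safe, so the path segment should be percent-encoded. A minimal sketch follows, assuming the server decodes percent-encoded path segments in the usual way. `entity_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def entity_url(base_url, name, stream):
    """Build the graph entity URL with an encoded name and stream query."""
    # safe="" also encodes "/", so a name cannot add extra path segments.
    path = "/v1/graph/entity/" + quote(name, safe="")
    return base_url.rstrip("/") + path + "?" + urlencode({"stream": stream})
```

For instance, `entity_url("http://localhost:3030", "Cursor", "100")` yields the URL for the example entity shown above.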

GET /v1/graph/stats

{
  "total_entities": 42,
  "total_edges": 87,
  "avg_chunks_per_entity": 3.2
}

POST /v1/build-graph

Rebuild entity graph from all stored entities. Useful after bulk ingestion.

POST /v1/extract-entities

Trigger LLM NER backfill on chunks missing entity extraction. Runs asynchronously.


Synthesis

Version history of a chunk is available via the memory_history MCP tool; the synthesized user profile via the memory_profile MCP tool. See MCP Tools Reference.

GET /v1/generate-memory-md

Generate a MEMORY.md proposal from top chunks.

{
  "proposal": "# Memory\n\n## Identity\n- Software engineer...\n\n## Preferences\n...",
  "metadata": { "chunks_considered": 200, "sections": 15 }
}

Consolidation & maintenance

POST /v1/dream

Trigger dream consolidation on the current stream.

{
  "stream": "100",
  "chunks_processed": 50,
  "groups_found": 12,
  "facts_merged": 8,
  "contradictions_resolved": 2,
  "cost_usd": 0.04,
  "duration_ms": 3200
}

Memory quality analysis is available via the memory_reflect MCP tool.

POST /v1/boost

Boost a chunk's importance to 1.5.

{ "id": "abc-123" }

POST /v1/embed-missing

Backfill embeddings for chunks that don't have them.

{
  "status": "ok",
  "total_missing": 15,
  "embedded": 15,
  "failed": 0
}

POST /v1/retag-all

Re-extract entities on all chunks.

POST /v1/score-all

Recompute importance scores for all chunks via embedding similarity.

POST /v1/re-embed-all

Regenerate all embeddings (useful when switching providers).

POST /v1/reset-importance

Reset all accumulated implicit boosts back to defaults.

POST /v1/reset-backfill

Clear LLM entity extraction markers to allow re-processing.


Admin

GET /v1/status

Engine health and statistics.

{
  "status": "ok",
  "uptime_secs": 86400,
  "config_summary": {
    "vector_enabled": true,
    "tantivy_enabled": true,
    "scheduler_enabled": true,
    "rocksdb_keys": 311,
    "tantivy_docs": 115,
    "embeddings_count": 112
  }
}

GET /v1/namespaces

{
  "namespaces": {
    "personal": "100",
    "work": "110"
  }
}

GET /v1/whoami

Returns the auth context for the current caller (active stream, role, accessible streams).

GET /health

Liveness check. No authentication required.

{ "status": "ok" }

Ambient memory

POST /v1/ambient returns a small, token-budgeted set of plain-fact memory snippets for injection into an agent's context at the start of a turn. See the ambient endpoint contract.


Background workers

POST /admin/workers/pause

Pause all background workers (consolidation, decay, compaction, backup, clustering, purge, stats). Useful for eval runs where you need a frozen database.

Requires admin token.

Response:

{
  "paused": true,
  "message": "All workers paused"
}

POST /admin/workers/resume

Resume all background workers.

Response:

{
  "paused": false,
  "message": "All workers resumed"
}

GET /admin/workers/status

Check if workers are paused.

Response:

{
  "paused": false
}

Pause state does not survive server restart — workers resume automatically on startup.
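For eval runs, pairing pause and resume in a context manager makes it harder to leave workers paused after a failure. This is a sketch only. `post` is a hypothetical transport callable `post(path, payload)` that is expected to attach the admin token.

```python
from contextlib import contextmanager


@contextmanager
def workers_paused(post):
    """Pause background workers for the duration of a with-block.

    Resume is issued in a finally clause, so it runs even if the body
    raises an exception.
    """
    post("/admin/workers/pause", None)
    try:
        yield
    finally:
        post("/admin/workers/resume", None)
```

Usage would look like `with workers_paused(post): run_eval()`, where `run_eval` stands in for your own evaluation code.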


MCP endpoint

POST /mcp

MCP JSON-RPC 2.0 endpoint. Used by Claude Desktop, claude.ai connectors, and other MCP clients. The server identifies itself as loomem-memory.

See MCP Tools Reference for the 14 available memory_* tools.
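For readers unfamiliar with JSON-RPC 2.0, the message envelope sent to /mcp looks roughly like the sketch below. The method name `tools/list` comes from the MCP specification. MCP clients normally complete an initialize handshake before listing tools, which this example omits. `jsonrpc_request` is a hypothetical helper.

```python
import itertools

_request_ids = itertools.count(1)  # JSON-RPC ids must be unique per session


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object as a plain dict."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes; no params are needed.
list_tools = jsonrpc_request("tools/list")
```

Serializing `list_tools` with `json.dumps` gives the body to POST to /mcp.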

OAuth endpoints

For MCP Remote Connector authorization:

POST  /oauth/register      Dynamic Client Registration (RFC 7591)
GET   /oauth/authorize      Authorization page (user enters API key)
POST  /oauth/authorize      Submit authorization
POST  /oauth/token          Exchange code for access token
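
As an illustration of the first step, a Dynamic Client Registration body uses the client metadata fields defined in RFC 7591. Which of these fields Loomem requires or honors is not stated on this page, so the selection below is an assumption, and `registration_body` is a hypothetical helper.

```python
def registration_body(client_name, redirect_uris):
    """Build an RFC 7591 client metadata document for /oauth/register.

    Field names are from RFC 7591; the chosen values assume a public
    client using the authorization code flow.
    """
    return {
        "client_name": client_name,
        "redirect_uris": list(redirect_uris),
        "grant_types": ["authorization_code"],
        "response_types": ["code"],
        "token_endpoint_auth_method": "none",
    }
```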
Loomem · Apache-2.0