Backup and Restore
What is backed up
RocksDB checkpoint, a point-in-time snapshot of all data:
- All chunks (L0, L1)
- Embeddings (stored in a RocksDB column family)
- Knowledge graph (entities, edges)
- Cost tracking data
- Wrapped per-stream encryption keys (when at-rest encryption is enabled)
Not in the checkpoint (auto-rebuilt on startup):
- Tantivy full-text index: rebuilt from RocksDB if the schema version mismatches
- WAL/intent log: replayed on startup for crash recovery
Encrypted-instance backup is JOINT (master key required)
When at-rest encryption is enabled (LOOMEM_AT_REST_MASTER_KEY set), the checkpoint
contains only wrapped per-scope DEKs (in the keys column family) and ciphertext for
the encrypted row classes (chunk content, entity, relation, graph entity, audit). The RocksDB
checkpoint alone cannot decrypt them — it does not contain the master key that unwraps the
DEKs.
A usable backup of an encrypted instance is therefore joint:
checkpoint + a confirmed master-key escrow whose fingerprint matches the data.
- Record the fingerprint beside every checkpoint. Capture it at backup time so a future restore knows exactly which escrowed key the data was encrypted under:
curl -s http://localhost:3030/v1/encryption/status \
-H "Authorization: Bearer <admin_token>" | jq -r '.master_key_fingerprint' \
> {data_dir}/backups/checkpoint-<ts>/master-key-fingerprint.txt
The fingerprint is a one-way digest — safe to store in plaintext beside the checkpoint. The key itself is never stored here; it lives only in escrow.
- Automated (12h-worker) checkpoints are still covered. The fingerprint changes only when the key changes (rotation), so every checkpoint taken during a key's lifetime shares that key's single fingerprint. The escrow record (key + fingerprint + activation date) therefore identifies the correct key for any checkpoint by its timestamp, even one the worker wrote without a per-checkpoint fingerprint file. The master-key-fingerprint.txt step above is a convenience for fast lookup, not a recovery prerequisite; automating it as a post-backup hook is a possible future nicety, not a gap that blocks recovery.
- Without the matching master key, an encrypted checkpoint is unrecoverable. Routing, embeddings, and Tantivy stay readable, but every encrypted row class fails closed.
Keep at least one copy of the master key (with its fingerprint and activation date) in a
secret store that is separate from the data volume and the backups — a password manager,
cloud secret manager, or printed escrow. Verify after every rotation that the escrowed value
matches the running instance's fingerprint (GET /v1/encryption/status).
Where backups are stored
{data_dir}/backups/checkpoint-{YYYY-MM-DD-HHMMSS}/
- Local: ./data/backups/
- Container deployments: /data/backups/ (on the persistent volume)
Configuration
[worker.backup]
enabled = true
interval_secs = 43200 # every 12 hours
max_copies = 2 # keep last 2 (1 day coverage)
Manual backup
Checkpoints are consistent point-in-time copies; for an out-of-band backup simply copy the
latest {data_dir}/backups/checkpoint-* directory (or snapshot the volume) — this is safe
while the server is running.
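As a minimal sketch of the out-of-band copy described above: checkpoint directory names embed a YYYY-MM-DD-HHMMSS timestamp, so sorting the names also orders them chronologically. The helper names `latest_checkpoint` and `copy_latest_checkpoint` and the destination directory are hypothetical and not part of Loomem.

```bash
#!/usr/bin/env bash
# Sketch only: the helper names and the destination directory are assumptions.

# Print the path of the newest checkpoint under <backups_dir>; fail if none exist.
latest_checkpoint() {
  local backups_dir="$1" newest="" dir
  # Globs expand in sorted order, so the last match carries the latest timestamp.
  for dir in "$backups_dir"/checkpoint-*/; do
    [ -d "$dir" ] && newest="${dir%/}"
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Copy the newest checkpoint into <dest_dir>, preserving file attributes.
copy_latest_checkpoint() {
  local src
  src="$(latest_checkpoint "$1")" || return 1
  mkdir -p "$2" && cp -a "$src" "$2/"
}
```

For example, `copy_latest_checkpoint ./data/backups /mnt/offsite` would copy the most recent checkpoint directory, where `/mnt/offsite` stands in for whatever separate storage is available.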
Restore procedure
1. Stop Loomem
pkill -f loomem-server
2. Replace RocksDB data
# Move current data aside
mv {data_dir}/rocksdb {data_dir}/rocksdb.old
# Copy checkpoint
cp -r {data_dir}/backups/checkpoint-2026-04-04-120000 {data_dir}/rocksdb
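One way to make this step safer is to validate the checkpoint before moving live data aside. The following is a sketch under assumptions: `restore_checkpoint` is a hypothetical helper, not a Loomem command, and it relies only on the directory layout shown above. It also refuses to run when a `rocksdb.old` directory already exists, since `mv` would otherwise place the current data inside it.

```bash
#!/usr/bin/env bash
# Sketch only: restore_checkpoint is a hypothetical helper, not a Loomem command.
# Usage: restore_checkpoint <data_dir> <checkpoint_name>   (run with the server stopped)
restore_checkpoint() {
  local data_dir="$1" checkpoint="$1/backups/$2"

  # Refuse to touch live data unless the checkpoint exists and is non-empty.
  if [ ! -d "$checkpoint" ] || [ -z "$(ls -A "$checkpoint")" ]; then
    echo "checkpoint missing or empty: $checkpoint" >&2
    return 1
  fi
  # Refuse to overwrite an earlier rocksdb.old left by a previous restore.
  if [ -e "$data_dir/rocksdb.old" ]; then
    echo "$data_dir/rocksdb.old already exists; move it away first" >&2
    return 1
  fi

  # Move current data aside (if any), then copy the checkpoint into place.
  if [ -d "$data_dir/rocksdb" ]; then
    mv "$data_dir/rocksdb" "$data_dir/rocksdb.old" || return 1
  fi
  cp -r "$checkpoint" "$data_dir/rocksdb"
}
```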
3. (Encrypted instances) Confirm the master key is set
If the checkpoint is from an encrypted instance, the matching master key must be present
before start, or every encrypted row fails closed. Restore LOOMEM_AT_REST_MASTER_KEY
from escrow and confirm its fingerprint matches master-key-fingerprint.txt saved beside the
checkpoint (after start, GET /v1/encryption/status reports the active fingerprint). For
plaintext instances (no key), skip this step.
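The fingerprint comparison in this step can be scripted. The sketch below assumes the fingerprint file holds a single value, as written by the backup-time command earlier in this document; `check_fingerprint` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: check_fingerprint is a hypothetical helper.
# Usage: check_fingerprint <fingerprint_file> <active_fingerprint>
check_fingerprint() {
  local expected
  # Read the recorded fingerprint and strip whitespace such as a trailing newline.
  expected="$(tr -d '[:space:]' < "$1")" || return 1
  if [ -z "$expected" ] || [ "$expected" != "$2" ]; then
    echo "fingerprint mismatch: recorded '$expected', active '$2'" >&2
    return 1
  fi
  echo "fingerprint matches"
}

# After start, the active value can be fetched the same way as at backup time:
#   active="$(curl -s http://localhost:3030/v1/encryption/status \
#     -H "Authorization: Bearer <admin_token>" | jq -r '.master_key_fingerprint')"
#   check_fingerprint {data_dir}/backups/checkpoint-<ts>/master-key-fingerprint.txt "$active"
```

An empty or missing fingerprint file is treated as a mismatch, so the check fails closed.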
4. Start Loomem
./target/release/loomem-server
On startup:
- RocksDB opens from the checkpoint
- Tantivy detects schema mismatch → auto-rebuilds index from RocksDB
- WAL replays any pending operations
- All data is restored
5. Verify
curl http://localhost:3030/v1/status
Check that rocksdb_keys, tantivy_docs, and embeddings_count match expected values.
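The comparison can be automated with `jq`, which this document already uses for the fingerprint lookup. This sketch assumes the status response is a JSON object with the three counters as top-level fields; the real response shape may differ, and `status_matches` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: assumes /v1/status returns a JSON object whose counters are
# top-level fields; adjust the jq filter if the real response is nested.
# Usage: status_matches <status_json> <field> <expected_value>
status_matches() {
  local actual
  actual="$(printf '%s' "$1" | jq -r --arg f "$2" '.[$f]')" || return 1
  if [ "$actual" != "$3" ]; then
    echo "$2: expected $3, got $actual" >&2
    return 1
  fi
}

# Example (the expected number is a placeholder recorded before the restore):
#   status="$(curl -s http://localhost:3030/v1/status)"
#   status_matches "$status" rocksdb_keys 120000
```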
Data retention and hard purge
Deleted memories are soft-deleted (marked with deleted_at timestamp) and remain in storage for a recovery window. After the window expires, the hard-purge worker permanently removes them.
[retention]
soft_delete_days = 30 # 30-day recovery window
hard_purge_interval_secs = 86400 # purge worker runs daily
Hard purge removes: chunk data, embeddings, entity/relation metadata, and graph references. Purged data cannot be recovered from backups made after the purge ran.
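To recover a soft-deleted memory before purge: restore from a backup taken before deletion. With the backup configuration shown above (two checkpoints, 12 hours apart), on-volume checkpoints reach back only about a day, so recovering an older deletion this way requires an off-site copy.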
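The recovery window is plain arithmetic on the deletion time. The sketch below assumes `deleted_at` is available as Unix epoch seconds, which the source does not state; both helper names are hypothetical. Because the purge worker runs on its own interval, actual removal may happen up to `hard_purge_interval_secs` after the computed time.

```bash
#!/usr/bin/env bash
# Sketch only: assumes deleted_at is expressed in Unix epoch seconds.

# Print the earliest epoch at which a soft-deleted memory becomes purgeable.
# Usage: earliest_purge_epoch <deleted_at_epoch> <soft_delete_days>
earliest_purge_epoch() {
  echo $(( $1 + $2 * 86400 ))
}

# Succeed while a soft-deleted memory is still inside its recovery window.
# Usage: in_recovery_window <deleted_at_epoch> <soft_delete_days> <now_epoch>
in_recovery_window() {
  [ "$3" -lt "$(earliest_purge_epoch "$1" "$2")" ]
}
```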
Recovery metrics
| Metric | Value |
|---|---|
| RPO (Recovery Point Objective) | 12 hours (backup interval) |
| RTO (Recovery Time Objective) | ~5 minutes (copy + restart + Tantivy rebuild) |
Notes for container platforms (e.g. Railway)
- A persistent volume mounted at /data survives redeployments
- Backups live inside the same volume, so the volume is a single point of failure
- For off-site backup: use your platform's volume snapshot feature, or periodically download a checkpoint
- Before any destructive migration, verify a recent backup exists