The architecture isn’t designed — it’s derived. Start from a small set of axioms about what memory fundamentally is, and the shape of memeX is forced by them.
The 9 axioms
1 · Bounded vs unbounded
The LLM's effective state is bounded; the work it reasons about is unbounded. Memory exists to bridge this gap.
2 · Multiple types
Episodic (events), semantic (facts), procedural (how-to). Collapsing them into one store is the original sin of every RAG system.
3 · Memory has a lifecycle
Encode → store → retrieve → consolidate → reconsolidate → forget. Most systems implement only the first three.
4 · Truth has a source + timestamp
A fact without provenance is a poison pill in a multi-agent system.
5 · Falsifiability
A claim that can't be falsified isn't knowledge — it's opinion.
6 · Working set
Cognition has a global workspace plus a long-term store. Retrieval without a working set is amnesia.
7 · Repetition creates primitives
Patterns that recur become atomic units (chunking).
8 · Time erodes truth
Stale memory is worse than no memory.
9 · Retrieval cost wins
Memory's value is determined by what it costs to retrieve, not what it costs to store. Hoarding has negative utility.
The dual metaphor: RAM hardware + human memory
memeX is modeled jointly on the structure of RAM and on what cognitive science says about human memory.
| RAM hardware | Human memory | memeX |
|---|---|---|
| Cell (smallest addressable unit) | Engram | Concept node |
| Bank (parallel storage region) | Memory system | stores/episodic, stores/semantic, stores/vector |
| Row addressing | Cued recall | recall(query) |
| Refresh cycle (DRAM autorefresh) | Reconsolidation | last_confirmed_at + working-set touch() on retrieval |
| ECC (error correction) | Pattern separation | NLI / conflict detection |
| Cache hierarchy (L1/L2/L3) | Working memory | WorkingSet (LRU cache) |
| Bus bandwidth | Attention budget | token-budget enforcement |
| Read/write ports (parallel) | Sensory channels | MCP + HTTP + CLI frontends |
| Wear leveling | Consolidation (episodic → semantic) | lifecycle/consolidation.py |
| DMA (direct CPU bypass) | Priming | MCP stdio (zero-copy editor → engine) |
| Page table (address translation) | Naming/aliasing | concept IDs + same_as edges |
What’s forced by the axioms
Multi-store, not single graph (axiom 2)
Three separate stores with different schemas, different decay rates, different retrieval defaults:
- Episodic: time-indexed event log (sessions, conversations, incidents)
- Semantic: typed property graph (concepts, decisions, contracts)
- Procedural: ordered workflows (deploys, runbooks)
Multi-encoding per fact (axiom 3)
Every node stored as: text, embedding, graph node, and an executable predicate. Retrieval picks encoding by query type.
Provenance-or-die (axiom 4)
Every node and every edge carries source, created_at, last_confirmed_at, confidence. The schema rejects nodes without these.
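A minimal sketch of what schema-level rejection could look like, assuming a plain dataclass. The `Provenance` class and its specific validation rules are hypothetical illustrations; only the four field names come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record; field names follow the text above."""
    source: str
    created_at: datetime
    last_confirmed_at: datetime
    confidence: float

    def __post_init__(self) -> None:
        # Reject incomplete or inconsistent provenance at construction time.
        if not self.source.strip():
            raise ValueError("source is required")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if self.last_confirmed_at < self.created_at:
            raise ValueError("last_confirmed_at cannot precede created_at")


now = datetime.now(timezone.utc)
ok = Provenance(source="design-review-notes", created_at=now,
                last_confirmed_at=now, confidence=0.9)

try:
    Provenance(source="", created_at=now, last_confirmed_at=now, confidence=0.9)
    rejected = False
except ValueError:
    rejected = True  # a node with no source never enters the store
```

Validating in the constructor means there is no code path that produces a node or edge without provenance, which is the point of the rule.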
Falsifiability requirement (axiom 5)
Every node has an optional verification predicate. Nodes without one are downgraded over time, which keeps unverifiable claims from hardening into accepted fact.
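One way to picture the downgrade rule is a maintenance pass that re-checks verifiable nodes and penalizes the rest. This is a sketch under assumptions: `Node`, `review`, and the 0.9 penalty factor are hypothetical and not taken from memeX.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical node with an optional verification predicate."""
    claim: str
    confidence: float
    verify: Optional[Callable[[], bool]] = None


def review(node: Node, penalty: float = 0.9) -> Node:
    """One maintenance pass over a single node.

    The 0.9 penalty is an illustrative assumption, not a memeX default.
    """
    if node.verify is None:
        # Nothing can falsify this claim, so it loses some confidence.
        node.confidence *= penalty
    elif not node.verify():
        # The predicate ran and failed: the claim is falsified.
        node.confidence = 0.0
    return node


config = {"port": 8080}
checked = review(Node("service listens on 8080", 0.8,
                      verify=lambda: config["port"] == 8080))
unchecked = review(Node("the team prefers tabs", 0.8))
falsified = review(Node("service listens on 9090", 0.8,
                        verify=lambda: config["port"] == 9090))
```

A node whose predicate still holds keeps its confidence; a node with no predicate erodes a little on every pass.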
Working set as first-class structure (axiom 6)
WorkingSet is a bounded LRU cache that biases retrieval ranking — recently touched concepts get higher rank (priming).
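A minimal sketch of the idea, built on `collections.OrderedDict`. The capacity, the `rerank` helper, and the 1.5 boost factor are assumptions for illustration; this is not the memeX `WorkingSet` class itself.

```python
from collections import OrderedDict


class WorkingSet:
    """Bounded LRU set of concept IDs (a sketch, not the memeX class)."""

    def __init__(self, capacity: int = 3) -> None:
        self.capacity = capacity
        self._items = OrderedDict()  # concept_id -> None, oldest first

    def touch(self, concept_id: str) -> None:
        # Mark as most recently used, then evict the oldest entry if over capacity.
        self._items[concept_id] = None
        self._items.move_to_end(concept_id)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def __contains__(self, concept_id: str) -> bool:
        return concept_id in self._items


def rerank(scored, working_set, boost=1.5):
    """Multiply scores of recently touched concepts (boost is an assumed value)."""
    adjusted = [(cid, s * boost if cid in working_set else s) for cid, s in scored]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


ws = WorkingSet(capacity=2)
for cid in ["auth", "billing", "deploy"]:
    ws.touch(cid)  # "auth" is evicted when "deploy" arrives

ranked = rerank([("auth", 0.9), ("deploy", 0.7), ("billing", 0.5)], ws)
```

Here "deploy" overtakes "auth" despite a lower raw score, because "auth" has already fallen out of the working set.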
Chunking via co-retrieval (axiom 7)
Track which nodes are retrieved together. When co-retrieval frequency crosses a threshold, the subgraph compresses into a chunk node.
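The tracking half of this can be sketched with a pair counter. This simplified version only counts pairs rather than whole subgraphs, and `CoRetrievalTracker` and its threshold of 3 are hypothetical; the source does not state the real threshold.

```python
from collections import Counter
from itertools import combinations


class CoRetrievalTracker:
    """Counts how often pairs of nodes come back from the same retrieval."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # assumed value for illustration
        self.pair_counts = Counter()

    def record(self, node_ids) -> None:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for pair in combinations(sorted(set(node_ids)), 2):
            self.pair_counts[pair] += 1

    def chunk_candidates(self):
        """Pairs seen together often enough to be merged into a chunk node."""
        return sorted(p for p, n in self.pair_counts.items() if n >= self.threshold)


tracker = CoRetrievalTracker(threshold=3)
tracker.record(["deploy", "rollback", "db-migrate"])
tracker.record(["deploy", "rollback"])
tracker.record(["rollback", "deploy", "oncall"])
candidates = tracker.chunk_candidates()
```

After three retrievals only the ("deploy", "rollback") pair has crossed the threshold, so it is the only chunk candidate.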
Time-aware confidence + active forgetting (axiom 8)
Confidence decays on an Ebbinghaus exponential (default half-life 90 days) unless re-confirmed. Active forgetting deletes nodes that meet stale-and-orphaned criteria.
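The decay rule follows the document's formula `confidence(t) = base * 0.5^(days/half_life)`. The sketch below implements it with the stated 90-day default; the `should_forget` helper and its 0.05 floor are assumptions, since the exact stale-and-orphaned criteria are not given.

```python
from datetime import datetime, timedelta, timezone


def decayed_confidence(base: float, last_confirmed_at: datetime,
                       now: datetime, half_life_days: float = 90.0) -> float:
    """confidence(t) = base * 0.5 ** (days / half_life)."""
    days = (now - last_confirmed_at).total_seconds() / 86400
    return base * 0.5 ** (max(days, 0.0) / half_life_days)


def should_forget(confidence: float, edge_count: int, floor: float = 0.05) -> bool:
    """Stale-and-orphaned test; the 0.05 floor is an illustrative assumption."""
    return confidence < floor and edge_count == 0


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
after_one_half_life = decayed_confidence(0.8, now - timedelta(days=90), now)
after_two_half_lives = decayed_confidence(0.8, now - timedelta(days=180), now)
```

Re-confirming a node resets `last_confirmed_at`, which restarts the curve from the base confidence.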
Retrieval is budget-bounded by construction (axiom 9)
Every retrieval is (query, token_budget) → minimal subgraph. There is no “give me everything.” The API doesn’t allow unbounded retrieval.
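A rough sketch of a budget-enforcing selection step. It is a greedy fit, not a true minimal-subgraph search, and it approximates token counts by whitespace splitting; both are assumptions, as is the `retrieve` signature.

```python
def retrieve(ranked_nodes, token_budget: int):
    """Greedily keep the highest-ranked nodes that fit within token_budget.

    ranked_nodes: iterable of (node_id, text) pairs, best first.
    Token cost is approximated by word count (an assumption; a real
    system would use the model's tokenizer).
    """
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    selected, used = [], 0
    for node_id, text in ranked_nodes:
        cost = len(text.split())
        if used + cost > token_budget:
            continue  # does not fit; a smaller node further down may
        selected.append(node_id)
        used += cost
    return selected, used


ranked = [
    ("a", "deploys run through the staging cluster first"),
    ("b", "rollback requires a signed approval from the oncall engineer"),
    ("c", "staging mirrors prod"),
]
selected, used = retrieve(ranked, token_budget=10)
```

Because the budget is a required argument and non-positive values raise, there is no call shape that returns everything.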
Foundational data structures
Each layer has a CS-foundational data structure underneath:
- Property graph — typed nodes + typed edges, primary key indexing
- Inverted index (`rank_bm25`) — token → docs mapping for keyword retrieval
- Brute-force k-NN with cosine (numpy) — at v1 scale; HNSW upgrade is a Protocol-conforming swap
- LRU cache (`OrderedDict`) — `WorkingSet`, the L1 layer
- Time-indexed btree — episodic events, ordered by timestamp
- Reciprocal Rank Fusion — combining BM25 + vector ranks; robust to the choice of constant in practice (`k=60`)
- Exponential decay — Ebbinghaus forgetting curve, `confidence(t) = base * 0.5^(days/half_life)`
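To make the fusion step concrete, here is a small Reciprocal Rank Fusion sketch using the standard formula: each list contributes `1 / (k + rank)` per document. The node IDs and the two input rankings are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


bm25_ranking = ["n3", "n1", "n2"]    # hypothetical keyword results, best first
vector_ranking = ["n1", "n4", "n3"]  # hypothetical embedding results, best first
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

RRF only looks at rank positions, so BM25 scores and cosine similarities never need to be normalized onto a common scale.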
Software architecture: SOLID with Protocols
- Repository pattern with Protocol-based interfaces — every store defined as a Protocol so future swaps don’t require core changes.
- Constructor dependency injection — `Engine` takes its stores + retriever + provider as constructor args.
- Factory method — `Engine.build_default()` for the conventional wiring.
- Strategy pattern — retrieval strategies (BM25, hybrid) interchangeable.
- Provider pattern — embeddings as `NoOpEmbeddingProvider` + `FastEmbedProvider`, swappable at runtime.
- Façade — `Engine` as the single entry point; frontends never reach into stores.
- Layered architecture — frontends → engine → core → stores.
- Lazy loading — heavy resources (embedding models) load on first use.
- Fail loud — HTTP daemon refuses non-localhost binding without auth.
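Several of these patterns fit in one short sketch: a Protocol-defined store, constructor injection, a factory method, and the façade. The names `Engine` and `build_default` come from the list above; the `SemanticStore` methods, the in-memory store, and the simplified `add` and `recall` bodies are assumptions.

```python
from typing import Protocol


class SemanticStore(Protocol):
    """Interface a semantic store must satisfy (method names are assumed)."""
    def put(self, node_id: str, text: str) -> None: ...
    def get(self, node_id: str) -> str: ...


class InMemorySemanticStore:
    """Satisfies SemanticStore structurally; no inheritance is needed."""
    def __init__(self) -> None:
        self._data = {}

    def put(self, node_id: str, text: str) -> None:
        self._data[node_id] = text

    def get(self, node_id: str) -> str:
        return self._data[node_id]


class Engine:
    """Facade: frontends call the engine and never touch a store directly."""
    def __init__(self, semantic: SemanticStore) -> None:
        self._semantic = semantic  # injected, so a test can pass a fake

    @classmethod
    def build_default(cls) -> "Engine":
        # Conventional wiring; the real factory presumably wires more parts.
        return cls(semantic=InMemorySemanticStore())

    def add(self, node_id: str, text: str) -> None:
        self._semantic.put(node_id, text)

    def recall(self, node_id: str) -> str:
        # Simplified to an exact-ID lookup for this sketch.
        return self._semantic.get(node_id)


engine = Engine.build_default()
engine.add("adr-7", "Use Postgres for the episodic log")
```

Swapping in a different store only requires a class with matching `put` and `get` methods; `Engine` itself does not change.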
Polyglot by design
The Protocol-based interfaces mean any layer can be reimplemented in any language and dropped into the same architecture. v1 ships all-Python because it’s the fastest path to correctness; future hot paths can be replaced module-by-module.
The HTTP API is the universal cross-language boundary. OpenAPI spec at /openapi.json autogenerates bindings.
Watchful, not passive
memeX isn’t a database the agent queries — it’s an active observer:
- Every `add`/`link`/`recall`/`validate` auto-emits an episodic event.
- `progress()` exposes the full activity log.
- The AI can introspect what it’s already done and avoid repeating itself.
- Future versions detect contradictions between memories and surface them.
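The auto-emit behavior could be sketched with a decorator that logs each call. The operation names and `progress()` come from the list above; `ObservedEngine`, `emits_event`, `already_did`, and the event shape are hypothetical.

```python
import functools
from datetime import datetime, timezone


def emits_event(method):
    """Decorator: record an episodic event after each successful call."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        self._events.append({"op": method.__name__, "args": args,
                             "at": datetime.now(timezone.utc)})
        return result
    return wrapper


class ObservedEngine:
    """Toy engine whose public operations log themselves (a sketch only)."""

    def __init__(self) -> None:
        self._nodes = {}
        self._events = []

    @emits_event
    def add(self, node_id, text):
        self._nodes[node_id] = text

    @emits_event
    def recall(self, node_id):
        return self._nodes.get(node_id)

    def progress(self):
        """Return the activity log, oldest first."""
        return list(self._events)

    def already_did(self, op, *args):
        # Lets a caller check the log before repeating an operation.
        return any(e["op"] == op and e["args"] == args for e in self._events)


engine = ObservedEngine()
engine.add("adr-7", "Use Postgres")
engine.recall("adr-7")
ops = [e["op"] for e in engine.progress()]
```

Putting the logging in a decorator keeps it out of each operation's body, so a new operation cannot forget to emit its event once it is decorated.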