Meet Memory OS: a 6-layer open-source memory stack built on top of Hermes Agent

Hermes Agent already remembers across sessions. Nous Research’s open-source agent ships with curated memory files and full-text session search. But a new community project argues that this built-in memory is too shallow for serious work. The library, named Memory OS, is released under the MIT license by a developer (claudiodruze). It stacks six memory layers on Hermes, adding a vector database, structured facts, and an auto-curated knowledge wiki. The project is new but promising, and its architecture shows how agent memory can be layered.

Memory OS

Memory OS is not a Hermes plugin that you toggle on. It is a layered system that sits next to the Hermes agent’s own memory. Hermes already provides workspace files and a session database. Memory OS keeps both and adds four more layers on top of them. The full stack runs locally using Docker, Qdrant, Redis, and Python 3.11+. It works with any Hermes-supported LLM provider, including OpenRouter, OpenAI, Anthropic, and Ollama. The README presents it as a “memory operating system”, not as a single feature.

Six layers, from files to vectors

Layer 1 is the workspace. Here, MEMORY.md, USER.md, and CREATIVE.md are injected into the system prompt on every turn.
Layer 2 is sessions. It uses state.db, a SQLite database with FTS5 full-text search across conversation histories.
Layer 3 is structured facts. It stores durable facts in Memory_store.db using SQLite, HRR, FTS5, and trust scoring. A feedback loop adjusts those trust scores over time, alongside entity resolution.
Layer 4 is Fabric, a heavily modified fork of the Icarus plugin. The fork adds LLM-powered session extraction to the upstream esaradev/icarus-plugin. It handles cross-session recall through 16 tools, including fabric_recall, fabric_write, and fabric_brief.
Layer 5 is the vector database, built on Qdrant. It uses 4096-dimensional dense vectors compared by cosine similarity, plus BM25 sparse search, a keyword-style ranking method.
Layer 6 is an LLM wiki, an auto-curated vault of concepts, entities, and comparisons. That wiki is continuously fed back into Qdrant through a process called wiki-continuous-ingest.
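To make Layer 2 concrete, here is a minimal sketch of FTS5 full-text search over conversation history using Python's standard sqlite3 module. The table name, column names, and sample rows are hypothetical and are not taken from the actual state.db schema; the sketch also assumes the local SQLite build includes FTS5, which is the case for most Python distributions.

```python
import sqlite3

# In-memory stand-in for a session database. The "messages" table and its
# columns are hypothetical, not the real state.db schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id, role, content)")
conn.executemany(
    "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
    [
        ("s1", "user", "How do I configure the Redis worker queue?"),
        ("s1", "assistant", "Set the Redis URL in the worker settings."),
        ("s2", "user", "Summarize the vector database options."),
    ],
)


def search_sessions(query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (session_id, content) rows, best match first.

    Ordering by the hidden "rank" column uses FTS5's built-in BM25 ranking.
    """
    rows = conn.execute(
        "SELECT session_id, content FROM messages "
        "WHERE messages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


hits = search_sessions("redis")
```

The default FTS5 tokenizer is case-insensitive, so the query "redis" matches both messages in session s1 that mention Redis.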

How the retrieval flow works

The flow hinges on when memory is read and when it is written. On pre_llm_call, Memory OS runs what it calls surgical recall. It pulls from four sources at once: Fabric, Qdrant, sessions, and facts. Each source is gated by a relevance threshold before anything reaches the model. Per-session deduplication prevents the same context from appearing twice. A social-closure filter skips trivial messages like a plain “Thank you.” On post_llm_call and on_session_end, the system automatically extracts and captures new learnings. The stated goal is token efficiency, not filling the context window.
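The gating described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: the Candidate and SessionRecall classes, the threshold values, and the list of closure phrases are all hypothetical, since the article gives none of them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    source: str   # "fabric", "qdrant", "sessions", or "facts"
    text: str
    score: float  # relevance in [0, 1], as reported by the source


# Hypothetical values: the article does not publish per-source thresholds.
THRESHOLDS = {"fabric": 0.6, "qdrant": 0.7, "sessions": 0.5, "facts": 0.6}
# Hypothetical phrase list for the social-closure filter.
SOCIAL_CLOSURES = {"thanks", "thank you", "ok", "great", "bye"}


@dataclass
class SessionRecall:
    """Tracks what has already been injected during one session."""
    seen: set[str] = field(default_factory=set)

    def recall(self, message: str, candidates: list[Candidate]) -> list[Candidate]:
        # Social-closure filter: skip recall entirely for trivial messages.
        if message.strip().lower().rstrip(".!") in SOCIAL_CLOSURES:
            return []
        kept = []
        for c in sorted(candidates, key=lambda c: c.score, reverse=True):
            # Relevance gate: unknown sources default to an unreachable bar.
            if c.score < THRESHOLDS.get(c.source, 1.0):
                continue
            # Per-session dedup on normalized text.
            key = c.text.strip().lower()
            if key in self.seen:
                continue
            self.seen.add(key)
            kept.append(c)
        return kept
```

Calling recall twice in the same session with the same candidates returns nothing the second time, which is the behaviour that keeps repeated context out of the prompt.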

Fallback cascade and cleanup

Layer 5 retrieval uses a four-level fallback. It tries hybrid search first, then dense vector search, then lexical search, then SQLite. If one method fails or returns nothing, the next method takes over. This design keeps recall working even when the vector database struggles. Memory OS also runs a weekly decay scanner to check for stale entries. Semantic dedup merges near-duplicate memories when their cosine similarity exceeds 0.92. These housekeeping steps are meant to keep the memory store from bloating over months of use.
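Both ideas in this section fit in a short Python sketch. The function names (cascade, cosine, dedup) are hypothetical, and the retrievers are passed in as plain callables rather than real Qdrant or SQLite clients. One simplification to note: where Memory OS is described as merging near-duplicates, this sketch simply keeps the first entry and drops later ones above the 0.92 threshold.

```python
import math
from typing import Callable

# A retriever takes a query and returns matching memory texts.
Retriever = Callable[[str], list[str]]


def cascade(query: str, retrievers: list[tuple[str, Retriever]]) -> tuple[str, list[str]]:
    """Try each retriever in order; fall through on an error or empty result."""
    for name, retrieve in retrievers:
        try:
            results = retrieve(query)
        except Exception:
            continue  # this level failed, so try the next one
        if results:
            return name, results
    return "none", []


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup(vectors: dict[str, list[float]], threshold: float = 0.92) -> list[str]:
    """Return the keys to keep, dropping entries too similar to a kept one."""
    kept: list[str] = []
    for key, vec in vectors.items():
        if all(cosine(vec, vectors[k]) <= threshold for k in kept):
            kept.append(key)
    return kept
```

In use, the retriever list would be ordered hybrid, dense, lexical, SQLite, so the cheapest and most robust lookup only runs when everything before it has come up empty.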

Local-first, and intentionally so

Memory OS positions itself against cloud memory services like Mem0, Zep, and Letta. The pitch is that memory infrastructure should run on your own machine. Memory data stays local, with no memory-service subscription. LLM calls still go out to whatever provider you choose. Hermes itself already supports eight external memory providers, including Mem0 and Honcho. Memory OS is not one of those official providers. It is a separate, community-built stack built directly on Hermes. For teams with data-residency rules, a local memory store may make sense.

Strengths and limitations

Strengths:

Clear layered design separating files, sessions, facts, vectors and a wiki
Fully local infrastructure with no cloud memory subscription
Provider-agnostic, matching Hermes Agent’s own flexibility
Token-efficient retrieval by design, through gated sources and per-session deduplication

Limitations:

Brand new, with few commits
A forked Icarus plugin that the author says is not upstream-compatible
Heavy setup: Docker, Qdrant, Redis, and an ARQ worker are all required
No published benchmarks on recall quality, latency, or token savings

Key takeaways

Memory OS is a community-built, MIT-licensed stack that adds six memory layers on top of the Hermes Agent.
It combines workspace files, FTS5 session search, trust-scored facts, a forked Icarus fabric, Qdrant vectors, and an auto-curated LLM wiki.
Retrieval runs on pre_llm_call, gated across four sources with deduplicated recall; capture runs on post_llm_call and on_session_end.
The memory infrastructure is completely local and provider-agnostic, but LLM calls still go to your chosen provider.


The post Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent appeared first on MarkTechPost.