Personal OS for your static exports.
In 2019, Simon Willison started shipping a family of importers called Dogsheep. Each one pulled a stream of your data — tweets, stars, HealthKit, check-ins — into local SQLite. You ran Datasette over the file and got a browseable view of your own digital life. Obvious in retrospect. Your archives should be local. Your archives should be queryable. Your archives should outlive the login that produced them.
Seven years later, AI agents exist. They speak MCP. They walk graphs, cite sources, and run small scripts. So I built Hypha: Dogsheep for AI agents.
What Hypha is
A local-first library that takes your own data, normalizes it into a typed temporal graph, and lets an agent navigate it over MCP. The agent gets what Datasette's UI gave humans, a browseable view of your archive, plus the ability to walk the graph and cite sources.
Why the archive
Every “memory for agents” product I respect — Mem0, Zep/Graphiti, Supermemory, Cognee, Letta — bets on live API connectors. OAuth into Gmail, OAuth into Slack, OAuth into Notion. Reasonable. Also wrong for the hardest cases.
- Kids leave school districts — the PowerSchool CSV doesn’t.
- Employees leave companies — the M365 ZIP sits in an ex-employee’s Downloads folder forever.
- Founders wind down SaaS — the Slack export survives.
- Journalists get handed a leaker’s Takeout — and the Takeout is all they’ll ever have.
The archive is the source of truth because logins rot. Companies get acquired. APIs get deprecated. Accounts get deleted. The Gmail mbox and the PowerSchool CSV do not. Once a fact is in Hypha, it no longer cares whether you still have access to the platform that produced it.
The model
Two primitives, and that’s the whole data model:
Node { id, kind, at, ingested_at, adapter, external_id, title, body?, facets? }
Edge { id, kind, from_id, to_id, at, weight? }
kind is an open string — "gmail.message", "identity.email", "file.document", "person". Adapters declare their kinds in a YAML manifest; core does not know what a “message” is.
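To make the two primitives concrete, here is a minimal TypeScript sketch. The field names come from the definitions above; everything else is an assumption for illustration: the field types, the GraphNode and GraphEdge names (chosen to avoid clashing with the DOM's Node type), the "gmail.sent_by" edge kind, and the outgoing helper. It is not Hypha's actual source.

```typescript
// Sketch only: field types and the GraphNode/GraphEdge names are assumptions.
interface GraphNode {
  id: string;
  kind: string;                      // open string, e.g. "gmail.message"
  at: string;                        // assumed ISO 8601 timestamp
  ingested_at: string;               // assumed ISO 8601 timestamp
  adapter: string;
  external_id: string;
  title: string;
  body?: string;
  facets?: Record<string, unknown>;
}

interface GraphEdge {
  id: string;
  kind: string;
  from_id: string;
  to_id: string;
  at: string;
  weight?: number;
}

const message: GraphNode = {
  id: "n1",
  kind: "gmail.message",
  at: "2019-03-04T18:30:00Z",
  ingested_at: "2026-01-10T09:00:00Z",
  adapter: "gmail-mbox",
  external_id: "msg-0001",
  title: "Dinner plans",
};

const sender: GraphNode = {
  id: "n2",
  kind: "identity.email",
  at: "2019-03-04T18:30:00Z",
  ingested_at: "2026-01-10T09:00:00Z",
  adapter: "gmail-mbox",
  external_id: "alice@example.com",
  title: "alice@example.com",
};

// "gmail.sent_by" is a hypothetical edge kind, not one the source names.
const sentBy: GraphEdge = {
  id: "e1",
  kind: "gmail.sent_by",
  from_id: message.id,
  to_id: sender.id,
  at: message.at,
};

// Traversal compares only ids and kind strings, so it never needs to
// know what a "message" is.
function outgoing(edges: GraphEdge[], fromId: string, kind?: string): GraphEdge[] {
  return edges.filter(
    (e) => e.from_id === fromId && (kind === undefined || e.kind === kind),
  );
}
```

Because kind is just a string, a new adapter can introduce new kinds without any change to a helper like outgoing.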
Every record is bitemporal from day one. Four timestamps: tx_created (when Hypha learned it), tx_invalidated (NULL = currently believed), valid_from, valid_to. “Show me what I believed on date X, about facts that were true on date Y” is not academic when you’re ingesting 2019 exports in 2026 — it’s required.
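The "what I believed on date X, about facts true on date Y" query can be sketched as a filter over the four timestamps. This is an illustration under stated assumptions, not Hypha's query engine: timestamps are UTC ISO 8601 strings in one fixed format (so plain string comparison orders them), intervals are half-open, and a null valid_from or valid_to means unbounded. The BitemporalRecord type and the asOf function are hypothetical names.

```typescript
// Sketch only: types, names, and interval conventions are assumptions.
interface BitemporalRecord {
  id: string;
  tx_created: string;            // when Hypha learned it
  tx_invalidated: string | null; // null = currently believed
  valid_from: string | null;     // assumed: null = unbounded start
  valid_to: string | null;       // assumed: null = unbounded end
}

// Was this record part of the store's beliefs at transaction time txTime?
function believedAt(r: BitemporalRecord, txTime: string): boolean {
  return r.tx_created <= txTime &&
    (r.tx_invalidated === null || txTime < r.tx_invalidated);
}

// Does this record claim to be true at real-world time validTime?
function validAt(r: BitemporalRecord, validTime: string): boolean {
  return (r.valid_from === null || r.valid_from <= validTime) &&
    (r.valid_to === null || validTime < r.valid_to);
}

function asOf(
  records: BitemporalRecord[],
  txTime: string,
  validTime: string,
): BitemporalRecord[] {
  return records.filter((r) => believedAt(r, txTime) && validAt(r, validTime));
}

// A fact ingested open-ended in January 2026, then superseded in February
// by a version that ends in mid-2020.
const history: BitemporalRecord[] = [
  {
    id: "job-v1",
    tx_created: "2026-01-10T00:00:00Z",
    tx_invalidated: "2026-02-01T00:00:00Z",
    valid_from: "2019-01-01T00:00:00Z",
    valid_to: null,
  },
  {
    id: "job-v2",
    tx_created: "2026-02-01T00:00:00Z",
    tx_invalidated: null,
    valid_from: "2019-01-01T00:00:00Z",
    valid_to: "2020-06-01T00:00:00Z",
  },
];

const beforeCorrection = asOf(history, "2026-01-15T00:00:00Z", "2021-01-01T00:00:00Z");
const afterCorrection = asOf(history, "2026-03-01T00:00:00Z", "2021-01-01T00:00:00Z");
```

Here beforeCorrection contains job-v1 and afterCorrection is empty: the same question about 2021 gets a different answer depending on when you ask, which is exactly what the two time axes are for.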
Every record also carries provenance as an indexed column: kind: "ingested" | "inferred", plus adapter or inferrer, confidence, and inputs. Queries take include_inferred and min_confidence. The why tool walks the derivation tree back to ingested leaves. You can always see where a fact came from.
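One way to picture the provenance walk is a recursive descent over inputs until only ingested records remain. The sketch below is an assumption about the shape, not the why tool's real implementation: the Provenance union, the Fact type, and the ingestedLeaves and passes functions are hypothetical, and treating ingested records as always passing the confidence filter is my guess at the semantics.

```typescript
// Sketch only: all type and function names here are hypothetical.
type Provenance =
  | { kind: "ingested"; adapter: string }
  | { kind: "inferred"; inferrer: string; confidence: number; inputs: string[] };

interface Fact {
  id: string;
  provenance: Provenance;
}

// Walk the derivation tree from factId down to its ingested leaves.
// The seen set guards against cycles and repeated inputs.
function ingestedLeaves(
  factId: string,
  facts: Map<string, Fact>,
  seen: Set<string> = new Set<string>(),
): string[] {
  if (seen.has(factId)) return [];
  seen.add(factId);
  const fact = facts.get(factId);
  if (fact === undefined) return [];
  const p = fact.provenance;
  if (p.kind === "ingested") return [fact.id];
  const leaves: string[] = [];
  for (const input of p.inputs) {
    for (const leaf of ingestedLeaves(input, facts, seen)) leaves.push(leaf);
  }
  return leaves;
}

interface QueryOptions {
  include_inferred: boolean;
  min_confidence: number;
}

// Assumption: ingested records always pass; inferred ones must be
// opted into and meet the confidence floor.
function passes(fact: Fact, opts: QueryOptions): boolean {
  const p = fact.provenance;
  if (p.kind === "ingested") return true;
  return opts.include_inferred && p.confidence >= opts.min_confidence;
}

const list: Fact[] = [
  { id: "m1", provenance: { kind: "ingested", adapter: "gmail-mbox" } },
  { id: "m2", provenance: { kind: "ingested", adapter: "gmail-mbox" } },
  {
    id: "p1",
    provenance: { kind: "inferred", inferrer: "identity-resolver", confidence: 0.91, inputs: ["m1", "m2"] },
  },
  {
    id: "p2",
    provenance: { kind: "inferred", inferrer: "identity-resolver", confidence: 0.6, inputs: ["p1", "m1"] },
  },
];

const facts = new Map<string, Fact>();
for (const f of list) facts.set(f.id, f);

const leavesOfP2 = ingestedLeaves("p2", facts);
```

In this example leavesOfP2 is ["m1", "m2"]: p2 was derived from p1 and m1, p1 from m1 and m2, and m1 is reported once even though it is reachable by two paths.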
What’s shipped
v0.1.0-alpha, today, Apache-2.0:
- Adapters: gmail-mbox (streaming mbox parser, content-addressed ids), google-drive-folder (recursive walk).
- Inferrers: identity-resolver (three-stage cascade: multi-key blocking → Fellegi-Sunter-inspired scoring with Jaro-Winkler → WCC clustering at ≥0.80), dlp-scanner (SSN, email, phone, Luhn-validated credit card, IBAN).
- MCP tools: search, neighborhood, timeline, why, fetch, record, ask (Haiku compiles natural language to structured StoreQuery when ANTHROPIC_API_KEY is set; FTS fallback otherwise).
- Surfaces: stdio MCP server (hypha serve), read-only HTTP (hypha publish), Graphiti-compatible export/import.
- CLI: hypha ingest | infer | search | serve | publish | export | import | build-adapter.
- SQLite + FTS5 + sqlite-vec. Cedar policy + audit log scaffolded. 27 tests passing, 1 skip (sqlite-vec), 0 fail.
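The last stage of the identity-resolver cascade, WCC clustering at ≥0.80, can be illustrated with a small union-find: every scored pair at or above the threshold becomes an edge, and each weakly connected component becomes one identity cluster. This is a generic sketch of that technique, not Hypha's code. The ScoredPair type, the clusterByThreshold function, and the example ids and scores are all made up, and the scores stand in for whatever the scoring stage produces.

```typescript
// Sketch only: names, ids, and scores are hypothetical.
interface ScoredPair {
  a: string;
  b: string;
  score: number; // assumed to be a match score in [0, 1]
}

function clusterByThreshold(
  ids: string[],
  pairs: ScoredPair[],
  threshold: number = 0.8,
): string[][] {
  // Union-find parent pointers; every id starts as its own root.
  const parent = new Map<string, string>();
  const ensure = (x: string): void => {
    if (!parent.has(x)) parent.set(x, x);
  };
  for (const id of ids) ensure(id);

  const find = (x: string): string => {
    let root = x;
    while (parent.get(root)! !== root) root = parent.get(root)!;
    // Path compression: point every node on the path at the root.
    let cur = x;
    while (cur !== root) {
      const next = parent.get(cur)!;
      parent.set(cur, root);
      cur = next;
    }
    return root;
  };

  // Link only pairs that meet the threshold.
  for (const { a, b, score } of pairs) {
    if (score < threshold) continue;
    ensure(a);
    ensure(b);
    parent.set(find(a), find(b));
  }

  // Group ids by root, preserving first-seen order.
  const groups = new Map<string, string[]>();
  const clusters: string[][] = [];
  parent.forEach((_value, id) => {
    const root = find(id);
    let group = groups.get(root);
    if (group === undefined) {
      group = [];
      groups.set(root, group);
      clusters.push(group);
    }
    group.push(id);
  });
  return clusters;
}

const clusters = clusterByThreshold(
  ["alice@work.example", "alice@home.example", "a.smith", "bob@corp.example"],
  [
    { a: "alice@work.example", b: "alice@home.example", score: 0.93 },
    { a: "alice@home.example", b: "a.smith", score: 0.81 },
    { a: "a.smith", b: "bob@corp.example", score: 0.42 },
  ],
);
```

Connected components are transitive, so alice@work.example and a.smith land in the same cluster even though that pair was never scored directly. That is the usual trade-off of this approach: one borderline link can merge two otherwise separate identities.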
What this isn’t
Honest non-goals, so you know what not to expect from alpha:
- Not a live-API connector hub. No OAuth into Gmail in v1. You point it at the mbox.
- Not cloud. Everything runs on your laptop against a local SQLite file.
- Not a UI. The Constellation web UI is explicitly deferred. Alpha ships the library, CLI, and MCP server.
- Not a notes app, a screen recorder, or enterprise search. Those categories are full. Hypha does one thing: turns your static archives into an agent-navigable typed temporal graph, locally.
Credits
This stands on shoulders. Dogsheep set the bar and named the aesthetic. Graphiti / Zep proved the bitemporal-provenance shape. Splink is the Fellegi-Sunter reference. Presidio taught me what a DLP regex bank looks like. sqlite-vec made the vector story tenable on a single file. The MCP spec made all of this agent-queryable, rather than one more 2019-style local Datasette.
Try it
git clone https://github.com/nalediym/hypha && cd hypha
bun install
bun run hypha ingest gmail-mbox ./your-archive.mbox
bun run hypha infer identity-resolver
bun run hypha search "dinner plans"
bun run hypha serve # attach from Claude Desktop
Apache-2.0. PRs welcome. The deferred adapters — Takeout, Slack, M365, Notion — are all tractable. Open an issue if you want to pair on one.