
Context Is the Moat: Why I Built My Own AI Memory Layer

After a year of building seriously with AI assistants, I came to a conclusion that took longer to articulate than it did to feel: the real competitive advantage in AI-augmented work is not which model you use but the accumulated context you bring to it. I built Sulci to own mine.

The immediate trigger was mundane. I had just explained my preferred tech stack to the same AI assistant for the tenth time in the same project. Every session started cold, and what got lost was not just preferences but decisions that had taken weeks to reach, constraints with real reasons behind them, and hard-won understanding of why certain approaches had already failed. The data was all there somewhere in previous conversation logs. The reasoning was gone.

What context actually is

Context is not the same as information. Information is merely retrievable; context is the accumulated weight of decisions made, preferences established, lessons absorbed, and constraints understood. Think of the difference between a colleague who joined last week and one who has worked alongside you for two years. Both might have access to the same documentation, but only one knows why the database schema looks the way it does, which architectural decisions are sacred and which are just legacy, and what kind of feedback actually lands with you versus what bounces off.

This is what was missing in every AI interaction I had: not knowledge, but context. And unlike knowledge, which you can retrieve from a document, context can only be built through accumulated experience of working with someone, which for an AI system means storing and retrieving the output of previous interactions in a structured way.

Models are becoming commodities. Context is not.

In 2023, access to a frontier language model felt like a genuine differentiator. By 2025, GPT-4-class capability was available from a dozen providers at a few cents per million tokens, and open-weight models were closing the gap fast. The model itself was on a clear path to commoditization: good enough, cheap enough, and interchangeable enough that which specific model you used would matter less and less.

What does not commoditize is the layer above the model: the fine-tuning, the system prompt, the retrieved context, the accumulated history of interactions. An enterprise that has spent two years logging decisions, preferences, and institutional knowledge into a structured context layer has something a competitor who just signed up for the same API does not, regardless of which model either of them uses.

I think the most significant near-term advances in practical AI usage will come not from bigger models or faster inference but from better context architecture: who owns the context, how it is structured, how it is retrieved, and how it is kept accurate over time. These are the questions that will separate good AI-augmented work from genuinely compounding AI-augmented work.

Why you should own your context, not rent it

Most AI platforms that offer memory features store your context on their infrastructure, in their format, under their terms, with no easy way to export it, port it to a different model, or inspect what they are actually storing about you. The context you build through months of work with their system becomes a lock-in mechanism, so that switching providers means starting over.

This bothered me more the longer I thought about it. The context I accumulate through my work, the decisions I make, the preferences I develop, the institutional knowledge I build up, is mine. It reflects months of thinking and experience, and surrendering it to a platform whose incentives may diverge from mine at any point seemed like exactly the wrong trade. Sulci was partly built as a response to that: a context layer I own, can inspect, can back up, and can port to any model I want to use.

How it works

Sulci stores knowledge as structured atoms in a dual-indexed store: Postgres for structured queries, pgvector for semantic search. Each index does what it does best, because a vector store finds semantically similar content but cannot answer 'give me all standing instructions' or 'what expired this week,' while a relational store handles those queries but cannot rank by semantic similarity. Both run on the same underlying data.

There are seven atom types: Fact, Decision, Preference, Entity (people, projects, tools, organizations), Relationship (connections between entities), Context, and Instruction. Each type influences model behaviour differently: a standing instruction overrides what the model would otherwise do, a preference nudges it, a fact informs it, and an entity or relationship builds the associative map of your working environment.
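As a rough illustration of the atom model, the sketch below represents the seven types as a Python enum and uses an in-memory list as a stand-in for the Postgres table. The field names (`id`, `type`, `text`, `embedding`) and the helper `atoms_of_type` are assumptions made for this example, not Sulci's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class AtomType(Enum):
    """The seven atom types named in the text."""
    FACT = "fact"
    DECISION = "decision"
    PREFERENCE = "preference"
    ENTITY = "entity"
    RELATIONSHIP = "relationship"
    CONTEXT = "context"
    INSTRUCTION = "instruction"


@dataclass
class Atom:
    # Hypothetical fields: the real schema is not shown in this post.
    id: int
    type: AtomType
    text: str
    embedding: list[float]  # would come from an embedding model


def atoms_of_type(store: list[Atom], wanted: AtomType) -> list[Atom]:
    """Structured query: an exact filter on type, with no similarity ranking."""
    return [atom for atom in store if atom.type is wanted]


# Toy data with made-up two-dimensional embeddings.
store = [
    Atom(1, AtomType.INSTRUCTION, "Never commit directly to main.", [0.1, 0.9]),
    Atom(2, AtomType.PREFERENCE, "Prefer server components.", [0.8, 0.2]),
    Atom(3, AtomType.FACT, "The API is deployed in one region.", [0.5, 0.5]),
]

# "Give me all standing instructions" is a filter, not a similarity search.
standing_instructions = atoms_of_type(store, AtomType.INSTRUCTION)
```

The point of the sketch is the split in query styles: a question like "all standing instructions" only needs the `type` column, while "what is relevant to this task" needs the `embedding` column.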

At the start of each session, Sulci embeds the incoming query and retrieves the top-k most semantically relevant atoms using cosine similarity, injecting those atoms into the system prompt before the model sees any user message. The model starts each session already knowing the things most relevant to what you are about to work on.
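The retrieval step can be sketched in a few lines of plain Python. This is a minimal in-memory version under stated assumptions: the embeddings are hand-written toy vectors rather than output from a real embedding model, and the function names (`cosine`, `top_k`, `build_system_prompt`) and the prompt layout are hypothetical, not Sulci's actual code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], atoms: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the texts of the k atoms most similar to the query vector."""
    ranked = sorted(atoms, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_system_prompt(base: str, retrieved: list[str]) -> str:
    """Place retrieved atoms in the system prompt, ahead of any user message."""
    lines = [base, "", "Relevant context:"] + [f"- {text}" for text in retrieved]
    return "\n".join(lines)


# Toy atoms as (text, embedding) pairs; the vectors are invented for illustration.
ATOMS = [
    ("Prefers server components over client components.", [0.9, 0.1, 0.0]),
    ("Commit messages follow a fixed format.", [0.0, 0.1, 0.9]),
    ("Database was chosen for specific query requirements.", [0.7, 0.6, 0.1]),
]

query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded incoming query
retrieved = top_k(query_vec, ATOMS, k=2)
prompt = build_system_prompt("You are a coding assistant.", retrieved)
```

With these toy vectors the commit-message atom scores lowest and is left out of the prompt. With pgvector the ranking would more likely be pushed into the database (its `<=>` operator computes cosine distance); the in-memory version here only illustrates the arithmetic.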

Full-context injection, prepending everything the system knows about you to every prompt, does not scale: as the store grows you hit context limits, and more importantly you introduce noise, because the model does not need to know your preferred commit message format when you are asking about a data model. Semantic retrieval means the right context arrives at the right time.

Instructions require explicit conflict detection at write time, because two contradictory instructions cannot both sit in the store and be expected to sort themselves out at inference time. When you write a new instruction that contradicts an existing one, Sulci flags the conflict and requires you to resolve it before storing.

There is also a CORE flag, separate from the atom type system: atoms marked CORE bypass retrieval entirely and are injected into every session regardless of query, which matters for standing constraints and non-negotiable preferences that should always be in context even when they are not semantically close to the task at hand.

Atoms also carry temporal validity: expired atoms are automatically excluded from all context responses, which is useful for sprint goals, code freezes, and time-limited constraints. When a decision changes, a new atom can supersede the old one, which is then marked expired on save.
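The interplay of the CORE flag, expiry, and supersession can be sketched as a single selection function. Everything named here (`Atom`, `is_live`, `supersede`, `select_context`, and the precomputed `score` field standing in for query similarity) is a hypothetical illustration of the behaviour described above, not Sulci's implementation. Conflict detection is left out, since how contradictions are detected is not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Atom:
    id: int
    text: str
    core: bool = False                     # CORE atoms skip similarity ranking
    expires_at: Optional[datetime] = None  # None means no expiry
    score: float = 0.0                     # assumed: similarity to the current query


def is_live(atom: Atom, now: datetime) -> bool:
    """An atom is live if it has no expiry or the expiry is still in the future."""
    return atom.expires_at is None or atom.expires_at > now


def supersede(old: Atom, new: Atom, now: datetime) -> Atom:
    """Mark the old atom expired at the moment the new one is saved."""
    old.expires_at = now
    return new


def select_context(store: list[Atom], now: datetime, k: int) -> list[Atom]:
    """Expired atoms are dropped first; CORE atoms are always included;
    the remaining slots go to the k highest-scoring non-CORE atoms."""
    live = [a for a in store if is_live(a, now)]
    core = [a for a in live if a.core]
    ranked = sorted((a for a in live if not a.core), key=lambda a: a.score, reverse=True)
    return core + ranked[:k]


now = datetime(2025, 1, 15)
store = [
    Atom(1, "Never store secrets in the repository.", core=True),
    Atom(2, "Code freeze until the release.", expires_at=datetime(2025, 1, 10), score=0.9),
    Atom(3, "Use database A.", score=0.8),
    Atom(4, "Commit messages follow a fixed format.", score=0.1),
]

# The database decision changes: atom 5 supersedes atom 3.
store.append(supersede(store[2], Atom(5, "Use database B.", score=0.8), now))

selected_ids = [a.id for a in select_context(store, now, k=1)]
```

In this toy run the lapsed code freeze and the superseded decision are both excluded despite their high scores, while the CORE atom is included even though its score is zero.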

What it changed in practice

The most immediate change was that I stopped re-explaining myself. The model knows my project structure, knows the decisions I made three months ago and the reasoning behind them, and knows which approaches I have already tried and rejected. Sessions start in the middle of a conversation rather than at the beginning of one. For any work that spans multiple sessions, which is almost all real work, that compounds quickly.

The second change was fewer correction loops. When the model knows I prefer server components over client components, it does not write a client component that I then have to correct, and when it knows we chose a particular database because of specific query requirements, it does not suggest alternatives that would break those requirements. The correction loop is where most of the time in AI-assisted development actually goes, and shortening it matters more than making individual responses faster.

The third change was harder to quantify but more significant: a genuine sense of continuity, of working with something that knows you rather than something that meets you fresh every time, which changes how you approach the work itself. You start treating the system as a collaborator rather than a tool you brief repeatedly, and what becomes possible with that continuity is different in kind, not just in speed.

Sulci integrates with the tools you already use: an MCP server for Claude Desktop and Claude Code with eight dedicated tools (query_context, add_knowledge, record_interaction, list_knowledge, delete_knowledge, list_projects, list_conflicts, resolve_conflict), a Chrome extension that passively captures conversations from ChatGPT, Claude, Gemini, and Perplexity without any manual steps, a VS Code extension that surfaces relevant knowledge as you work, an API proxy that intercepts OpenAI and Anthropic API calls with zero code changes required, and a REST API for any other integration. Sulci Local runs entirely on your own machine. Sulci Cloud adds multi-tenant support via Supabase Auth with full data isolation per user. The full project page has the complete architecture breakdown.
