Cognee vs Hindsight: Which Agent Memory System Should You Self-Host?
Compare Cognee and Hindsight for AI agent memory. Architecture, retrieval strategies, integrations, benchmark performance, and when to use each one.

I’ve been running both Cognee and Hindsight for a few months now. They solve related problems but take very different approaches, and picking the wrong one for your use case will waste weeks of integration work.
Both are open source. Both go beyond basic RAG. Both give AI agents persistent, structured memory. That’s where the similarities end.
This article breaks down how they differ architecturally, what each one is actually good at, which integrations they support, and when you should pick one over the other (or both).
Quick comparison
| | Cognee | Hindsight |
|---|---|---|
| Architecture | Knowledge graph + vector search | Multi-strategy hybrid (semantic + BM25 + graph + temporal) |
| Primary strength | Data ingestion and knowledge extraction from documents | Agent interaction memory with multi-strategy retrieval |
| Memory type | Institutional (knowledge from your data) | Personalization + institutional |
| Data sources | 30+ connectors (PDFs, Slack, Notion, images, audio) | Agent conversations and structured input |
| Benchmark (LongMemEval) | Not ranked | 94.6% (state of the art) |
| SDKs | Python only | Python, TypeScript, Go |
| Protocol | REST API | MCP-first |
| Storage | SQLite + LanceDB + Kuzu (or PostgreSQL + pgvector) | PostgreSQL with pgvector |
| License | Open core | MIT |
| Web UI | CLI-only (cognee-cli -ui) | Built-in Control Plane |
| GitHub stars | ~12K | ~4K (growing fast) |
How each one works
Cognee: turn your data into a knowledge graph
Cognee is a pipeline. You feed it data from various sources, it extracts entities and relationships, builds a knowledge graph, and layers vector embeddings on top. When you query it, the graph traversal and vector search work together.
The core loop:
- Ingest - Pull data from 30+ connectors (PDFs, Slack, Notion, databases, images, audio)
- Process - Chunk text, extract entities, resolve relationships
- Store - SQLite for metadata, LanceDB for vectors, Kuzu for the knowledge graph (or PostgreSQL + pgvector for metadata and vectors, with Neo4j or FalkorDB for the graph)
- Query - Hybrid graph traversal + vector similarity
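To make the stages concrete, here is a toy sketch in plain Python. It does not use Cognee at all: the regex "extractor" is a stand-in for the LLM-based entity extraction, the dict is a stand-in for the graph store, and every name in it (chunk, extract_triples, build_graph) is hypothetical.

```python
import re
from collections import defaultdict

def chunk(text: str) -> list[str]:
    """Split raw text into sentence-sized chunks."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

# Toy stand-in for LLM extraction: matches "<subject> runs|uses <object>".
TRIPLE = re.compile(
    r"^(?:the\s+)?(.+?)\s+(runs|uses)\s+(?:the\s+)?(.+?)[.!?]?$",
    re.IGNORECASE,
)

def extract_triples(chunks: list[str]) -> list[tuple[str, str, str]]:
    """Turn each matching chunk into a (subject, relation, object) triple."""
    triples = []
    for c in chunks:
        match = TRIPLE.match(c)
        if match:
            triples.append((match.group(1), match.group(2), match.group(3)))
    return triples

def build_graph(triples):
    """Store triples as an adjacency map: subject -> [(relation, object)]."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)

text = (
    "The production database runs PostgreSQL 17. "
    "The API service uses the production database."
)
graph = build_graph(extract_triples(chunk(text)))
print(graph)
```

A real pipeline replaces the regex with an LLM call and adds embeddings per chunk, but the shape of the data (chunks in, triples out, graph stored) is the part worth keeping in mind.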
The API is straightforward:
import asyncio
import cognee

async def main():
    await cognee.remember("The production database runs PostgreSQL 17")  # store
    return await cognee.recall("What database does production use?")  # query

results = asyncio.run(main())
Cognee’s new API uses remember, recall, forget, and improve operations. The remember call stores data in the knowledge graph, recall queries it, and improve lets the agent learn from feedback.
Where Cognee shines: taking a corpus of documents, meeting transcripts, or code repositories and turning them into structured, queryable knowledge. If your agent needs to answer questions grounded in company docs, Cognee’s pipeline handles the extraction and graph construction.
Hindsight: learn from agent interactions
Hindsight takes a different approach. Instead of ingesting external data sources, it captures and structures what agents actually learn during operation: conversations, decisions, corrections, and outcomes.
It organizes memory into three categories:
- World facts - Things that are true (“The project uses PostgreSQL 17”)
- Experiences - Things that happened (“Last deployment broke because of a migration issue”)
- Mental models - Patterns formed by reflecting on facts and experiences (“This user prefers concise responses”)
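As a rough mental picture, the three categories can be modeled as a tagged record. This is my own illustration, not Hindsight's actual schema: MemoryKind, Memory, and by_kind are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class MemoryKind(Enum):
    WORLD_FACT = "world_fact"      # things that are true
    EXPERIENCE = "experience"      # things that happened
    MENTAL_MODEL = "mental_model"  # patterns formed by reflection

@dataclass
class Memory:
    kind: MemoryKind
    content: str
    # Timestamp so that time-based retrieval has something to filter on.
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

def by_kind(memories: list[Memory], kind: MemoryKind) -> list[Memory]:
    """Return only the memories of one category."""
    return [m for m in memories if m.kind is kind]

memories = [
    Memory(MemoryKind.WORLD_FACT, "The project uses PostgreSQL 17"),
    Memory(MemoryKind.EXPERIENCE,
           "Last deployment broke because of a migration issue"),
    Memory(MemoryKind.MENTAL_MODEL, "This user prefers concise responses"),
]
for m in by_kind(memories, MemoryKind.EXPERIENCE):
    print(m.content)
```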
When you store a memory with retain, Hindsight runs an LLM to extract entities, relationships, and temporal data. When you search with recall, it runs four retrieval strategies in parallel:
- Semantic search (vector similarity)
- Keyword matching (BM25)
- Graph traversal (entity and relationship links)
- Temporal filtering (time ranges)
Results get merged with reciprocal rank fusion and reranked. This multi-strategy approach is why Hindsight scores 94.6% on the LongMemEval benchmark, the highest of any agent memory system tested.
from hindsight_client import Hindsight
client = Hindsight(base_url="http://localhost:8888")
client.retain(bank_id="my-project", content="Alice prefers concise responses")
results = client.recall(bank_id="my-project", query="How should I talk to Alice?")
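The fusion step that recall performs is worth seeing on its own. Below is a minimal sketch of reciprocal rank fusion in plain Python. It is not Hindsight's code: the function name, the memory IDs, and the constant k=60 (a common default for this technique) are my assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists into one.

    Each item scores 1 / (k + rank) per list it appears in (rank starts
    at 1); the per-list scores are summed and items sorted by the total.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))

# Hypothetical output of the four strategies for one query.
semantic = ["m3", "m1", "m2"]
keyword = ["m1", "m4"]
graph = ["m1", "m3"]
temporal = ["m4", "m3"]

fused = reciprocal_rank_fusion([semantic, keyword, graph, temporal])
print(fused)
```

The useful property is that only ranks matter, so a cosine similarity and a BM25 score never have to be put on the same scale. An item that several strategies agree on (m1 here) rises above one that a single strategy ranked first.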
The reflect operation goes deeper. It pulls together related memories and generates new observations. An AI project manager could reflect on risks, a sales agent could reflect on outreach patterns, a support agent could reflect on gaps in documentation.
Architecture differences
Cognee uses three storage layers by default: SQLite for metadata, LanceDB for vectors, and Kuzu for the knowledge graph. You can swap these for PostgreSQL + pgvector (handling both relational and vector data) plus Neo4j or FalkorDB for graphs. This flexibility is useful but means more moving parts.
Hindsight uses a single PostgreSQL instance with pgvector. One database handles vectors, metadata, entity graphs, and temporal indexes. Simpler to deploy, simpler to back up, simpler to scale.
Cognee combines graph traversal with vector similarity. The knowledge graph structure enables multi-hop queries, following entity relationships across connected data points. Good for “What documents mention the same project as this meeting transcript?”
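Here is what a two-hop query like that looks like on a toy graph. This is a plain breadth-first search, not Cognee's query engine, and the file and project names are made up for the example.

```python
from collections import defaultdict, deque

def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def neighbors_within(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    hops.pop(start)
    return hops

# Hypothetical "document mentions project" edges.
graph = build_adjacency([
    ("transcript.txt", "Project Atlas"),
    ("design-doc.pdf", "Project Atlas"),
    ("roadmap.pdf", "Project Atlas"),
    ("invoice.pdf", "Project Borealis"),
])

hops = neighbors_within(graph, "transcript.txt", max_hops=2)
# Hop 1 is the shared project; hop 2 is the other documents mentioning it.
related_docs = sorted(node for node, dist in hops.items() if dist == 2)
print(related_docs)
```

Pure vector search has no way to express "connected through the same entity", which is the gap the graph layer fills.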
Hindsight runs four retrieval strategies in parallel and fuses results. Semantic search handles conceptual matches, BM25 catches exact terms, graph traversal follows entity links, and temporal filtering understands time expressions like “last week” or “before the migration.” No single strategy handles every query type, so the parallel approach covers blind spots.
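The temporal strategy is the least familiar of the four, so here is a small sketch of the idea: resolve a time expression into a range, then filter memories by timestamp. The expression table, the definition of "last week" as the previous seven days, and the function names are all my assumptions, not Hindsight's implementation.

```python
from datetime import datetime, timedelta, timezone

def resolve_time_range(expression: str, now: datetime):
    """Map a small set of time expressions to a (start, end) range."""
    spans = {
        "yesterday": timedelta(days=1),
        "last week": timedelta(days=7),  # assumed: the previous seven days
    }
    if expression not in spans:
        raise ValueError(f"unsupported time expression: {expression!r}")
    return now - spans[expression], now

def filter_by_time(memories, start, end):
    """Keep memories whose timestamp falls inside [start, end]."""
    return [m for m in memories if start <= m["at"] <= end]

now = datetime(2025, 6, 15, tzinfo=timezone.utc)
memories = [
    {"id": "m1", "at": datetime(2025, 6, 12, tzinfo=timezone.utc)},
    {"id": "m2", "at": datetime(2025, 5, 1, tzinfo=timezone.utc)},
]

start, end = resolve_time_range("last week", now)
recent = filter_by_time(memories, start, end)
print([m["id"] for m in recent])
```

An expression like "before the migration" is harder, because it first has to be anchored to a stored event before it can become a range. The sketch leaves that out.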
Cognee has 30+ connectors: PDFs, Slack, Notion, Google Drive, SharePoint, databases, images (via vision models), audio (via transcription). If you need to ingest from external sources, Cognee is built for it.
Hindsight doesn’t connect to external data sources. It captures memory from agent interactions: conversations, decisions, corrections, and outcomes. You feed it information through the retain API, not through connectors.
Integrations
This is where the two diverge significantly.
Cognee integrations
Cognee is Python-only for the SDK. If your agent is written in TypeScript or Go, you’ll need to use the REST API or build a wrapper.
Available integrations:
- MCP server - Works with Cursor, Claude Code, and other MCP clients
- Claude Code plugin - Hooks into session lifecycle for automatic memory capture
- OpenClaw plugin - cognee-openclaw for OpenClaw agent workflows
- REST API - Full-featured API for any language
Hindsight integrations
Hindsight ships SDKs for Python, TypeScript, and Go. The MCP-first design means any MCP-compatible agent works without an SDK.
Available integrations:
- Claude Code - Hooks for automatic conversation capture and context recall
- OpenCode - Community plugin with auto-retain and session-start recall
- OpenClaw - Direct integration with server-side access control
- Hermes Agent - Memory backend for the Hermes multi-agent framework
- Agno - Direct integration for structured long-term memory
- Zed - Long-term memory for the Zed editor’s AI assistant
- Cursor - MCP server integration
- n8n - Community node for workflow automation
- LangChain/LangGraph - Memory Tools, Graph Nodes, BaseStore adapter
- LlamaIndex - BaseToolSpec and BaseMemory support
- Vercel AI SDK - Memory for AI SDK and AI Chatbot applications
- Pipecat - Memory for voice AI pipelines
- Dify - Plugin for chatflow and agent apps
- LiteLLM - Proxy callbacks for zero-code-change memory
- CrewAI, Pydantic AI, AutoGen, AG2, Strands - Framework-specific integrations
The breadth of Hindsight’s integrations is a direct result of the multi-language SDK support. Cognee’s Python-only approach limits its reach outside the Python ecosystem.
Benchmark performance
Hindsight holds the top score on the LongMemEval benchmark at 94.6%. This benchmark tests memory system performance across conversational AI scenarios, including:
- Recalling facts from past conversations
- Understanding temporal relationships
- Handling multi-hop reasoning across memories
- Maintaining consistency over long interaction histories
Cognee doesn’t have a published LongMemEval score. That doesn’t mean it’s worse at everything. The benchmark focuses on conversational memory, which is Hindsight’s strength. Cognee’s knowledge graph approach excels at different tasks, like extracting structured information from documents and reasoning across connected data points.
The benchmark matters if your use case is agent interaction memory. If your use case is document knowledge extraction, the benchmark is less relevant.
Self-hosting both
Both tools work well with Docker Compose. I have guides for each:
- How to Self-Host Cognee - Full setup with Dokploy or Docker Compose, PostgreSQL, pgvector, and MCP integration
- Deploy Hindsight on Docker - Complete deployment with PostgreSQL, pgvector, and authentication
You can run both
Cognee and Hindsight aren’t mutually exclusive. Use Cognee to build a knowledge layer from your document corpus and organizational data. Use Hindsight to give your agents memory of their runtime interactions and user-specific context. They complement each other.
When to use Cognee
Pick Cognee when:
- Your primary need is knowledge extraction from existing data. Thousands of PDFs, Slack threads, meeting transcripts, or code repositories that need to become structured, queryable knowledge.
- You work with multimodal data. Cognee processes images through vision models and audio through transcription, integrating extracted knowledge into the same graph.
- Your stack is Python-only. The Python SDK is clean and Cognee’s “6 lines of code” claim holds for basic use cases.
- You want a local-first knowledge graph. SQLite + LanceDB + Kuzu runs entirely locally with no external dependencies.
- Reducing hallucinations is your top priority. The knowledge graph provides traceable paths from query to source.
When to use Hindsight
Pick Hindsight when:
- Your agent needs to learn from its own interactions. User preferences, past decisions, corrections, and contextual history across sessions.
- You need multi-strategy retrieval. Some queries need semantic similarity, others need exact keywords, others need temporal reasoning. Running all four in parallel covers each method’s blind spots.
- Your stack is multi-language. Python, TypeScript, and Go SDKs, plus MCP protocol support.
- You want MCP-native integration. Plug-and-play memory with any MCP-compatible agent.
- Temporal reasoning matters. Questions like “What changed since last deployment?” or “What did the user say about pricing before the Q3 review?”
- You want a web UI. Hindsight’s Control Plane gives you a browser interface for managing memory banks and testing queries. Cognee’s UI requires running a CLI locally.
Can you use both together?
Yes. They solve different problems:
- Cognee builds knowledge from your existing data (documents, transcripts, code)
- Hindsight captures knowledge from agent runtime (conversations, decisions, outcomes)
A practical setup: Cognee ingests your company documentation and builds a knowledge graph. Hindsight remembers what your agent learns while helping users. The agent queries Cognee when it needs institutional knowledge and Hindsight when it needs to remember user preferences or past interactions.
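In code, that setup boils down to querying both stores and handing the agent one combined context. The sketch below uses two stub functions in place of the real clients, so nothing here is Cognee's or Hindsight's API. build_context and the stubs are hypothetical names.

```python
from typing import Callable

# A retriever takes a query and returns matching text snippets.
Retriever = Callable[[str], list[str]]

def build_context(query: str, institutional: Retriever,
                  personal: Retriever, limit: int = 3) -> str:
    """Query both memory layers and format the results as prompt context."""
    docs = institutional(query)[:limit]
    prefs = personal(query)[:limit]
    sections = []
    if docs:
        sections.append(
            "Company knowledge:\n" + "\n".join(f"- {d}" for d in docs)
        )
    if prefs:
        sections.append(
            "User memory:\n" + "\n".join(f"- {p}" for p in prefs)
        )
    return "\n\n".join(sections)

# Stubs standing in for a Cognee recall and a Hindsight recall.
def fake_cognee(query: str) -> list[str]:
    return ["The production database runs PostgreSQL 17"]

def fake_hindsight(query: str) -> list[str]:
    return ["Alice prefers concise responses"]

context = build_context("What database do we use?", fake_cognee, fake_hindsight)
print(context)
```

Keeping the two sources in separate labeled sections lets the model tell documented facts apart from things it learned about a specific user.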
Both can run on PostgreSQL with pgvector (Cognee when you configure it to use PostgreSQL instead of its local defaults), so you can use the same database server if you want to keep infrastructure simple.
Cost comparison
Both are free to self-host. The main costs are:
- Infrastructure - A VPS with 4GB RAM handles both. If both point at the same PostgreSQL with pgvector instance, you’re not doubling your database costs.
- LLM API usage - Both need an LLM for extraction and reasoning. Cognee uses it for entity extraction during ingestion. Hindsight uses it for fact extraction during retain operations and for reflect. Costs scale with how much data you process.
- Embedding API usage - Both generate vector embeddings. The cost profile is similar because both can use the same embedding providers (OpenAI, Cohere, etc.).
Using local models with Ollama can reduce API costs to nearly zero for both tools, at the expense of slower processing and potentially lower extraction quality.
The bottom line
Cognee is a knowledge extraction engine. Point it at your data, it builds a graph you can query. Hindsight is an interaction memory system. Let your agent use it, and it gets smarter over time.
If your agent forgets things users told it, pick Hindsight. If your agent can’t answer questions about your documentation, pick Cognee. If both problems sound familiar, run both.