Open Source · MIT License · Listed on Glama

Skills tell your AI how. OpenExp teaches it what works.

Your agent follows instructions perfectly — but doesn't learn from results. OpenExp adds outcome-based learning: approaches that led to commits, closed deals, and shipped code surface first next time.

```shell
# Install
pip install openexp-memory

# Start Qdrant
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

# Register hooks with Claude Code
openexp hooks install

# Done. Use Claude Code as normal.
```

The Learning Loop

Every session makes the next one smarter. Q-learning, the same family of reinforcement-learning techniques behind game-playing AI, applied to your AI's working memory.

🧠 Recall: top memories injected, ranked by Q-value

⚙️ Work: every action captured as observations

📊 Evaluate: session ends — was it productive?

🔄 Reward: good session? Memories get higher scores

Skills Say "How." Nobody Says "What Works."

Static: skills don't learn

You wrote a skill once: "how to work with CRM." The agent follows it perfectly. But it doesn't know that approach A closed deals and approach B didn't. Tomorrow it'll do the same thing as yesterday — even if yesterday didn't work.

No Feedback: no outcome signal

Your agent sent 200 emails this month. Which ones got replies? Which formulations closed deals? Which debugging approaches actually fixed bugs on the first try? Your skills don't know. There's no feedback loop.

No Signal: memory services store, they don't learn

Mem0, Zep, and LangMem store and retrieve. But to them, every memory is equally important: a critical decision and a random grep carry the same weight. Storage without learning is just a database.

How OpenExp Works

Your skills say how. OpenExp learns what actually works — from real results.

1. Automatic capture. Every action in your Claude Code session — file edits, commits, commands, decisions — is automatically recorded. Hooks handle it. Zero manual work.

2. Smart retrieval. Before each response, the system finds the most relevant memories. Not by similarity alone — by proven usefulness. Five ranking signals.

3. Reward loop. After every session, the system evaluates what happened. Productive sessions reward the memories that were used. Empty sessions penalize them.
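The three steps above can be condensed into a minimal sketch. All class, method, and variable names here are hypothetical and for illustration only; this is not OpenExp's actual API, and the hook names merely echo Claude Code's event vocabulary.

```python
class SessionMemory:
    """Illustrative sketch of the recall -> work -> evaluate -> reward loop.
    Hypothetical names, not OpenExp's actual API."""

    def __init__(self, alpha=0.25):
        self.alpha = alpha  # learning rate for the reward update
        self.q = {}         # memory text -> Q-value
        self.used = set()   # memories injected into the current session

    def on_session_start(self, k=3):
        # Recall: inject the top-k memories, ranked by Q-value.
        top = sorted(self.q, key=self.q.get, reverse=True)[:k]
        self.used = set(top)
        return top

    def on_tool_use(self, observation):
        # Work: capture each action as an observation (Q starts at 0).
        self.q.setdefault(observation, 0.0)

    def on_session_end(self, reward):
        # Evaluate + reward: nudge the used memories toward the session reward.
        for m in self.used:
            self.q[m] += self.alpha * (reward - self.q[m])
        self.used = set()
```

A real implementation would persist the Q-table between sessions (OpenExp keeps a JSON Q-cache on disk) and rank with more signals than the Q-value alone.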

Session Signals

After each session, OpenExp checks what was produced and assigns a reward score.

| Session outcome | Reward |
|---|---|
| Code committed | +0.30 |
| Pull request created | +0.20 |
| Deployed to production | +0.10 |
| Tests passed | +0.10 |
| Deal closed (CRM) | +0.80 |
| Nothing produced | -0.10 |
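A session reward could then be computed by summing the signals that fired. The weights below come from the table above; the dictionary keys and the function itself are illustrative, not OpenExp's internals.

```python
# Reward weights from the session-signal table; keys are illustrative.
SESSION_REWARDS = {
    "commit": 0.30,
    "pull_request": 0.20,
    "deploy": 0.10,
    "tests_passed": 0.10,
    "deal_closed": 0.80,
}

def session_reward(signals):
    """Sum the rewards for the observed signals; penalize an empty session."""
    if not signals:
        return -0.10
    return sum(SESSION_REWARDS.get(s, 0.0) for s in signals)
```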

Experiences — Your Process, Your Rewards

One memory can be valuable in one context and worthless in another. Define what "productive" means for your workflow.

Coding (default) pipeline: backlog → in_progress → review → merged → deployed
Signal weights: Commit +0.30 · Pull Request +0.20 · Tests pass +0.10 · Deploy +0.10 · Decisions +0.10

Sales pipeline: lead → contacted → qualified → proposal → negotiation → won
Signal weights: Decisions +0.20 · Email sent +0.15 · Follow-up +0.10 · Commit +0.05 · Pull Request +0.05

Dealflow pipeline: lead → discovery → nda → proposal → negotiation → invoice → paid
Signal weights: Payment received +0.30 · Proposal sent +0.25 · Invoice sent +0.20 · Email sent +0.15 · Decisions +0.15

Support pipeline: new_ticket → investigating → responded → resolved → closed
Signal weights: Ticket closed +0.25 · Email sent +0.10 · Decisions +0.10 · Follow-up +0.10
Same memory, different scores: "Discussed NDA with client — lawyers took 2 weeks, 10+7 year term"
Coding experience: 0.05 (no commits, useless)
Dealflow experience: 0.72 (the NDA led to payment)
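The profiles above amount to different weight tables over a shared signal vocabulary. A sketch, with weights taken from the coding and dealflow tables; the profile keys and function are illustrative:

```python
# Two experience profiles as weight tables (weights from the tables above;
# the dictionary structure and function are illustrative, not OpenExp's code).
EXPERIENCES = {
    "coding":   {"commit": 0.30, "pull_request": 0.20, "tests_pass": 0.10,
                 "deploy": 0.10, "decision": 0.10},
    "dealflow": {"payment_received": 0.30, "proposal_sent": 0.25,
                 "invoice_sent": 0.20, "email_sent": 0.15, "decision": 0.15},
}

def reward_for(experience, signals):
    """Score one session's signals under a given experience profile."""
    weights = EXPERIENCES[experience]
    return sum(weights.get(s, 0.0) for s in signals)

# A session where an NDA discussion led to a payment scores high under
# dealflow but earns almost nothing under coding:
nda_session = ["decision", "email_sent", "payment_received"]
```

The learned scores in the example above (0.05 vs 0.72) are Q-values accumulated over many such sessions, not a single weighted sum.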

How OpenExp Compares

| Feature | OpenExp | Mem0 | Zep | LangMem |
|---|---|---|---|---|
| Learns from outcomes | Q-learning | No | No | No |
| Process-aware | Pipeline stages + signals | No | No | No |
| Memory type filtering | Reward only decisions | No | No | No |
| Hybrid retrieval | 5 signals | Vector only | Graph + vector | Vector only |
| Claude Code native | Zero-config hooks | Integration required | Integration required | Integration required |
| Fully local | Qdrant + FastEmbed | Cloud API | Cloud or self-hosted | Cloud API |

Five-Factor Retrieval

Not just "find similar text." Five signals weighted together. After 100 sessions, your retrieval is personalized by actual outcomes.

30% Q-value: proven usefulness
30% Semantic: meaning, not keywords
15% Recency: recent memories get a boost
15% Importance: decisions outrank commands
10% BM25: exact keyword matches

Fully Local. No SaaS.

No data leaves your machine. All data lives under ~/.openexp/. You own everything.

🐳 Qdrant: vector DB in a Docker container on your machine

FastEmbed: local embeddings, no API calls needed

💾 Q-Cache: a JSON file on disk, fully inspectable

🔍 Explainable: 5-level audit trail from raw logs to LLM reasoning

FAQ

Real questions from developers, founders, sales teams, and skeptics.

Installation & Setup
How long does installation take?
If you already have Docker and Claude Code — realistically 5 minutes. Clone the repo and run ./setup.sh — the script creates a venv, starts Qdrant in Docker, creates the collection, copies .env, and registers the MCP server and hooks in Claude Code. Requires Python 3.11+ and Docker. No API key needed for core functionality — embeddings run locally via FastEmbed. First launch downloads the model (~1 min), then it’s cached.
I’m not a programmer. Can I install this myself?
Honestly — it’ll be tough on your own. Best option: ask whoever set up Claude Code for you to spend 15-20 minutes. After installation everything runs in the background — you don’t do anything extra, just work as usual.
How much disk space does it use?
Budget 500MB-1GB on startup (Qdrant Docker image + embedding model). Memories themselves are tiny: 10,000 records = ~15MB. With active use (50 sessions/week) observations take 10-20MB/month. RAM: Qdrant uses 50-100MB idle.
How do I uninstall it?
Clean removal in 4 steps: (1) docker stop/rm the Qdrant container, (2) rm -rf ~/.openexp/, (3) remove the openexp block from ~/.claude/settings.local.json, (4) delete the openexp folder. Nothing installs system-wide, zero leftover files.
How It Works
How is this different from CLAUDE.md?
CLAUDE.md is static context that you write and update by hand. OpenExp adds dynamic context: what you did yesterday, which approaches worked, which didn't. They work together. The real advantage shows when you return to a project after a week away, or when you fixed a similar bug a month ago: the solution surfaces automatically.
What exactly gets remembered?
Everything you do through Claude Code: file edits, commands, decisions, emails. You can also explicitly say “remember that the client wants a 15% discount” — stored as a separate fact. It doesn’t record calls directly (it’s a text tool), but if you tell Claude to write down the summary after a call — that gets stored.
How does the system decide what’s important?
Q-learning. Every memory has a Q-value (from -0.5 to 1.0). If a memory was retrieved before a productive session (commit, closed deal) — its Q-value rises. If the session was empty — it drops. Over dozens of sessions, useful memories surface first, noise sinks.
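The mechanic described here is a standard tabular Q-learning update. A minimal sketch, assuming the alpha = 0.25 learning rate quoted elsewhere in this FAQ and clamping to the stated [-0.5, 1.0] range:

```python
def update_q(q, reward, alpha=0.25):
    """One Q-learning step: move Q toward the session reward,
    clamped to [-0.5, 1.0]. Illustrative, not OpenExp's actual code."""
    q = q + alpha * (reward - q)
    return max(-0.5, min(1.0, q))

# A memory used before several productive sessions rises steadily:
q = 0.0
for _ in range(4):
    q = update_q(q, reward=0.8)  # e.g. repeated "deal closed" sessions
# q crosses the 0.5 threshold after 4 such positive updates
```

Because each step moves Q only a fraction of the way toward the reward, one lucky session cannot dominate; the score reflects a trend.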
Reward System
What reward signals does the system use?
Two types. Session rewards evaluate each working session automatically: commit = +0.3, PR = +0.2, deploy = +0.1, tests = +0.1, decisions = +0.1, files written = +0.02 each. Empty session = -0.1 base + -0.1 penalty. Separately, business outcome rewards fire through the CRM resolver: closed deal = +0.8, proposal sent = +0.25, payment received = +0.3. These are different reward paths — session rewards work automatically, business outcomes require CRM integration.
Different Workflows
Is this only for programmers?
No. There are ready profiles: sales with funnel stages (lead → contacted → qualified → proposal → negotiation → won) and dealflow (includes NDA, invoicing, payment). For a salesperson, a “productive session” means a sent email or a decision made, not a commit. Enable with one variable: OPENEXP_EXPERIENCE=sales. But honestly — these profiles are new and haven’t been battle-tested by many users yet. For other workflows you can create your own via openexp experience create.
Same memory, different value in different contexts?
Exactly. “Discussed NDA with client” in dealflow experience has Q-value 0.72 (led to payment), but in coding experience — 0.05 (no commits). This is called Experiences — different scoring profiles for different workflows.
What if I debug for 8 hours, find the root cause, but don’t commit?
Fair problem. By default such a session gets negative reward, and its memories are penalized. Partial solutions: create a separate Experience for research workflow with different signals, or manually calibrate via calibrate_experience_q. But by default the system is biased toward “visible productivity.”
Privacy & Reliability
Does my data go anywhere?
No. Qdrant runs in Docker on your machine, and embeddings are generated locally via FastEmbed. Zero cloud API calls for core operations. The only exception is optional LLM enrichment through the Anthropic API (memory classification). Disable it with OPENEXP_EXPLANATION_ENABLED=false; this doesn't affect core functionality.
If Docker crashes or computer shuts down — do I lose everything?
No. Qdrant persists data to disk. When Docker restarts — the container starts automatically (restart: unless-stopped). Q-cache is also on disk. The only thing you might lose is observations from the current unfinished session.
Integrations & Limitations
Does this work with Cursor or aider?
Currently Claude Code only. Integration is built on the hooks system (SessionStart, PostToolUse, SessionEnd) and MCP — these are Claude Code-specific APIs. Cursor and aider aren't supported. The core engine is a generic Python library, so in theory you could write an adapter, but nobody has done that yet.
We’re on LangChain/LangGraph. How to integrate without Claude Code?
You can use the core Python library directly: search, QCache, add_memory(). But you’ll need to: (1) capture observations instead of the PostToolUse hook, (2) determine session end and its productivity, (3) integrate retrieval into your pipeline. REST API or LangChain package — not available yet.
I have 5+ projects. Won’t it get confused?
Full multi-project isolation doesn’t exist yet — one Qdrant collection for everything. Q-learning partially self-corrects: if a memory from a React project didn’t help in a Go session — its Q-value drops. Workaround: different OPENEXP_COLLECTION via .env for different projects.
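Under that workaround, per-project isolation might look like one .env file per project directory. The paths and collection names below are hypothetical; only the OPENEXP_COLLECTION variable name comes from the answer above.

```shell
# ~/work/react-app/.env  (hypothetical path and value)
OPENEXP_COLLECTION=react_app_memories

# ~/work/go-service/.env  (hypothetical path and value)
OPENEXP_COLLECTION=go_service_memories
```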
Does it support multi-tenant for SaaS?
No. Currently a single-tenant library: one Qdrant, one Q-cache, one set of hooks. For SaaS with hundreds of users you’d need a custom HTTP layer with tenant routing. Not on the near-term roadmap.
Metrics & Evidence
Are there benchmarks? Retrieval quality graphs?
Honest answer — no benchmarks. None. We openly state this in CONTRIBUTING.md as an area where help is needed. “After 100 sessions” is a projection from Q-learning math (at alpha=0.25 you need ~4 positive updates to reach Q>0.5), not a result from a controlled experiment.
A/B test of “with Q-learning” vs “just vector search”?
No. The theoretical argument: similarity can’t distinguish current information from outdated, Q-value adds the signal “this has helped before.” But no ablation study has been conducted. At this stage Q-value reranking barely affects results because most memories have Q near 0. The potential is there, the proof is not.
You retrieve 10 memories, all get equal reward. But maybe only 1 actually helped?
Fundamental credit assignment problem, and we haven’t solved it. Partial mitigation: Experiences let you filter which memory types receive rewards (only “decision” and “insight,” not “action”). With enough sessions the noise averages out, but it’s slow.
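The type-filtering mitigation can be sketched in a few lines. The memory-dict shape and function name are hypothetical; only the idea of rewarding "decision" and "insight" but not "action" comes from the answer above.

```python
# Types that are allowed to receive session rewards; "action" memories
# (e.g. a random grep) are excluded. Structure is illustrative.
REWARDABLE_TYPES = {"decision", "insight"}

def apply_session_reward(memories, reward, alpha=0.25):
    """Update Q only for retrieved memories of rewardable types.
    Each memory is a dict with 'type' and 'q' keys (hypothetical shape)."""
    for m in memories:
        if m["type"] in REWARDABLE_TYPES:
            m["q"] += alpha * (reward - m["q"])
    return memories
```

Filtering narrows the credit-assignment noise but does not solve it: a useless decision retrieved alongside a useful one still shares the reward.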
Reward weights (commit=0.3, PR=0.2) — aren’t those just your personal patterns?
Fair point. Default weights are literally my workflow. A data scientist in Jupyter who never commits — every session gets negative reward. Experiences are an attempt to fix this: create a separate reward profile. But only 3 profiles ship (default, sales, dealflow), and none have been tested by other users.
Why OpenExp
Why not just Mem0? They have 51K stars and $24M funding.
Mem0 is a different weight class in infrastructure maturity. What OpenExp offers that Mem0 doesn’t: Q-learning ranking, outcome-based reward loop, process-aware memory. No competitor has learned prioritization. Realistic approach: use OpenExp’s Q-learning engine as a reranking step on top of your existing memory layer, rather than a full replacement.
How do I know it’s actually working?
After 2-3 weeks you’ll notice Claude starts “knowing” your context: conventions, past decisions, working approaches. There are also inspection tools: experience_insights shows the most valuable memory types, experience_top_memories shows top by Q-value, explain_q explains in plain language why a specific memory has its rating. But be realistic — the system needs time to accumulate data.

Stop telling. Start teaching.

Skills say how. OpenExp teaches what works. Open source. MIT license.