v1.1.2 · Now available

The AI you can trust with long-term work.

A self-hosted, local-first AI assistant with persistent memory. It remembers your projects across sessions and shows its work — every routing decision is logged and replayable — all on your own hardware.

Memory
Remembers across sessions
Provenance
Every decision can be replayed
Private
Nothing leaves your machine

Built for people who need control.

Not another chatbot. A private AI platform for developers and engineers who want expert help without cloud dependency.

Solo Developers
  • Debug code and build features faster
  • Keep project memory across sessions
  • Run locally with no subscription
Homelab & IT Engineers
  • Network troubleshooting and DNS
  • Linux administration and scripting
  • Infrastructure planning and review
AI Builders
  • Local-first agentic platform
  • Persistent memory and planning
  • Full observability into AI decisions
Privacy-Conscious Teams
  • No cloud lock-in or data exposure
  • Full audit trail of every decision
  • Self-hosted on your own infrastructure

See how the system thinks.

Every decision is logged, visualized, and replayable. Not a black box — a glass box.

UCI Dashboard
Unified Cognitive Index · h_UCI 80.8 ↑ +5.7
Reliability · Intelligence · Adaptation · Productivity
Intelligence Metrics
Plan Graph
Execution DAG: parse → route → reason → verify → respond
Memory Browser
592 memories · FAISS backend
python_dev · ai_ml · it_network · knowledge
Decision Replay
Then vs. now · agent: same · confidence: improved · reflect: reduced

Not a chatbot. A team.

Generic AI gives generic answers. Specialist agents give expert answers — every time, locally, privately.

Complete Privacy
Runs entirely on your hardware. Conversations, code, and context never leave your machine. No telemetry, no external logging, no training on your data.
Deterministic Routing
A geometric scoring engine, not an LLM call, classifies intent in milliseconds. Result: 98% routing accuracy at sub-second latency.
Persistent Memory
Semantic memory across sessions, backed by a FAISS vector index with an LRU cache that delivers a 52× speedup. The system remembers your projects, lessons, and context.
Full Observability
Every routing decision is logged with confidence scores, domain signals, and regret metrics. You can audit exactly why the system chose what it chose.
Consumer Hardware
Runs on an RTX 2050 with 4GB VRAM. No A100, no cloud GPU, no inference bill. phi4-mini via Ollama delivers specialist-quality answers on a gaming laptop.
Self-Learning
Thumbs-up/down feedback adjusts agent weights through a learning kernel. The system adapts to your usage patterns over time and gets better the more you use it.
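
As a rough illustration of the Self-Learning idea above, the sketch below nudges a per-agent weight up or down on each thumbs-up or thumbs-down. It is not Amagra's learning kernel: the `AgentWeights` class, the step size, and the clamping bounds are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AgentWeights:
    """Per-agent routing weights adjusted by feedback (hypothetical structure)."""

    learning_rate: float = 0.1  # assumed step size per feedback event
    floor: float = 0.1          # assumed lower bound so no agent is ever silenced
    ceiling: float = 2.0        # assumed upper bound so no agent dominates
    weights: dict[str, float] = field(default_factory=dict)

    def get(self, agent: str) -> float:
        # Agents with no feedback yet start at a neutral weight of 1.0.
        return self.weights.get(agent, 1.0)

    def record_feedback(self, agent: str, thumbs_up: bool) -> float:
        """Move the agent's weight one step and clamp it to [floor, ceiling]."""
        delta = self.learning_rate if thumbs_up else -self.learning_rate
        updated = min(self.ceiling, max(self.floor, self.get(agent) + delta))
        self.weights[agent] = updated
        return updated


kernel = AgentWeights()
kernel.record_feedback("python_dev", thumbs_up=True)
kernel.record_feedback("python_dev", thumbs_up=True)
kernel.record_feedback("it_network", thumbs_up=False)
print(kernel.weights)
```

Clamping keeps a run of negative feedback from removing an agent entirely, which is one simple way to leave room for recovery.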

More than chat.

Most tools stop at the response. Amagra lets you see inside the process — replay decisions, inspect memory, trace every step.

Plan Graph
Visualize execution as a dependency graph. See which steps ran in parallel, which blocked, and why.
Decision Replay
Re-run any past query and compare how the system would decide now vs then. Measure improvement over time.
Event Log
A live stream of every cognitive event — routing, reflection, memory access, contradiction detection.
Memory Browser
Browse, search, prune, and consolidate semantic memories. See quality scores and usage frequency for every entry.
Risk Observatory
Track reflection triggers, critic gate acceptance rates, and routing conflict rates in real time.
Step Verification
Each goal step is scored before delivery. Low-confidence responses trigger automatic retry with a higher reflection level.
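
The Step Verification behavior described above can be pictured as a retry loop. The following is a minimal sketch under stated assumptions: `answer_with_verification`, the `generate` and `score` callables, the 0.7 threshold, and the maximum reflection level are hypothetical stand-ins, not Amagra's API.

```python
from typing import Callable


def answer_with_verification(
    generate: Callable[[str, int], str],
    score: Callable[[str], float],
    query: str,
    threshold: float = 0.7,   # assumed confidence cutoff
    max_reflection: int = 2,  # assumed highest reflection level
) -> tuple[str, float, int]:
    """Retry at increasing reflection levels until the score clears the threshold.

    Returns the best (response, confidence, reflection_level) seen.
    """
    best = ("", -1.0, 0)
    for level in range(max_reflection + 1):
        response = generate(query, level)
        confidence = score(response)
        if confidence > best[1]:
            best = (response, confidence, level)
        if confidence >= threshold:
            break  # good enough to deliver
    return best


# Stub model and scorer so the sketch runs without a real LLM.
def fake_generate(query: str, level: int) -> str:
    return f"{query} [reflection={level}]"


def fake_score(response: str) -> float:
    # Pretend deeper reflection yields higher confidence.
    return {"0": 0.4, "1": 0.65, "2": 0.9}[response[-2]]


result = answer_with_verification(fake_generate, fake_score, "restart nginx")
print(result)
```

Returning the best attempt rather than the last one means a response is still delivered when no attempt clears the threshold.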

Each expert knows its domain.

The routing engine reads every query and dispatches to the right specialist in under one second — automatically.

Python Dev
Python, FastAPI, asyncio, pandas. Complete, runnable scripts — not code fragments. Handles edge cases and imports.
.NET Dev
C#, ASP.NET Core, Blazor, Entity Framework. Full working code for the .NET ecosystem including DI, async, and xUnit tests.
Web Dev
React, TypeScript, Next.js, Node.js, Tailwind. Full-stack web from hooks to APIs. Handles build tooling and async state.
IT & Networking
Wi-Fi, DNS, SSH, VPN, Linux networking. Runs real diagnostic commands and gives specific, actionable fixes.
AI & ML
PyTorch, TensorFlow, LLMs, embeddings, fine-tuning. Explains concepts clearly and implements training pipelines.
DevOps
Docker, Kubernetes, GitHub Actions, Bash, systemd. Dockerfiles, Compose configs, CI/CD pipelines, and deployment scripts.
Data Analyst
pandas, SQL, NumPy, matplotlib. Data cleaning, EDA, and statistical summaries with working visualization code.
Writer
READMEs, API docs, blog posts, commit messages. Technical writing that developers actually want to read. Edits and proofreads.
Knowledge
Explains any concept clearly. Saves lessons to memory so you never lose what you learn. Corrects wrong analogies directly.
Terse
One-line answers. The command, the syntax, the flag. No explanation unless you ask. Fastest path to the answer you need.

From query to expert answer in seconds.

The routing pipeline is deterministic and auditable at every step.

01
Signal Extraction
Your query is scored against domain-specific keyword registries. Each token votes for one or more specialist domains — no LLM required.
02
Confidence Routing
If one domain scores above the threshold with no close competitor, the query routes deterministically. Ambiguous queries fall back to phi4-mini as a tiebreaker.
03
Specialist Response
The specialist receives your query with domain context and relevant memory injected. Response time: 2–8 seconds depending on complexity.
04
Memory + Learning
The response is stored as a semantic memory. Your feedback adjusts agent weights through the learning kernel — improving future answers.
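
Steps 01 and 02 above can be sketched in a few lines. This is an illustration only, not Amagra's routing engine: the keyword registries, their weights, the `threshold` and `margin` values, and the function names `extract_signals` and `route` are all assumptions chosen for the example.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical keyword registries: token -> weight, per specialist domain.
KEYWORDS: dict[str, dict[str, float]] = {
    "python_dev": {"python": 2.0, "asyncio": 3.0, "pandas": 2.0, "fastapi": 3.0},
    "it_network": {"dns": 3.0, "ssh": 3.0, "vpn": 3.0, "wifi": 2.0},
    "devops": {"docker": 3.0, "kubernetes": 3.0, "systemd": 2.0, "dns": 1.0},
}


def extract_signals(query: str) -> dict[str, float]:
    """Let each token vote for every domain whose registry contains it."""
    scores: dict[str, float] = defaultdict(float)
    for token in re.findall(r"[a-z0-9]+", query.lower()):
        for domain, registry in KEYWORDS.items():
            if token in registry:
                scores[domain] += registry[token]
    return dict(scores)


def route(
    query: str, threshold: float = 3.0, margin: float = 1.5
) -> tuple[Optional[str], dict[str, float]]:
    """Return (domain, scores); domain is None when the query is ambiguous.

    A None result is where a caller would hand off to an LLM tiebreaker.
    """
    scores = extract_signals(query)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None, scores
    top_domain, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score >= threshold and top_score - runner_up >= margin:
        return top_domain, scores
    return None, scores


print(route("Why does my DNS lookup fail over VPN?"))
print(route("docker dns"))
```

The first query has a clear winner and routes without any model call. The second scores two domains within the margin, so it is left for the tiebreaker. Returning the raw scores alongside the decision is what makes a choice like this auditable after the fact.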

First 5 minutes

1. Run Docker: docker-compose up
2. Open Dashboard: localhost:3000
3. Ask a Question: any domain
4. Watch Routing: see the decision
5. Explore Memory: plans & replay
# Install & run in 3 commands
git clone https://github.com/d4shm1r/amagra
cd amagra && cp .env.example .env
docker-compose up

# → Dashboard: http://localhost:3000
# → API: http://localhost:8000
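
Once the containers are up, a quick way to confirm that both services are reachable is to test the two ports listed above. The sketch below only checks that something accepts TCP connections on each port. It makes no assumptions about Amagra's API routes, and the helper name `is_listening` is hypothetical.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the install notes: dashboard on 3000, API on 8000.
    for name, port in {"dashboard": 3000, "api": 8000}.items():
        status = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {status}")
```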

Built different.

Cloud AI tools are powerful but opaque, dependent, and expensive at scale. Amagra is the alternative for engineers who want control.

Feature AMAGRA ChatGPT Claude Copilot
Runs locally (no cloud)
Persistent memory Limited Limited
Decision replay
Plan graph (DAG)
Event log & observability
Self-hosted
Routing audit trail
Learns from feedback Via RLHF Via RLHF
No subscription required ✓ Self-hosted

Every phase, in public.

An open architecture and a full, documented API. Nothing is hidden — you can see exactly how it works.

37
Development phases, in the open
10
Specialists on call
100+
Documented API endpoints
<1ms
To recall what it knows
100%
On your hardware
MIT
Open source — yours to run

Start free. Pay when it saves you time.

The self-hosted version is free forever. The paid plans add managed hosting with a frontier model backend.

Self-Hosted
$0
Run on your own hardware with your own models. Full access, no subscription required, no limits.
  • All specialist agents
  • Local phi4-mini (4GB VRAM)
  • Unlimited queries
  • Full memory + learning system
  • Mission Control dashboard
  • Plan graph, replay, event log
  • Not included: managed hosting
  • Not included: Claude / GPT-4o backend
Get the Code
Team
$249/month
2–10 users with shared cognitive state. One engineer's context improves the whole team.
  • Everything in Developer
  • Up to 10 users
  • Shared world model
  • 50,000 queries / day
  • Org-level UCI dashboard
  • Team memory namespace
  • Cognitive Ops audit trail
Contact Us
Free
Self-Hosted Forever
21
Skill Nodes
60+
API Endpoints
52×
Cache Speedup
98%
Routing Accuracy

FAQ

Do I need a GPU?
A 4GB VRAM GPU is recommended for phi4-mini. The system also runs on CPU with reduced throughput. Any modern NVIDIA RTX card works out of the box.
What models does it support?
Any model served by Ollama — phi4-mini by default. Swap to Llama 3, Mistral, Gemma, or any GGUF-compatible model by changing a single config line.
Does data ever leave my machine?
No. The self-hosted version runs entirely locally — inference, memory, and routing all happen on your hardware. Zero telemetry.
How is routing handled?
A signal-first QuerySignal pipeline classifies each query using phrase-weighted keyword scoring across a 21-node skill graph, hitting 98% accuracy without any external API calls.
What is h_UCI?
The Hierarchical Unified Cognitive Index — a composite score across Reliability, Intelligence, Adaptation, and Productivity. Current value: 80.8/100. It updates live as the system processes queries.
Is there a managed cloud option?
Yes — the Developer and Team plans (see Pricing) provide managed hosting with UCI dashboard persistence, no GPU required. Contact us to get on the early access list.

Your AI. Your hardware. Your data.

The self-hosted version is free forever. Runs on hardware you already own. Get a free API key in seconds — no card required.