v1.8.1 · Desktop app for Mac, Windows & Linux

See where your prompt
breaks across models.

Run the same prompt across Claude, GPT, Gemini, and your local models at once. See every answer side by side, get one score for how much they agree, and catch the failures before your users do — privately, on your own hardware.

Download (free & self-hosted) · View on GitHub →

Local-first · MIT licensed · Works offline

prompt › Summarize this contract clause in one sentence.

Claude

answered · 1.2s

GPT

answered · 0.9s

Gemini

refused · added caveats

Divergence: 42% · Mixed

Same prompt. Three answers. One refused. You'd never see this with a single model.

Run Across Models — outputs side by side with a divergence verdict

The problem

The same prompt rarely behaves the same way twice.

Most teams never find out until production — when a customer does.

ON GPT

It succeeds

Clean, confident output. So you ship it — assuming every model behaves the same.

ON CLAUDE

It hallucinates

Fills the under-specified gaps with plausible-sounding details that aren't true.

ON GEMINI

It refuses

Reads an ambiguous instruction as unsafe and declines — or buries the answer in caveats.

ON LOCAL

It breaks

Ignores the format you asked for and returns something your parser can't read.

Amagra shows you all four before deployment — running your prompt everywhere at once and scoring exactly where the models disagree.

What it is

A debugger — for prompts.

You already know this loop from code: install, paste, run, see the errors, fix, run again. Amagra is that exact loop for the prompts you write — same muscle memory, no cloud.

The loop	VS Code + Python	Amagra
Get it	Installer · ~300 MB · ~5 min	`docker pull d4shm1r/amagra`
Runtime	Install the Python extension first	✓ Local model included — zero key, offline
New file	Open `test.py`, paste code	Open the debugger, paste your prompt
See problems	Hit Run → 10 errors print	Before you run — health score + role / task / format checks
Fix	Edit all 10 by hand	✓ One-click auto-repair, checks turn green live
Run	▶ → one output	Run across models — Claude / GPT / local, side by side
Read result	Output is output	Outputs + latency + a divergence verdict

VS Code tells you your code is broken. Amagra tells you your prompt is under-specified — and runs it across every model so you see exactly where they disagree.
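
As a rough illustration of the pre-run checks in the table above, the sketch below scores a prompt on whether it names a role, a task, and an output format. The keyword patterns, the `check_prompt` function, and the equal-weight health score are assumptions made for this example, not Amagra's actual rules.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristics: these keyword patterns are illustrative
# assumptions, not the checks Amagra actually runs.
CHECKS = {
    "role": re.compile(r"\b(you are|act as)\b", re.IGNORECASE),
    "task": re.compile(
        r"\b(summarize|write|list|explain|classify|translate|extract)\b",
        re.IGNORECASE,
    ),
    "format": re.compile(
        r"\b(json|bullet|table|one sentence|markdown)\b", re.IGNORECASE
    ),
}


@dataclass
class PromptReport:
    passed: dict[str, bool]  # check name -> whether the prompt satisfies it
    health: int              # 0-100, share of checks that passed


def check_prompt(prompt: str) -> PromptReport:
    """Run every check against the prompt and derive a simple health score."""
    passed = {name: bool(pattern.search(prompt)) for name, pattern in CHECKS.items()}
    health = round(100 * sum(passed.values()) / len(passed))
    return PromptReport(passed, health)


if __name__ == "__main__":
    report = check_prompt("Summarize this contract clause in one sentence.")
    print(report.passed, report.health)
```

Under these assumed patterns, the contract-clause prompt from the demo passes the task and format checks but names no role, which is the kind of gap worth flagging before any tokens are spent.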

How it works

From prompt to verdict in seconds.

Paste once, run everywhere, and read one verdict instead of comparing every answer by hand.

Paste your prompt

Drop in the prompt you're shipping. Instant checks flag what's under-specified before you spend a single token.

Pick your models

Claude, GPT, Gemini, and any local Ollama model. Cloud is opt-in with your own key; local runs fully offline.

Run across all of them

One click sends the same prompt everywhere. Answers come back side by side with latency and length.

Read the divergence

One verdict — Aligned, Mixed, or Divergent — tells you at a glance whether it's your prompt or the model.
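
One way to picture the verdict in the last step: score how much the answers differ from each other, then bucket that score. The sketch below is a minimal illustration that assumes character-level similarity from Python's standard `difflib` as the agreement measure and uses hypothetical thresholds (0.25 and 0.60). How Amagra actually computes divergence is not described here.

```python
from difflib import SequenceMatcher
from itertools import combinations


def divergence(answers: dict[str, str]) -> float:
    """Mean pairwise dissimilarity of the answers, from 0.0 (identical) to 1.0."""
    pairs = list(combinations(answers.values(), 2))
    if not pairs:
        return 0.0  # a single answer cannot disagree with itself
    similarities = [
        SequenceMatcher(None, a.lower(), b.lower()).ratio() for a, b in pairs
    ]
    return 1.0 - sum(similarities) / len(similarities)


def verdict(score: float, aligned_below: float = 0.25,
            divergent_above: float = 0.60) -> str:
    """Map a divergence score to a label. Both thresholds are assumptions."""
    if score < aligned_below:
        return "Aligned"
    if score > divergent_above:
        return "Divergent"
    return "Mixed"


if __name__ == "__main__":
    # Placeholder answers, keyed by model name, for illustration only.
    answers = {
        "claude": "The clause limits liability to the fees paid.",
        "gpt": "Liability is capped at the fees paid under the contract.",
        "gemini": "I can't provide legal advice, but here are some caveats.",
    }
    score = divergence(answers)
    print(f"Divergence: {score:.0%} · {verdict(score)}")
```

A string-similarity measure is cheap and runs offline, but it only sees surface wording. Two answers that say the same thing in different words would still score as divergent, so an embedding-based or rubric-based comparison may be a better fit for real use.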

        # Prefer the terminal? Run from source with Docker
        git clone https://github.com/d4shm1r/amagra
        # git clone creates a directory named after the repository
        cd amagra && cp .env.example .env
        docker-compose up
        # → Dashboard: http://localhost:3000

Built in the open

Nothing hidden. Yours to run.

An open architecture with a documented API — see exactly how every decision was made.

Free · Self-hosted forever

10 · Specialist agents

100+ · Documented API endpoints

Model providers

MIT · Open source license

And there's more under the hood

A debugger on the surface. A workspace behind it.

Debugging prompts is the way in. Once you're inside, Amagra keeps working for you — routing, remembering, and recording every decision it makes.

Persistent memory

Semantic memory across sessions. It remembers your projects, decisions, and the lessons you've taught it.

Glass-box observability

Every routing decision is logged and replayable. Trace exactly why the system chose what it chose — not a black box.

Specialist agents

Ten domain experts — Python, web, DevOps, data, and more — dispatched automatically, running privately on your hardware.

Plan graph · Decision replay · Event log · Memory browser · Step verification · Learns from feedback

Download

Install in a minute. No cloud account.

One download, double-click, done — the backend and UI ship together and run entirely on your machine.

macOS

.dmg · Apple Silicon & Intel

Download

Windows

.exe installer · 64-bit

Download

Linux

.AppImage · one file, no deps

Download

The builds are unsigned for now, so the first launch shows a Gatekeeper (macOS) or SmartScreen (Windows) prompt; choose to open anyway.

Prefer to run with Docker or build from source →

Pricing

Start free. Pay when it saves you time.

The self-hosted version is free forever. Managed plans add hosting and a frontier-model backend — no GPU required.

Self-Hosted

Run on your own hardware with your own models. Full access, no limits.

Included:
All specialist agents
Local model (4 GB VRAM)
Unlimited queries
Full memory + learning
Plan graph, replay, event log

Managed plans only:
Managed hosting
Claude / GPT-4o backend

Get the code

Questions before you download?

Your AI. Your hardware. Your data.
