One-command demo so the gateway can be exercised end-to-end without a GPU or a
real model download:
- demo/mock-ollama/ — tiny FastAPI service emulating Ollama (/api/tags,
/api/chat + /api/generate NDJSON streaming with realistic prompt_eval_count
and eval_count on the final frame, /api/embed, /api/show, /api/version).
Non-root multi-stage Dockerfile, never published (internal network only).
- docker-compose.demo.yml — postgres + redis + mock-ollama + gateway, with
PLAYGROUND_ENABLED=true and ./playground mounted read-only at /app/playground.
Mirrors the prod posture (mock-ollama not exposed).
- demo.sh — brings the stack up, waits on /healthz, creates a demo tenant with
allow_all_models and a fresh API key via the bootstrap CLI inside the
container, then prints the key, the playground URL, and five ready-to-paste
curl commands (SSE chat, NDJSON chat, /v1/models, a 401, a 403 /api/pull).
./demo.sh --down tears everything back down with volumes.
- playground/index.html — single-file dark-themed UI served same-origin by
the gateway at /playground (CORS-free). Per-endpoint About card with method/
auth/streaming badges, a real description, sample request body, sample
response, and a footer note. Live SSE/NDJSON rendering of the response.
A live, copyable curl box that mirrors exactly what Run sends. Run + Refresh
are visibly gated until an API key is in the field; the Base URL is
force-pinned to location.origin three times to defeat browser autofill.
- docs/ — API.md (full endpoint reference with curl, streaming formats, error
model, SPEC §6.5 response headers), ARCHITECTURE.md (incl. §4.6 discovery
+ the request lifecycle), DEPLOYMENT.md (Ollama-never-exposed rule,
pointing at a real Ollama backend, env reference), THREAT_MODEL.md
(SPEC §3 table + the allow_all_models opt-in notes), OPERATIONS.md
(key/budget/model/usage runbook + fail-closed table), PLAYGROUND.md.
mkdocs.yml (Material theme) wires them together.