One-command demo so the gateway can be exercised end-to-end without a GPU or a real model download: - demo/mock-ollama/ — tiny FastAPI service emulating Ollama (/api/tags, /api/chat + /api/generate NDJSON streaming with realistic prompt_eval_count and eval_count on the final frame, /api/embed, /api/show, /api/version). Non-root multi-stage Dockerfile, never published (internal network only). - docker-compose.demo.yml — postgres + redis + mock-ollama + gateway, with PLAYGROUND_ENABLED=true and ./playground mounted read-only at /app/playground. Mirrors the prod posture (mock-ollama not exposed). - demo.sh — brings the stack up, waits on /healthz, creates a demo tenant with allow_all_models and a fresh API key via the bootstrap CLI inside the container, then prints the key, the playground URL, and five ready-to-paste curl commands (SSE chat, NDJSON chat, /v1/models, a 401, a 403 /api/pull). ./demo.sh --down tears everything back down with volumes. - playground/index.html — single-file dark-themed UI served same-origin by the gateway at /playground (CORS-free). Per-endpoint About card with method/ auth/streaming badges, a real description, sample request body, sample response, and a footer note. Live SSE/NDJSON rendering of the response. A live, copyable curl box that mirrors exactly what Run sends. Run + Refresh are visibly gated until an API key is in the field; the Base URL is force-pinned to location.origin three times to defeat browser autofill. - docs/ — API.md (full endpoint reference with curl, streaming formats, error model, SPEC §6.5 response headers), ARCHITECTURE.md (incl. §4.6 discovery + the request lifecycle), DEPLOYMENT.md (Ollama-never-exposed rule, pointing at a real Ollama backend, env reference), THREAT_MODEL.md (SPEC §3 table + the allow_all_models opt-in notes), OPERATIONS.md (key/budget/model/usage runbook + fail-closed table), PLAYGROUND.md. mkdocs.yml (Material theme) wires them together.
4.4 KiB
neuronetz-gateway — Demo & Playground
The fastest way to see the gateway working end-to-end, with no GPU and no model downloads.
./demo.sh brings up the gateway against a mock Ollama backend, mints a demo API key, and
prints ready-to-paste curl commands and a link to an interactive browser playground.
Launch the demo
From the repo root:
./demo.sh
This will:
- Build and start the demo stack (
docker-compose.demo.yml): postgres + redis + mock-ollama + gateway. No Caddy; the gateway is published on127.0.0.1:8080. - Wait for the gateway to report healthy at
/healthz. - Create a demo tenant (
--allow-all-models) and an API key via the bootstrap CLI inside the gateway container, capturing the key (which is printed exactly once). - Print a summary: the API key, the playground URL
http://localhost:8080/playground, and five ready-to-paste curl commands —- streaming
/v1/chat/completions(OpenAI SSE), - streaming
/api/chat(native NDJSON), GET /v1/models,- a 401 example (no/bad key),
- a 403 example (
POST /api/pull, hard-blocked).
- streaming
The script is re-runnable: an existing tenant is reused, and each run mints a fresh, uniquely-named key (the full key only ever prints at creation).
Tear everything down (containers + volumes):
./demo.sh --down
What's running
| Service | Exposed? | Notes |
|---|---|---|
gateway |
127.0.0.1:8080 |
The real gateway image, built from the repo Dockerfile. |
mock-ollama |
no | Internal network only — mirrors the prod "Ollama is never exposed" rule. |
postgres |
no | Internal only. |
redis |
no | Internal only. |
The mock backend (demo/mock-ollama/) emulates Ollama's API shapes — including realistic
prompt_eval_count / eval_count on the final stream object — so token counting, model
discovery, and /api/show sanitization all exercise real gateway code paths. It serves a
small catalogue: llama3.1:8b, mistral:7b, qwen2.5:3b, nomic-embed-text.
Use the playground
Open http://localhost:8080/playground in a browser. It is a single self-contained HTML page, served same-origin by the gateway (so no CORS to worry about).
- Base URL is pre-filled with the current origin; leave it as is for the demo.
- Paste the API key from the
./demo.shoutput into the Bearer field. (Typing a key auto-loads the model dropdown; you can also hit ↻ Refresh.) - Pick an endpoint tab:
/v1/chat/completions,/api/chat,/api/generate,/v1/models,/api/tags,/healthz,/readyz. - Choose a model from the auto-populated dropdown, type a prompt, toggle stream.
- Hit ▶ Run. The streamed output renders live — SSE
data:deltas (incl.[DONE]) for/v1/*, NDJSON lines for/api/*. - The panel shows the response status and the rate-limit / budget response headers
(
X-Request-ID,X-RateLimit-*,X-Budget-*; SPEC §6.5). - The Exact curl box mirrors precisely what Run sends — copy it to reproduce in a terminal.
Try the 403 path too: there's no mutating-endpoint tab by design, but the printed curl for
POST /api/pull shows the hard block, and an invalid key in the Bearer field demonstrates the
401 fail-closed response.
⚠️ Security note: the playground is OFF by default in production
The playground route is flag-gated and disabled by default. The demo stack turns it on explicitly:
# docker-compose.demo.yml (gateway service)
GATEWAY_PLAYGROUND_ENABLED: "true"
GATEWAY_PLAYGROUND_FILE: /app/playground/index.html
with the file mounted read-only into the container:
volumes:
- ./playground:/app/playground:ro
The production stack (docker-compose.yml) does not set GATEWAY_PLAYGROUND_ENABLED, so
the route is absent. Do not enable it on a public deployment: it is a convenience for demos and
local development, not a production surface. Leaving it off keeps the public attack surface to
the documented API only.
Files behind the demo
| Path | What it is |
|---|---|
demo.sh |
The one-command entrypoint (up / --down). |
docker-compose.demo.yml |
The demo stack definition. |
demo/mock-ollama/ |
The standalone mock Ollama service (FastAPI app + Dockerfile). |
playground/index.html |
The self-contained browser playground served at /playground. |