Files

Stephan Berbig b47a09db91 demo + playground + docs

One-command demo so the gateway can be exercised end-to-end without a GPU or a
real model download:

- demo/mock-ollama/ — tiny FastAPI service emulating Ollama (/api/tags,
  /api/chat + /api/generate NDJSON streaming with realistic prompt_eval_count
  and eval_count on the final frame, /api/embed, /api/show, /api/version).
  Non-root multi-stage Dockerfile, never published (internal network only).
- docker-compose.demo.yml — postgres + redis + mock-ollama + gateway, with
  PLAYGROUND_ENABLED=true and ./playground mounted read-only at /app/playground.
  Mirrors the prod posture (mock-ollama not exposed).
- demo.sh — brings the stack up, waits on /healthz, creates a demo tenant with
  allow_all_models and a fresh API key via the bootstrap CLI inside the
  container, then prints the key, the playground URL, and five ready-to-paste
  curl commands (SSE chat, NDJSON chat, /v1/models, a 401, a 403 /api/pull).
  ./demo.sh --down tears everything back down with volumes.
- playground/index.html — single-file dark-themed UI served same-origin by
  the gateway at /playground (CORS-free). Per-endpoint About card with method/
  auth/streaming badges, a real description, sample request body, sample
  response, and a footer note. Live SSE/NDJSON rendering of the response.
  A live, copyable curl box that mirrors exactly what Run sends. Run + Refresh
  are visibly gated until an API key is in the field; the Base URL is
  force-pinned to location.origin three times to defeat browser autofill.
- docs/ — API.md (full endpoint reference with curl, streaming formats, error
  model, SPEC §6.5 response headers), ARCHITECTURE.md (incl. §4.6 discovery
  + the request lifecycle), DEPLOYMENT.md (Ollama-never-exposed rule,
  pointing at a real Ollama backend, env reference), THREAT_MODEL.md
  (SPEC §3 table + the allow_all_models opt-in notes), OPERATIONS.md
  (key/budget/model/usage runbook + fail-closed table), PLAYGROUND.md.
  mkdocs.yml (Material theme) wires them together.

2026-05-26 20:52:33 +02:00

4.4 KiB

Raw Permalink Blame History

neuronetz-gateway — Demo & Playground

The fastest way to see the gateway working end-to-end, with no GPU and no model downloads. ./demo.sh brings up the gateway against a mock Ollama backend, mints a demo API key, and prints ready-to-paste curl commands and a link to an interactive browser playground.

Launch the demo

From the repo root:

./demo.sh

This will:

Build and start the demo stack (docker-compose.demo.yml): postgres + redis + mock-ollama + gateway. No Caddy; the gateway is published on 127.0.0.1:8080.
Wait for the gateway to report healthy at /healthz.
Create a demo tenant (--allow-all-models) and an API key via the bootstrap CLI inside the gateway container, capturing the key (which is printed exactly once).
Print a summary: the API key, the playground URL http://localhost:8080/playground, and five ready-to-paste curl commands —
- streaming /v1/chat/completions (OpenAI SSE),
- streaming /api/chat (native NDJSON),
- GET /v1/models,
- a 401 example (no/bad key),
- a 403 example (POST /api/pull, hard-blocked).

The script is re-runnable: an existing tenant is reused, and each run mints a fresh, uniquely-named key (the full key only ever prints at creation).

Tear everything down (containers + volumes):

./demo.sh --down

What's running

Service	Exposed?	Notes
`gateway`	`127.0.0.1:8080`	The real gateway image, built from the repo `Dockerfile`.
`mock-ollama`	no	Internal network only — mirrors the prod "Ollama is never exposed" rule.
`postgres`	no	Internal only.
`redis`	no	Internal only.

The mock backend (demo/mock-ollama/) emulates Ollama's API shapes — including realistic prompt_eval_count / eval_count on the final stream object — so token counting, model discovery, and /api/show sanitization all exercise real gateway code paths. It serves a small catalogue: llama3.1:8b, mistral:7b, qwen2.5:3b, nomic-embed-text.

Use the playground

Open http://localhost:8080/playground in a browser. It is a single self-contained HTML page, served same-origin by the gateway (so no CORS to worry about).

Base URL is pre-filled with the current origin; leave it as is for the demo.
Paste the API key from the ./demo.sh output into the Bearer field. (Typing a key auto-loads the model dropdown; you can also hit ↻ Refresh.)
Pick an endpoint tab: /v1/chat/completions, /api/chat, /api/generate, /v1/models, /api/tags, /healthz, /readyz.
Choose a model from the auto-populated dropdown, type a prompt, toggle stream.
Hit ▶ Run. The streamed output renders live — SSE data: deltas (incl. [DONE]) for /v1/*, NDJSON lines for /api/*.
The panel shows the response status and the rate-limit / budget response headers (X-Request-ID, X-RateLimit-*, X-Budget-*; SPEC §6.5).
The Exact curl box mirrors precisely what Run sends — copy it to reproduce in a terminal.

Try the 403 path too: there's no mutating-endpoint tab by design, but the printed curl for POST /api/pull shows the hard block, and an invalid key in the Bearer field demonstrates the 401 fail-closed response.

⚠️ Security note: the playground is OFF by default in production

The playground route is flag-gated and disabled by default. The demo stack turns it on explicitly:

# docker-compose.demo.yml (gateway service)
GATEWAY_PLAYGROUND_ENABLED: "true"
GATEWAY_PLAYGROUND_FILE: /app/playground/index.html

with the file mounted read-only into the container:

volumes:
  - ./playground:/app/playground:ro

The production stack (docker-compose.yml) does not set GATEWAY_PLAYGROUND_ENABLED, so the route is absent. Do not enable it on a public deployment: it is a convenience for demos and local development, not a production surface. Leaving it off keeps the public attack surface to the documented API only.

Files behind the demo

Path	What it is
`demo.sh`	The one-command entrypoint (up / `--down`).
`docker-compose.demo.yml`	The demo stack definition.
`demo/mock-ollama/`	The standalone mock Ollama service (FastAPI app + Dockerfile).
`playground/index.html`	The self-contained browser playground served at `/playground`.

4.4 KiB Raw Permalink Blame History