One-command demo so the gateway can be exercised end-to-end without a GPU or a real model download: - demo/mock-ollama/ — tiny FastAPI service emulating Ollama (/api/tags, /api/chat + /api/generate NDJSON streaming with realistic prompt_eval_count and eval_count on the final frame, /api/embed, /api/show, /api/version). Non-root multi-stage Dockerfile, never published (internal network only). - docker-compose.demo.yml — postgres + redis + mock-ollama + gateway, with PLAYGROUND_ENABLED=true and ./playground mounted read-only at /app/playground. Mirrors the prod posture (mock-ollama not exposed). - demo.sh — brings the stack up, waits on /healthz, creates a demo tenant with allow_all_models and a fresh API key via the bootstrap CLI inside the container, then prints the key, the playground URL, and five ready-to-paste curl commands (SSE chat, NDJSON chat, /v1/models, a 401, a 403 /api/pull). ./demo.sh --down tears everything back down with volumes. - playground/index.html — single-file dark-themed UI served same-origin by the gateway at /playground (CORS-free). Per-endpoint About card with method/ auth/streaming badges, a real description, sample request body, sample response, and a footer note. Live SSE/NDJSON rendering of the response. A live, copyable curl box that mirrors exactly what Run sends. Run + Refresh are visibly gated until an API key is in the field; the Base URL is force-pinned to location.origin three times to defeat browser autofill. - docs/ — API.md (full endpoint reference with curl, streaming formats, error model, SPEC §6.5 response headers), ARCHITECTURE.md (incl. §4.6 discovery + the request lifecycle), DEPLOYMENT.md (Ollama-never-exposed rule, pointing at a real Ollama backend, env reference), THREAT_MODEL.md (SPEC §3 table + the allow_all_models opt-in notes), OPERATIONS.md (key/budget/model/usage runbook + fail-closed table), PLAYGROUND.md. mkdocs.yml (Material theme) wires them together.
114 lines
4.4 KiB
Markdown
114 lines
4.4 KiB
Markdown
# neuronetz-gateway — Demo & Playground
|
|
|
|
The fastest way to see the gateway working end-to-end, with **no GPU and no model downloads**.
|
|
`./demo.sh` brings up the gateway against a mock Ollama backend, mints a demo API key, and
|
|
prints ready-to-paste curl commands and a link to an interactive browser playground.
|
|
|
|
---
|
|
|
|
## Launch the demo
|
|
|
|
From the repo root:
|
|
|
|
```bash
|
|
./demo.sh
|
|
```
|
|
|
|
This will:
|
|
|
|
1. Build and start the demo stack (`docker-compose.demo.yml`): **postgres + redis +
|
|
mock-ollama + gateway**. No Caddy; the gateway is published on `127.0.0.1:8080`.
|
|
2. Wait for the gateway to report healthy at `/healthz`.
|
|
3. Create a demo tenant (`--allow-all-models`) and an API key via the bootstrap CLI **inside
|
|
the gateway container**, capturing the key (which is printed exactly once).
|
|
4. Print a summary: the **API key**, the **playground URL**
|
|
`http://localhost:8080/playground`, and five ready-to-paste curl commands —
|
|
- streaming `/v1/chat/completions` (OpenAI SSE),
|
|
- streaming `/api/chat` (native NDJSON),
|
|
- `GET /v1/models`,
|
|
- a **401** example (no/bad key),
|
|
- a **403** example (`POST /api/pull`, hard-blocked).
|
|
|
|
The script is **re-runnable**: an existing tenant is reused, and each run mints a fresh,
|
|
uniquely-named key (the full key only ever prints at creation).
|
|
|
|
Tear everything down (containers + volumes):
|
|
|
|
```bash
|
|
./demo.sh --down
|
|
```
|
|
|
|
### What's running
|
|
|
|
| Service | Exposed? | Notes |
|
|
|---|---|---|
|
|
| `gateway` | `127.0.0.1:8080` | The real gateway image, built from the repo `Dockerfile`. |
|
|
| `mock-ollama` | **no** | Internal network only — mirrors the prod "Ollama is never exposed" rule. |
|
|
| `postgres` | **no** | Internal only. |
|
|
| `redis` | **no** | Internal only. |
|
|
|
|
The mock backend (`demo/mock-ollama/`) emulates Ollama's API shapes — including realistic
|
|
`prompt_eval_count` / `eval_count` on the final stream object — so token counting, model
|
|
discovery, and `/api/show` sanitization all exercise real gateway code paths. It serves a
|
|
small catalogue: `llama3.1:8b`, `mistral:7b`, `qwen2.5:3b`, `nomic-embed-text`.
|
|
|
|
---
|
|
|
|
## Use the playground
|
|
|
|
Open **http://localhost:8080/playground** in a browser. It is a single self-contained HTML
|
|
page, served **same-origin** by the gateway (so no CORS to worry about).
|
|
|
|
1. **Base URL** is pre-filled with the current origin; leave it as is for the demo.
|
|
2. Paste the **API key** from the `./demo.sh` output into the Bearer field. (Typing a key
|
|
auto-loads the model dropdown; you can also hit **↻ Refresh**.)
|
|
3. Pick an **endpoint** tab: `/v1/chat/completions`, `/api/chat`, `/api/generate`,
|
|
`/v1/models`, `/api/tags`, `/healthz`, `/readyz`.
|
|
4. Choose a **model** from the auto-populated dropdown, type a prompt, toggle **stream**.
|
|
5. Hit **▶ Run**. The streamed output renders **live** — SSE `data:` deltas (incl. `[DONE]`)
|
|
for `/v1/*`, NDJSON lines for `/api/*`.
|
|
6. The panel shows the **response status** and the rate-limit / budget **response headers**
|
|
(`X-Request-ID`, `X-RateLimit-*`, `X-Budget-*`; SPEC §6.5).
|
|
7. The **Exact curl** box mirrors precisely what **Run** sends — copy it to reproduce in a
|
|
terminal.
|
|
|
|
Try the 403 path too: there's no mutating-endpoint tab by design, but the printed `curl` for
|
|
`POST /api/pull` shows the hard block, and an invalid key in the Bearer field demonstrates the
|
|
401 fail-closed response.
|
|
|
|
---
|
|
|
|
## ⚠️ Security note: the playground is OFF by default in production
|
|
|
|
The playground route is **flag-gated** and **disabled by default**. The demo stack turns it on
|
|
explicitly:
|
|
|
|
```yaml
|
|
# docker-compose.demo.yml (gateway service)
|
|
GATEWAY_PLAYGROUND_ENABLED: "true"
|
|
GATEWAY_PLAYGROUND_FILE: /app/playground/index.html
|
|
```
|
|
|
|
with the file mounted read-only into the container:
|
|
|
|
```yaml
|
|
volumes:
|
|
- ./playground:/app/playground:ro
|
|
```
|
|
|
|
The production stack (`docker-compose.yml`) does **not** set `GATEWAY_PLAYGROUND_ENABLED`, so
|
|
the route is absent. Do not enable it on a public deployment: it is a convenience for demos and
|
|
local development, not a production surface. Leaving it off keeps the public attack surface to
|
|
the documented API only.
|
|
|
|
---
|
|
|
|
## Files behind the demo
|
|
|
|
| Path | What it is |
|
|
|---|---|
|
|
| `demo.sh` | The one-command entrypoint (up / `--down`). |
|
|
| `docker-compose.demo.yml` | The demo stack definition. |
|
|
| `demo/mock-ollama/` | The standalone mock Ollama service (FastAPI app + Dockerfile). |
|
|
| `playground/index.html` | The self-contained browser playground served at `/playground`. |
|