
Stephan Berbig b47a09db91 demo + playground + docs

One-command demo so the gateway can be exercised end-to-end without a GPU or a
real model download:

- demo/mock-ollama/ — tiny FastAPI service emulating Ollama (/api/tags,
  /api/chat + /api/generate NDJSON streaming with realistic prompt_eval_count
  and eval_count on the final frame, /api/embed, /api/show, /api/version).
  Non-root multi-stage Dockerfile, never published (internal network only).
- docker-compose.demo.yml — postgres + redis + mock-ollama + gateway, with
  PLAYGROUND_ENABLED=true and ./playground mounted read-only at /app/playground.
  Mirrors the prod posture (mock-ollama not exposed).
- demo.sh — brings the stack up, waits on /healthz, creates a demo tenant with
  allow_all_models and a fresh API key via the bootstrap CLI inside the
  container, then prints the key, the playground URL, and five ready-to-paste
  curl commands (SSE chat, NDJSON chat, /v1/models, a 401, a 403 /api/pull).
  ./demo.sh --down tears everything back down with volumes.
- playground/index.html — single-file dark-themed UI served same-origin by
  the gateway at /playground (CORS-free). Per-endpoint About card with method/
  auth/streaming badges, a real description, sample request body, sample
  response, and a footer note. Live SSE/NDJSON rendering of the response.
  A live, copyable curl box that mirrors exactly what Run sends. Run + Refresh
  are visibly gated until an API key is in the field; the Base URL is
  force-pinned to location.origin three times to defeat browser autofill.
- docs/ — API.md (full endpoint reference with curl, streaming formats, error
  model, SPEC §6.5 response headers), ARCHITECTURE.md (incl. §4.6 discovery
  + the request lifecycle), DEPLOYMENT.md (Ollama-never-exposed rule,
  pointing at a real Ollama backend, env reference), THREAT_MODEL.md
  (SPEC §3 table + the allow_all_models opt-in notes), OPERATIONS.md
  (key/budget/model/usage runbook + fail-closed table), PLAYGROUND.md.
  mkdocs.yml (Material theme) wires them together.

2026-05-26 20:52:33 +02:00

7.1 KiB


neuronetz-gateway — Deployment

Production deployment is a single Docker Compose stack: **Caddy + gateway + Postgres + Redis + Ollama**. Caddy is the only public-facing component; it terminates TLS via Let's Encrypt for api.neuronetz.ai and reverse-proxies to the internal-only gateway.

For the local, no-GPU demo (mock Ollama + playground), see PLAYGROUND.md and run ./demo.sh. This document is the production path.

The one rule that must never break

⛔ Ollama is NEVER exposed to the host or the internet.

The ollama service in docker-compose.yml has no ports: mapping and must never get one. Ollama is reachable only on the internal Docker network as ollama:11434. Publishing it would re-open the exact unauthenticated exposure this whole project exists to close (SPEC §1, §3; AGENT_PROMPT non-negotiable #2).

The same posture applies to Postgres and Redis in the production compose file: no published ports. Only Caddy binds host ports (80/tcp and 443/tcp, plus 443/udp for HTTP/3).
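A cheap guard for this rule is to render the final Compose model and fail CI if any service other than Caddy publishes a port. The sketch below is illustrative only: it assumes the rendered model comes from `docker compose config --format json` and that the proxy service is named `caddy`; `published_services`, `exposure_violations` and `check_rendered_config` are hypothetical helpers, not part of the repository.

```python
import json
from typing import Dict, List

# Assumption: the reverse proxy's Compose service is named "caddy".
ALLOWED_PUBLIC = {"caddy"}


def published_services(compose: dict) -> Dict[str, list]:
    """Map each service that publishes host ports to its port entries (hypothetical helper)."""
    exposed = {}
    for name, svc in (compose.get("services") or {}).items():
        ports = svc.get("ports") or []
        if ports:
            exposed[name] = ports
    return exposed


def exposure_violations(compose: dict) -> List[str]:
    """Names of services that publish ports but are not allowed to be public."""
    return sorted(set(published_services(compose)) - ALLOWED_PUBLIC)


def check_rendered_config(config_json: str) -> int:
    """Exit-code style result for `docker compose config --format json` output."""
    bad = exposure_violations(json.loads(config_json))
    for name in bad:
        print(f"published ports on internal service: {name}")
    return 1 if bad else 0
```

Wired into a script as `sys.exit(check_rendered_config(sys.stdin.read()))` and fed by `docker compose config --format json`, it fails the build as soon as a `ports:` entry appears on ollama, postgres, or redis. Entries under `expose:` are not host-published and are deliberately ignored.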

Prerequisites

- A host with Docker and Docker Compose.
- DNS: api.neuronetz.ai → the host's public IP (for Let's Encrypt).
- Ports 80 and 443 reachable from the internet (ACME HTTP/TLS challenge + serving).

Steps

git clone <repo> neuronetz-gateway && cd neuronetz-gateway

# 1. Configure. Copy the example env and change EVERY secret.
cp .env.example .env
#   - POSTGRES_PASSWORD: a strong, unique value
#   - DATABASE_URL: must match the POSTGRES_* values
#   - GATEWAY_LOG_FORMAT=json for production

# 2. Configure Caddy for your domain + ACME email.
cp ops/caddy/Caddyfile.example ops/caddy/Caddyfile   # then edit the site + email
#   (docker-compose.yml mounts Caddyfile.example by default; point it at your edited file
#    or edit in place.)

# 3. Bring up the full stack. The gateway runs `alembic upgrade head`, then serves.
docker compose up -d --build

# 4. Bootstrap a tenant + key (CLI runs inside the gateway container).
docker compose exec gateway neuronetz-gateway create-tenant --name acme --rpm 120 --tpm 200000
docker compose exec gateway neuronetz-gateway create-key --tenant acme --name prod-server-1
#   ^ prints the full key ONCE — store it in your secret manager now.

# 5. Smoke test (through Caddy / TLS).
curl https://api.neuronetz.ai/healthz
curl -N https://api.neuronetz.ai/v1/chat/completions \
  -H "Authorization: Bearer nz_…" -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","stream":true,"messages":[{"role":"user","content":"hi"}]}'

Caddy obtains and renews the certificate automatically. For local testing without a public domain, use the localhost { tls internal … } block documented in Caddyfile.example (trust Caddy's local CA or pass -k to curl).
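The streaming smoke test can also be checked programmatically rather than by eye. A minimal sketch, assuming /v1/chat/completions emits OpenAI-style SSE chunks (`data: {...}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`); confirm the exact chunk shape against API.md. `collect_sse_content` is a hypothetical helper.

```python
import json
from typing import Iterable


def collect_sse_content(lines: Iterable[str]) -> str:
    """Concatenate assistant text from OpenAI-style SSE chat chunks (hypothetical helper).

    Assumes one `data: {...}` line per event and a final `data: [DONE]`;
    blank separators, `:` comments and other SSE fields are skipped.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream; ignore anything after it
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")  # role-only deltas add nothing
    return "".join(parts)
```

Against the live stack, feed it the decoded lines from `urllib.request.urlopen(req)` (the response object iterates line by line); a non-empty result is a reasonable pass condition.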

Pointing at a real Ollama backend

The gateway reaches Ollama via OLLAMA_BASE_URL. In the bundled stack this is the in-stack ollama service: OLLAMA_BASE_URL=http://ollama:11434.

To use an existing/external Ollama host instead:

1. Remove the ollama service from docker-compose.yml (or leave it in place; the gateway simply won't use it).
2. Set OLLAMA_BASE_URL to the backend address reachable from the gateway container, e.g. http://10.0.0.5:11434 or an internal DNS name.
3. Ensure that backend is itself not exposed to the internet: the gateway is the only thing that should ever reach it. Use a private network or firewall rule, not a public port.
4. Pull the models you want available on that backend. They appear in tenants' effective sets automatically on the next discovery refresh (SPEC §4.6); allow_all_models tenants need no gateway config change.

Discovery polls OLLAMA_BASE_URL/api/tags every MODEL_DISCOVERY_REFRESH_S seconds. If the backend is unreachable, the discovered set is empty and requests fail closed.
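The discovery and fail-closed behaviour can be sketched as follows. This is a simplified illustration, not the gateway's code: it polls /api/tags once with the standard library instead of the gateway's httpx client, skips the Redis cache, and assumes that tenants without allow_all_models get the intersection of their explicit allowlist with the discovered set. All function names are hypothetical.

```python
import json
import urllib.request
from typing import Set


def parse_tags(payload: dict) -> Set[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return {m["name"] for m in payload.get("models", []) if "name" in m}


def discover(base_url: str, timeout_s: float = 5.0) -> Set[str]:
    """One discovery poll (hypothetical). Any failure yields an empty set, so callers fail closed."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return parse_tags(json.load(resp))
    except (OSError, ValueError):  # URLError and timeouts are OSError; bad JSON is ValueError
        return set()


def effective_models(allow_all: bool, allowlist: Set[str], discovered: Set[str]) -> Set[str]:
    """Models a tenant may call right now (assumed rule: allowlist ∩ discovered)."""
    if allow_all:
        return set(discovered)
    return allowlist & discovered
```

A background task would call `discover()` every MODEL_DISCOVERY_REFRESH_S seconds. Because an unreachable backend produces an empty set, every tenant's effective set becomes empty too, which is what makes model checks reject requests instead of letting them through.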

Environment reference (SPEC §7)

All configuration is via environment variables, validated by Pydantic Settings on boot. Boot fails loudly on invalid config. See .env.example for a copyable file.
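As an illustration of what failing loudly on boot means, here is a stdlib stand-in for the Upstream (Ollama) group. It is not the gateway's Pydantic Settings class; it only mirrors the variable names and defaults listed in this reference and raises ValueError on anything unparsable or out of range. `UpstreamSettings` and `load_upstream` are hypothetical names, and the range checks are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class UpstreamSettings:
    """Hypothetical stand-in for the Upstream (Ollama) settings group."""
    ollama_base_url: str
    connect_timeout_s: float
    read_timeout_s: float
    max_connections: int


def load_upstream(env: Mapping[str, str] = os.environ) -> UpstreamSettings:
    """Parse the Upstream variables with their documented defaults; raise on invalid values."""
    s = UpstreamSettings(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://ollama:11434"),
        connect_timeout_s=float(env.get("OLLAMA_CONNECT_TIMEOUT_S", "5")),
        read_timeout_s=float(env.get("OLLAMA_READ_TIMEOUT_S", "600")),
        max_connections=int(env.get("OLLAMA_MAX_CONNECTIONS", "64")),
    )
    # Assumed sanity rules; float()/int() above already raise ValueError on garbage.
    if not s.ollama_base_url.startswith(("http://", "https://")):
        raise ValueError(f"OLLAMA_BASE_URL must be http(s): {s.ollama_base_url!r}")
    if s.connect_timeout_s <= 0 or s.read_timeout_s <= 0 or s.max_connections < 1:
        raise ValueError("timeouts must be > 0 and OLLAMA_MAX_CONNECTIONS must be >= 1")
    return s
```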

Service

| Var | Default | Notes |
|---|---|---|
| `GATEWAY_BIND_HOST` | `0.0.0.0` | Bind-all inside the container. |
| `GATEWAY_BIND_PORT` | `8080` | Internal port; never published directly in prod. |
| `GATEWAY_LOG_LEVEL` | `INFO` | |
| `GATEWAY_LOG_FORMAT` | `json` | `json` in prod, `console` for local dev. |
| `GATEWAY_REQUEST_ID_HEADER` | `X-Request-ID` | |
| `GATEWAY_TRUSTED_PROXIES` | `127.0.0.1,caddy` | Sources trusted for `X-Forwarded-For`. |
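GATEWAY_TRUSTED_PROXIES matters because the per-IP auth throttle needs the real client address rather than Caddy's. A common way to resolve it is sketched below; it is an assumption about the approach, not the gateway's implementation, and it expects entries such as `caddy` to have been resolved to IP addresses beforehand (for example with `socket.gethostbyname`). `client_ip` is a hypothetical helper.

```python
from typing import Optional, Set


def client_ip(peer: str, xff: Optional[str], trusted: Set[str]) -> str:
    """Best-effort client address behind trusted proxies (hypothetical helper).

    X-Forwarded-For is honoured only when the direct peer is a trusted proxy.
    The header is read right to left (most recent hop first) and the first
    address that is not itself trusted is returned, so values a client
    prepends to the header cannot spoof its identity.
    """
    if peer not in trusted or not xff:
        return peer
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    for hop in reversed(hops):
        if hop not in trusted:
            return hop
    # Every hop was a trusted proxy: fall back to the left-most entry.
    return hops[0] if hops else peer
```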

Upstream (Ollama)

| Var | Default | Notes |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Internal address of the backend. |
| `OLLAMA_CONNECT_TIMEOUT_S` | `5` | |
| `OLLAMA_READ_TIMEOUT_S` | `600` | Long, for slow generations. |
| `OLLAMA_MAX_CONNECTIONS` | `64` | httpx pool size. |

Model discovery (§4.6)

| Var | Default | Notes |
|---|---|---|
| `MODEL_DISCOVERY_REFRESH_S` | `60` | How often to re-query `/api/tags`. |
| `MODEL_DISCOVERY_CACHE_TTL_S` | `120` | Redis TTL for the discovered set. |

Database

| Var | Default | Notes |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://…` | asyncpg driver. |
| `DATABASE_POOL_SIZE` | `10` | |
| `DATABASE_POOL_OVERFLOW` | `20` | |

Redis

| Var | Default | Notes |
|---|---|---|
| `REDIS_URL` | `redis://redis:6379/0` | |
| `REDIS_KEY_CACHE_TTL_S` | `60` | Resolved-key cache TTL. |

Limits (defaults; per-tenant/key DB overrides win)

| Var | Default | Notes |
|---|---|---|
| `DEFAULT_RPM` | `60` | |
| `DEFAULT_TPM` | `100000` | |
| `DEFAULT_CONCURRENT` | `8` | |
| `MAX_REQUEST_BODY_BYTES` | `262144` | 256 KiB request cap. |
| `MAX_NUM_PREDICT` | `4096` | Hard cap on requested completion tokens. |
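How defaults and overrides might combine can be sketched as follows. This assumes a key-level override beats a tenant-level one, which beats the environment default (the heading above only says that DB overrides win), and that a missing or non-positive num_predict (Ollama uses -1 for unbounded) is treated as a request for the maximum. The dict-based tenant/key records and all function names are hypothetical.

```python
from typing import Dict, Optional

# Env defaults from the table above.
DEFAULTS: Dict[str, int] = {"rpm": 60, "tpm": 100_000, "concurrent": 8}
MAX_NUM_PREDICT = 4096
MAX_REQUEST_BODY_BYTES = 262_144  # 256 * 1024 = 256 KiB


def effective_limit(name: str, tenant: Dict[str, int], key: Dict[str, int]) -> int:
    """Key override, else tenant override, else env default (assumed precedence)."""
    for source in (key, tenant):
        value = source.get(name)
        if value is not None:
            return value
    return DEFAULTS[name]


def clamp_num_predict(requested: Optional[int]) -> int:
    """Cap requested completion tokens; None or <= 0 means 'the maximum' (assumption)."""
    if requested is None or requested <= 0:
        return MAX_NUM_PREDICT
    return min(requested, MAX_NUM_PREDICT)


def body_within_cap(content_length: int) -> bool:
    """True if a request body of this size fits under MAX_REQUEST_BODY_BYTES."""
    return content_length <= MAX_REQUEST_BODY_BYTES
```

For the tenant created in the deployment steps (--rpm 120 --tpm 200000), `effective_limit("rpm", {"rpm": 120, "tpm": 200000}, {})` returns 120, while `concurrent` falls back to DEFAULT_CONCURRENT (8).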

Security

| Var | Default | Notes |
|---|---|---|
| `ARGON2_TIME_COST` | `3` | |
| `ARGON2_MEMORY_COST_KIB` | `65536` | 64 MiB. |
| `ARGON2_PARALLELISM` | `4` | |
| `AUTH_FAILURE_RATE_LIMIT_PER_IP_PER_MIN` | `20` | Throttles auth brute-force per source IP. |
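AUTH_FAILURE_RATE_LIMIT_PER_IP_PER_MIN describes a per-source-IP cap on failed authentications. A minimal in-process sketch using a 60-second sliding window; the class name is hypothetical, and whether the gateway uses a sliding or fixed window, and where it stores the counters, is not stated here.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class AuthFailureLimiter:
    """Sliding-window count of failed auth attempts per source IP (hypothetical class)."""

    def __init__(self, limit_per_min: int = 20, clock: Callable[[], float] = time.monotonic):
        self.limit = limit_per_min
        self.clock = clock
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, ip: str, now: float) -> Deque[float]:
        """Drop failures older than 60 s and return what is left for this IP."""
        window = self.failures[ip]
        while window and now - window[0] >= 60.0:
            window.popleft()
        return window

    def is_blocked(self, ip: str) -> bool:
        if ip not in self.failures:
            return False
        return len(self._prune(ip, self.clock())) >= self.limit

    def record_failure(self, ip: str) -> None:
        now = self.clock()
        self._prune(ip, now).append(now)
```

A per-process dictionary only works for a single worker; with several gateway replicas the counters would have to live in shared storage, such as the Redis instance already in the stack.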

Audit

| Var | Default | Notes |
|---|---|---|
| `AUDIT_BUFFER_SIZE` | `1000` | Ring buffer; full ⇒ deny mode. |
| `PROMPT_LOG_DEFAULT_RETENTION_DAYS` | `30` | |
| `AUDIT_LOG_DEFAULT_RETENTION_DAYS` | `365` | |
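A minimal sketch of one reading of "full ⇒ deny mode": once AUDIT_BUFFER_SIZE unwritten events are queued, the gateway refuses new requests instead of dropping audit records, and resumes once the writer drains the backlog. It models only that full-buffer behaviour with a bounded deque, not the ring-buffer internals or the background writer; `AuditBuffer` and its methods are hypothetical.

```python
from collections import deque
from typing import Any, Deque, List


class AuditBuffer:
    """Bounded in-memory audit queue that fails closed when full (hypothetical class)."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.events: Deque[Any] = deque()

    @property
    def deny_mode(self) -> bool:
        return len(self.events) >= self.capacity

    def admit(self, event: Any) -> bool:
        """Queue an event; return False (deny the request) if the buffer is full."""
        if self.deny_mode:
            return False
        self.events.append(event)
        return True

    def drain(self, batch: int = 100) -> List[Any]:
        """Pop up to `batch` events, oldest first, for a writer to persist."""
        n = min(batch, len(self.events))
        return [self.events.popleft() for _ in range(n)]
```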

TLS & security headers (Caddy)

ops/caddy/Caddyfile.example already sets:

- HSTS: max-age=63072000; includeSubDomains; preload
- X-Content-Type-Options: nosniff
- X-Frame-Options: DENY
- Referrer-Policy: no-referrer
- strips the Server and X-Powered-By headers

Edit the site address and ACME email before deploying.
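These headers can be spot-checked against a running deployment. A minimal sketch that compares a response's headers with the values listed above; it assumes the Caddyfile emits them exactly as written, so the expected table must be updated if you change them. `header_problems` is a hypothetical helper; pass it, for example, `resp.headers` from `urllib.request.urlopen("https://api.neuronetz.ai/healthz")`.

```python
from typing import Dict, List, Mapping

# Values as listed for ops/caddy/Caddyfile.example (assumed to be emitted verbatim).
EXPECTED: Dict[str, str] = {
    "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
    "x-content-type-options": "nosniff",
    "x-frame-options": "DENY",
    "referrer-policy": "no-referrer",
}
STRIPPED = ("server", "x-powered-by")


def header_problems(headers: Mapping[str, str]) -> List[str]:
    """List deviations from the expected hardening headers (hypothetical helper)."""
    got = {k.lower(): v.strip() for k, v in headers.items()}  # header names are case-insensitive
    problems = []
    for name, value in EXPECTED.items():
        if got.get(name) != value:
            problems.append(f"{name}: expected {value!r}, got {got.get(name)!r}")
    for name in STRIPPED:
        if name in got:
            problems.append(f"{name} should be stripped")
    return problems
```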

Non-Compose (systemd)

A systemd unit is provided for hosts that run the image directly (ops/systemd/). The gateway still requires reachable Postgres, Redis, and Ollama, and the same environment variables. TLS in that topology is whatever fronts the host (Caddy, nginx, a load balancer) — Ollama still must not be publicly reachable.

Upgrades & migrations

The gateway runs alembic upgrade head on container start, so a normal docker compose up -d --build after pulling a new version applies any pending migrations. For zero-downtime upgrades, run the migrations as a one-off (docker compose run --rm gateway alembic upgrade head) before rolling the service; this only keeps traffic flowing if the migration is backward-compatible with the version still serving requests.
