Stephan Berbig b47a09db91 demo + playground + docs
One-command demo so the gateway can be exercised end-to-end without a GPU or a
real model download:

- demo/mock-ollama/ — tiny FastAPI service emulating Ollama (/api/tags,
  /api/chat + /api/generate NDJSON streaming with realistic prompt_eval_count
  and eval_count on the final frame, /api/embed, /api/show, /api/version).
  Non-root multi-stage Dockerfile, never published (internal network only).
- docker-compose.demo.yml — postgres + redis + mock-ollama + gateway, with
  PLAYGROUND_ENABLED=true and ./playground mounted read-only at /app/playground.
  Mirrors the prod posture (mock-ollama not exposed).
- demo.sh — brings the stack up, waits on /healthz, creates a demo tenant with
  allow_all_models and a fresh API key via the bootstrap CLI inside the
  container, then prints the key, the playground URL, and five ready-to-paste
  curl commands (SSE chat, NDJSON chat, /v1/models, a 401, a 403 /api/pull).
  ./demo.sh --down tears everything back down with volumes.
- playground/index.html — single-file dark-themed UI served same-origin by
  the gateway at /playground (CORS-free). Per-endpoint About card with method/
  auth/streaming badges, a real description, sample request body, sample
  response, and a footer note. Live SSE/NDJSON rendering of the response.
  A live, copyable curl box that mirrors exactly what Run sends. Run + Refresh
  are visibly gated until an API key is in the field; the Base URL is
  force-pinned to location.origin three times to defeat browser autofill.
- docs/ — API.md (full endpoint reference with curl, streaming formats, error
  model, SPEC §6.5 response headers), ARCHITECTURE.md (incl. §4.6 discovery
  + the request lifecycle), DEPLOYMENT.md (Ollama-never-exposed rule,
  pointing at a real Ollama backend, env reference), THREAT_MODEL.md
  (SPEC §3 table + the allow_all_models opt-in notes), OPERATIONS.md
  (key/budget/model/usage runbook + fail-closed table), PLAYGROUND.md.
  mkdocs.yml (Material theme) wires them together.
2026-05-26 20:52:33 +02:00
2026-05-26 20:52:33 +02:00
2026-05-26 20:52:33 +02:00
2026-05-26 20:52:33 +02:00
2026-05-26 20:52:33 +02:00
2026-05-26 20:52:33 +02:00

neuronetz-gateway

A secure, multi-tenant API gateway in front of an Ollama instance. It is the hot path of the Neuronetz API: every request to the models flows through here, authenticated, rate-limited, budgeted, and audited.

The Ollama backend is never reachable from the public internet. It is bound to an internal Docker network with no published ports. All access is via this gateway, behind TLS terminated by Caddy.

Status: v0.1.0 — in development. See scope-docs/SPEC.md for the full specification and scope-docs/AGENT_PROMPT.md for the phased build plan. SPEC.md is the source of truth.

What it does

  • Auth — API keys as Bearer tokens, stored as Argon2id hashes, verified in constant time.
  • Multi-tenant — tenants own keys; limits and budgets inherit tenant → key.
  • Rate limiting — per-key and per-tenant RPM / TPM / concurrent connections.
  • Budgets — daily / monthly / total token budgets, enforced fail-closed.
  • Dual API surface — native Ollama (/api/*) and OpenAI-compatible (/v1/*), both streaming.
  • Hard-blocked mutations/api/pull, /api/push, /api/create, /api/copy, /api/delete, /api/blobs/* always return 403. Not configurable.
  • Audit log — always-on request metadata; opt-in, TTL'd prompt logging per key.

Administration (dashboards, tenant self-service) lives in a separate service, neuronetz-console; it is not part of this repository.

Architecture

Internet ──TLS──> Caddy ──HTTP──> gateway ──┬──> Postgres   (keys, budgets, audit)
                                            ├──> Redis      (key cache, rate limits)
                                            └──> Ollama     (internal network only)

Quickstart (dev)

Requires Docker + Docker Compose. The dev stack runs Postgres, Redis, and the gateway — no Caddy and no Ollama (so /readyz reports 503 until a real Ollama backend is wired in; that is expected).

git clone <repo> neuronetz-gateway && cd neuronetz-gateway
cp .env.example .env          # adjust if you like; defaults work for local dev
docker compose -f docker-compose.dev.yml up --build

The gateway runs alembic upgrade head on startup, then serves on http://localhost:8080.

curl -i http://localhost:8080/healthz   # -> 200  {"status":"ok"}
curl -i http://localhost:8080/readyz    # -> 503  (no Ollama backend in the dev stack)

Production

docker-compose.yml brings up the full stack — Caddy (TLS via Let's Encrypt for api.neuronetz.ai), the gateway, Postgres, Redis, and Ollama. The ollama service has no ports: mapping and is reachable only on the internal Docker network. See docs/DEPLOYMENT.md (added in a later phase) and ops/caddy/Caddyfile.example.

Managing tenants and keys

Use the bootstrap CLI (Typer). Keys have the form nz_<prefix><secret>; the full key is printed exactly once at creation and only its Argon2id hash is stored.

neuronetz-gateway create-tenant --name acme
neuronetz-gateway create-key   --tenant acme --name prod-server-1
neuronetz-gateway list-keys    --tenant acme
neuronetz-gateway revoke-key   --prefix nz_abc12345

Development

just dev          # run the dev stack
just test         # pytest + coverage
just lint         # ruff
just typecheck    # mypy --strict
just migrate      # alembic upgrade head

Tooling: Python 3.12, uv, FastAPI + uvicorn, SQLAlchemy 2.0 (async) + asyncpg, Redis, httpx, structlog, Pydantic. Lint/type/security gates: ruff, mypy --strict, bandit, pip-audit.

License

Apache 2.0 — see LICENSE. Owner: Stephan Berbig / Neuronetz.

Description
AI API
Readme Apache-2.0 290 KiB
Languages
Python 86.2%
HTML 8.1%
Shell 4.4%
Dockerfile 0.9%
Just 0.4%