The hot path. A single Pipeline class owns enforcement so the eight
non-negotiables can be reviewed in one place.
- Native /api/chat, /api/generate (NDJSON streaming + non-stream), /api/tags,
/api/show (system prompt + template stripped), /api/embed and /api/embeddings, and
/api/version (returns the gateway's version, not Ollama's). A catch-all route returns
the same generic 403 for hard-blocked and unknown /api/* paths, so attackers cannot
enumerate which mutating endpoints exist.
- OpenAI-compat /v1/chat/completions, /v1/completions, /v1/embeddings,
/v1/models with SSE (`data: {...}` + final `data: [DONE]`); preserves
streaming end-to-end.
- Model discovery (SPEC §4.6): background poller against Ollama /api/tags;
Redis + in-process cache (TTL = MODEL_DISCOVERY_CACHE_TTL_S, refresh =
MODEL_DISCOVERY_REFRESH_S); fail-closed when the discovered set is empty.
- Effective-set resolution in proxy/allowlist.py, where a ?? b means "a if set on
the key, otherwise b":
    allow_all = key.allow_all_models ?? tenant.allow_all_models
    effective = discovered if allow_all
                else (key.allowed_models ?? tenant.allowed_models) ∩ discovered
A non-effective model returns the same generic 403 whether it is installed but not
permitted or does not exist at all (no enumeration leak).
- Sliding-window rate limit (Redis Lua, single round-trip) for per-key +
per-tenant RPM and per-key TPM. A Redis INCR/DECR concurrency semaphore with a
TTL guard. Token-budget counters per (key, period) with a Postgres ledger
for reconciliation across resets. Headers per SPEC §6.5 on every response;
429 carries Retry-After; Redis outage → 503 (fail closed, never 200).
- Token counting from the FINAL stream object (NDJSON `done` or the SSE chunk
carrying `usage`); the audit row is written AFTER stream close so TTFB is
never degraded by bookkeeping.
- Audit writer: asyncio.Queue + bounded ring buffer; deny-mode flip on overflow.
Optional prompt log per key (TTL'd).
- Revocation listener: asyncpg LISTEN on key_revoked → evict the Redis cache
entry within ~1s of the console writing to gateway.revocations.
- Prometheus counters/histograms labeled by tenant only (per SPEC §13.3).
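The per-key and per-tenant RPM/TPM checks follow the sliding-window-log idea. The gateway runs this as a Redis Lua script so check-and-record happens in one round trip; the sketch below is only an in-process Python analogue of that logic, with an injectable clock so it can be tested. The class name and the Retry-After computation are assumptions, not the gateway's code.

```python
import math
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Sliding-window log: admit at most `limit` units in any `window_s`-second span."""

    def __init__(self, limit: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.events: deque[tuple[float, int]] = deque()  # (timestamp, cost), oldest first
        self.used = 0

    def _evict(self, now: float) -> None:
        # Forget entries that have slid out of the window.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, cost = self.events.popleft()
            self.used -= cost

    def try_acquire(self, cost: int = 1) -> tuple[bool, int]:
        """Return (allowed, retry_after_s); cost is 1 for RPM or a token count for TPM."""
        now = self.clock()
        self._evict(now)
        if self.used + cost <= self.limit:
            self.events.append((now, cost))
            self.used += cost
            return True, 0
        # Denied: find the earliest moment enough old entries expire for `cost` to fit.
        freed = 0
        for ts, c in self.events:
            freed += c
            if self.used - freed + cost <= self.limit:
                return False, max(1, math.ceil(ts + self.window_s - now))
        # `cost` exceeds the limit on its own and can never fit; report the full window.
        return False, math.ceil(self.window_s)
```

A request would need to pass every applicable limiter (key RPM, tenant RPM, key TPM). Checking separate in-memory objects one after another is not atomic; running the equivalent logic inside a single Redis script avoids that race, since Redis executes a script without interleaving other commands.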
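A minimal asyncio sketch of the audit writer's overflow handling, assuming that "deny-mode flip" means new requests are refused once the bounded queue is full (rather than audit rows being silently dropped) and that deny mode clears once the backlog drains. It models only the bounded queue, not the separate ring buffer; AuditWriter, flush and the queue size are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable

Row = dict[str, Any]


class AuditWriter:
    """Bounded audit queue; overflow flips the gateway into deny mode."""

    def __init__(self, flush: Callable[[Row], Awaitable[None]], maxsize: int = 10_000) -> None:
        self._queue: asyncio.Queue[Row] = asyncio.Queue(maxsize=maxsize)
        self._flush = flush
        self.deny_mode = False

    def submit(self, row: Row) -> bool:
        """Enqueue a row after the response has closed; never blocks the request."""
        try:
            self._queue.put_nowait(row)
            return True
        except asyncio.QueueFull:
            # Audit is always-on, so refuse traffic rather than lose rows.
            self.deny_mode = True
            return False

    def admit(self) -> bool:
        """Checked before proxying a new request; False means reject it."""
        return not self.deny_mode

    async def run(self) -> None:
        """Background task draining the queue into storage (e.g. Postgres)."""
        while True:
            row = await self._queue.get()
            await self._flush(row)
            if self._queue.empty():
                self.deny_mode = False  # backlog cleared; resume admitting requests
```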
neuronetz-gateway
A secure, multi-tenant API gateway in front of an Ollama instance. It is the hot path of the Neuronetz API: every request to the models flows through here, authenticated, rate-limited, budgeted, and audited.
The Ollama backend is never reachable from the public internet. It is bound to an internal Docker network with no published ports. All access is via this gateway, behind TLS terminated by Caddy.
Status: v0.1.0 (in development). See scope-docs/SPEC.md for the full specification and scope-docs/AGENT_PROMPT.md for the phased build plan. SPEC.md is the source of truth.
What it does
- Auth — API keys as Bearer tokens, stored as Argon2id hashes, verified in constant time.
- Multi-tenant — tenants own keys; limits and budgets inherit tenant → key.
- Rate limiting — per-key and per-tenant RPM / TPM / concurrent connections.
- Budgets — daily / monthly / total token budgets, enforced fail-closed.
- Dual API surface — native Ollama (/api/*) and OpenAI-compatible (/v1/*), both streaming.
- Hard-blocked mutations — /api/pull, /api/push, /api/create, /api/copy, /api/delete, /api/blobs/* always return 403. Not configurable.
- Audit log — always-on request metadata; opt-in, TTL'd prompt logging per key.
Administration (dashboards, tenant self-service) lives in a separate service,
neuronetz-console; it is not part of this repository.
Architecture
Internet ──TLS──> Caddy ──HTTP──> gateway ──┬──> Postgres (keys, budgets, audit)
                                            ├──> Redis (key cache, rate limits)
                                            └──> Ollama (internal network only)
Quickstart (dev)
Requires Docker + Docker Compose. The dev stack runs Postgres, Redis, and the gateway —
no Caddy and no Ollama (so /readyz reports 503 until a real Ollama backend is wired
in; that is expected).
git clone <repo> neuronetz-gateway && cd neuronetz-gateway
cp .env.example .env # adjust if you like; defaults work for local dev
docker compose -f docker-compose.dev.yml up --build
The gateway runs alembic upgrade head on startup, then serves on http://localhost:8080.
curl -i http://localhost:8080/healthz # -> 200 {"status":"ok"}
curl -i http://localhost:8080/readyz # -> 503 (no Ollama backend in the dev stack)
Production
docker-compose.yml brings up the full stack — Caddy (TLS via Let's Encrypt for
api.neuronetz.ai), the gateway, Postgres, Redis, and Ollama. The ollama service has
no ports: section in the compose file, so it publishes nothing and is reachable only
on the internal Docker network. See
docs/DEPLOYMENT.md (added in a later phase) and
ops/caddy/Caddyfile.example.
Managing tenants and keys
Use the bootstrap CLI (Typer). Keys have the form nz_<prefix><secret>; the full key is
printed exactly once at creation and only its Argon2id hash is stored.
neuronetz-gateway create-tenant --name acme
neuronetz-gateway create-key --tenant acme --name prod-server-1
neuronetz-gateway list-keys --tenant acme
neuronetz-gateway revoke-key --prefix nz_abc12345
Development
just dev # run the dev stack
just test # pytest + coverage
just lint # ruff
just typecheck # mypy --strict
just migrate # alembic upgrade head
Tooling: Python 3.12, uv, FastAPI + uvicorn, SQLAlchemy 2.0 (async) + asyncpg, Redis,
httpx, structlog, Pydantic. Lint/type/security gates: ruff, mypy --strict, bandit,
pip-audit.
License
Apache 2.0 — see LICENSE. Owner: Stephan Berbig / Neuronetz.