# neuronetz-gateway: Operations Runbook

Day-2 operations for the gateway: managing tenants and keys, budgets, model policy, usage, and the fail-closed behaviors you'll encounter. All administration is via the **bootstrap CLI** (SPEC §11), run inside the gateway container. There are no admin HTTP endpoints in the gateway (that's `neuronetz-console`'s job).

> Run the CLI inside the running container:
> ```bash
> docker compose exec gateway neuronetz-gateway …
> ```
> In the demo stack, swap the compose file: `docker compose -f docker-compose.demo.yml exec gateway …`

---

## Keys

### Create a key

```bash
docker compose exec gateway neuronetz-gateway create-key --tenant acme --name prod-server-1
# optional: --scopes chat,embeddings   (default: chat,embeddings)
```

The **full key is printed exactly once** in the form `nz_`. Store it immediately in your secret manager; it is argon2id-hashed and cannot be recovered. Only the 12-char `prefix` is retained server-side.

### List keys (never shows full keys)

```bash
docker compose exec gateway neuronetz-gateway list-keys --tenant acme
# prints:  status=active name='prod-server-1' created=…
```

### Revoke a key

```bash
docker compose exec gateway neuronetz-gateway revoke-key --prefix nz_abc12345
```

This sets the key status to `revoked` and writes the `gateway.revocations` outbox row. A Postgres `NOTIFY` on channel `key_revoked` fires; the gateway evicts the key's Redis cache entry, so revocation takes effect within ~1 second (SPEC §4.5) without restarting anything. A subsequent request with that key returns **401**.

> The console (`neuronetz-console`) revokes keys the same way: by inserting into
> `gateway.revocations`. The trigger-driven NOTIFY makes it immediate without any
> cross-service HTTP call.

### Rotate a key

There is no in-place rotate. Rotate by: create a new key → deploy it to the client → verify traffic on the new prefix → revoke the old prefix.

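
The rotation sequence above can be wrapped in a small script. This is a minimal sketch, not part of the gateway: `rotate_key` and the `GW_CLI` variable are hypothetical names, and the confirmation prompt only stands in for the manual "deploy and verify" steps. It assumes `docker compose exec -T` (no TTY allocation) is acceptable in your environment.

```bash
# Hypothetical wrapper around the documented CLI commands.
# Override GW_CLI (for example GW_CLI=echo) to dry-run the sequence.
# -T disables TTY allocation so the command also works from scripts.
GW_CLI=${GW_CLI:-"docker compose exec -T gateway neuronetz-gateway"}

rotate_key() {
  rk_tenant=$1 rk_new_name=$2 rk_old_prefix=$3

  # Step 1: create the replacement key. The full key is printed once, here.
  # stdin is redirected so the CLI cannot consume the confirmation answer.
  $GW_CLI create-key --tenant "$rk_tenant" --name "$rk_new_name" </dev/null || return 1

  # Steps 2-3 are manual: deploy the new key and watch traffic move to it.
  printf 'Deploy the new key and verify traffic on its prefix.\n' >&2
  printf 'Type "yes" to revoke %s: ' "$rk_old_prefix" >&2
  read -r rk_answer

  if [ "$rk_answer" != "yes" ]; then
    printf 'Old key %s left active.\n' "$rk_old_prefix" >&2
    return 2
  fi

  # Step 4: revoke the old key by prefix.
  $GW_CLI revoke-key --prefix "$rk_old_prefix" </dev/null
}
```

For example, `rotate_key acme prod-server-2 nz_abc12345` (illustrative names) creates the new key, waits for confirmation, then revokes the old prefix. Because the new key is shown only once, store it in your secret manager before answering the prompt.
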

---

## Tenants & limits

### Create a tenant

```bash
docker compose exec gateway neuronetz-gateway create-tenant --name acme \
  --rpm 120 --tpm 200000 --concurrent 8
# add --allow-all-models to opt into using any installed model (default: off)
```

Limits inherit **key → tenant**: a `NULL` key-level limit uses the tenant value.

---

## Budgets

Set per-key token budgets (any combination of daily / monthly / total):

```bash
docker compose exec gateway neuronetz-gateway set-budget --key nz_abc12345 \
  --daily 1000000 --monthly 30000000 --total 500000000
```

- Budgets are enforced **fail-closed**: when the binding period hits zero remaining, requests return **429** with a descriptive error and a `Retry-After` header. The binding period and remaining balance are surfaced on every response via `X-Budget-Period` and `X-Budget-Tokens-Remaining` (SPEC §6.5).
- Live counters live in Redis; the Postgres ledger (`gateway.budget_usage`) is the source of truth on period rollover/reset.

---

## Model policy

### Set an explicit allowlist (default-deny)

```bash
docker compose exec gateway neuronetz-gateway set-models --tenant acme \
  --models llama3.1:8b,mistral:7b
```

The tenant's **effective set** is `allowed_models ∩ discovered`; entries that aren't actually installed on the backend silently never resolve. A request for a model outside the effective set returns a generic **403** (same response as "doesn't exist", so no enumeration).

### Toggle `allow_all_models`

```bash
docker compose exec gateway neuronetz-gateway set-models --tenant acme --allow-all      # opt in
docker compose exec gateway neuronetz-gateway set-models --tenant acme --no-allow-all   # back to allowlist
```

With `allow_all_models` on, the effective set **is** the live discovered set: any model pulled into Ollama becomes usable on the next discovery refresh, with no further config change. This is an audited convenience; prefer explicit allowlists for untrusted tenants (see [`THREAT_MODEL.md`](THREAT_MODEL.md)).

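
To make the effective-set rule concrete, here is a small shell sketch of the set logic described above. It is an illustration only, not the gateway's implementation: `effective_models` is a hypothetical helper, and both lists are passed in as comma-separated strings instead of being read from Postgres or Redis.

```bash
# effective_models ALLOWED_CSV DISCOVERED_CSV [ALLOW_ALL]
# Prints one model per line, sorted:
#   ALLOW_ALL=true -> every discovered model
#   otherwise      -> allowed_models ∩ discovered
effective_models() {
  em_allowed=$1 em_discovered=$2 em_allow_all=${3:-false}

  # Split the discovered list into unique, non-empty lines.
  printf '%s\n' "$em_discovered" | tr ',' '\n' | sed '/^$/d' | sort -u |
  while IFS= read -r em_model; do
    if [ "$em_allow_all" = "true" ]; then
      printf '%s\n' "$em_model"
    else
      # Exact-match membership test against the comma-delimited allowlist.
      case ",$em_allowed," in
        *",$em_model,"*) printf '%s\n' "$em_model" ;;
      esac
    fi
  done
}
```

With an allowlist of `llama3.1:8b,mistral:7b` and a discovered set of `mistral:7b,phi3:mini` (the second list is made up for this example), `effective_models "llama3.1:8b,mistral:7b" "mistral:7b,phi3:mini"` prints only `mistral:7b`. An empty discovered list yields no output in either mode, which matches the deny-by-default behavior.
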
### Inspect discovery and effective sets

```bash
docker compose exec gateway neuronetz-gateway list-models                # live-discovered models
docker compose exec gateway neuronetz-gateway list-models --tenant acme  # + that tenant's effective set
```

---

## Usage

```bash
docker compose exec gateway neuronetz-gateway show-usage --tenant acme --period day
# prints: requests=… tokens_in=… tokens_out=…   (period: day|month|total)
```

For per-key forensics and finer slicing, query `gateway.audit_log` directly (it records `request_id`, `key_prefix`, `model`, `tokens_in/out`, `status`, `latency_ms`, `client_ip`).

---

## How model discovery refresh works (SPEC §4.6)

- A background task polls Ollama `GET /api/tags` every `MODEL_DISCOVERY_REFRESH_S` seconds and caches the result in Redis (`gateway:models:discovered`, TTL `MODEL_DISCOVERY_CACHE_TTL_S`) plus an in-process copy for hot reads.
- A model pulled into Ollama out-of-band appears in `allow_all_models` tenants' effective sets within one refresh interval, with no config change.
- Discovery is **read-only** and uses only the allowlisted `/api/tags` endpoint; it never triggers a pull.
- To force a faster pickup, lower `MODEL_DISCOVERY_REFRESH_S` (the demo uses 15 s).

---

## Fail-closed behaviors to expect

| Symptom | Cause | Correct behavior |
|---|---|---|
| `503` on every request | Redis or Postgres-read down | Fail-closed: rate-limit/budget/auth can't be checked, so deny. Restore the backend. |
| `502` with `Retry-After` | Ollama unreachable | Circuit breaker opens after 5 consecutive failures, half-opens after 30 s. Check the backend / `OLLAMA_BASE_URL`. |
| `403` for a model you "know" exists | Model not in the tenant's effective set, **or** discovery cache empty/expired | Check `list-models --tenant …`; verify the backend is reachable and the model is installed. Empty discovery = deny by design. |
| `429` with `Retry-After` | Rate limit or budget exhausted | Inspect headers (`X-RateLimit-*`, `X-Budget-*`); raise limits/budget or wait. |
| `401` immediately after revoke | Working as intended | Revocation propagated via NOTIFY → Redis eviction. |

`/readyz` returns `503` when **any** dependency (DB, Redis, Ollama) is unreachable; use it as the load-balancer health gate. `/healthz` only checks process liveness.

---

## Logs, metrics, audit

- **Logs:** structured (structlog), JSON in production, to stdout. Keys/secrets are never logged.
- **Metrics:** Prometheus at `/metrics` (loopback only): `gateway_requests_total`, `gateway_tokens_total`, `gateway_request_duration_seconds`, labelled by `tenant` and `model` (never `key_id`).
- **Audit log:** always-on in `gateway.audit_log`. **Prompt log** is opt-in per key and TTL'd (`PROMPT_LOG_DEFAULT_RETENTION_DAYS`); a sweeper enforces retention.
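
For a quick look at the metrics listed above without a Prometheus server, the scrape output can be summed per tenant with `awk`. This is a rough sketch under stated assumptions: `requests_by_tenant` is a hypothetical helper, it assumes the standard Prometheus text exposition format with the sample value as the last field on each line (no trailing timestamp), and it relies only on the `tenant` label named in this runbook.

```bash
# Reads Prometheus text exposition on stdin and prints "<tenant> <total>"
# for gateway_requests_total, summed across all other labels.
requests_by_tenant() {
  awk '
    /^gateway_requests_total[{]/ {
      # Pull the value of the tenant label out of the label set.
      if (match($0, /tenant="[^"]*"/)) {
        tenant = substr($0, RSTART + 8, RLENGTH - 9)
        totals[tenant] += $NF   # assumes the sample value is the last field
      }
    }
    END { for (t in totals) printf "%s %d\n", t, totals[t] }
  ' | sort
}
```

Because `/metrics` is loopback only, run this where the gateway's loopback interface is reachable, for example `curl -fsS "$METRICS_URL" | requests_by_tenant`, with `METRICS_URL` set to your deployment's metrics address (a placeholder here, since the port is not specified in this runbook).
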