# neuronetz-gateway: Architecture

Distilled from [`scope-docs/SPEC.md`](../scope-docs/SPEC.md) §4. The SPEC is the source of truth.

The gateway is the **hot path** of the Neuronetz API: a secure, multi-tenant proxy in front of an Ollama instance. The Ollama backend must never be reachable directly from the public internet; all access flows through this gateway.

Administration (dashboards, tenant self-service) lives in a separate service, `neuronetz-console`, and is out of scope here.

---

## Component diagram (SPEC §4.1)

```
                    Internet
                       │ TLS
                       ▼
            ┌──────────────────────┐
            │   Caddy (sidecar)    │   Let's Encrypt for api.neuronetz.ai
            │   - TLS termination  │   HSTS, security headers
            │   - HTTP/2, HTTP/3   │
            └──────────┬───────────┘
                       │ HTTP/1.1 internal
            ┌──────────▼───────────┐
            │  neuronetz-gateway   │   FastAPI + uvicorn
            │   - authn            │
            │   - rate limit       │
            │   - budget check     │
            │   - proxy + stream   │
            │   - token count      │
            │   - audit write      │
            └──┬────────┬──────┬───┘
               │        │      │
        ┌──────▼──┐  ┌──▼───┐  │
        │Postgres │  │Redis │  │
        │ schema: │  │ keys │  │
        │ gateway │  │bucket│  │
        └─────────┘  └──────┘  │
                               │ internal network only
                        ┌──────▼──────┐
                        │   Ollama    │
                        │  127.0.0.1  │
                        └─────────────┘

Same Compose stack also hosts (separate from this SPEC):
  - neuronetz-console (PHP/Nibiru) → reads schema `console`,
                                     reads schema `gateway` (SELECT)
```

Only **Caddy** publishes ports. Postgres, Redis and (critically) **Ollama** have no published ports and are reachable only on the internal Docker network.

---

## Database schemas (SPEC §4.2)

A single Postgres instance with two schemas:

- **`gateway`**: owned by this service; full DDL. Tables: `tenants`, `tenant_limits`, `api_keys`, `key_limits`, `budget_usage`, `audit_log`, `prompt_log`, `revocations` (see SPEC §5 for the full DDL).
- **`console`**: owned by `neuronetz-console` (out of scope).

The console role gets `SELECT` on all `gateway.*` tables and `INSERT` on `gateway.revocations` only. If the console needs to mutate gateway state (e.g. revoke a key), it does so by inserting into the `gateway.revocations` **outbox** table, which the gateway tails (see "Cache invalidation / key revocation" below).

**Limit inheritance:** limits and budgets resolve key → tenant. A `NULL` key-level value inherits the tenant value. For `allow_all_models`, a non-`NULL` key value overrides the tenant flag; otherwise the tenant flag applies (SPEC §13.7).

---

## Request lifecycle (SPEC §4.3)

1. Caddy terminates TLS and forwards to the gateway on the internal port.
2. Middleware extracts the API key from the `Authorization: Bearer` header.
3. The 12-char prefix is the Redis cache key. On miss, look up `gateway.api_keys` by prefix, verify the full key with argon2id, and cache resolved metadata in Redis (TTL 60 s).
4. **Rate limit** check: sliding window in Redis (Lua-atomic), per-key RPM + per-tenant RPM.
5. **Budget** check: Redis counter for the current period; the Postgres ledger is the source of truth on reset.
6. **Concurrency** semaphore: Redis `INCR` with TTL.
7. **Model allowlist** check: resolve the effective set (see below); the request `model` must be in it, else a generic `403`.
8. **Endpoint allowlist** check: mutating endpoints are hard-blocked.
9. **Body validation**: size, schema, `num_predict` cap.
10. If an OpenAI-compat path, translate the request to the Ollama schema.
11. Open an httpx async stream to Ollama.
12. Stream the response back to the client, accumulating the final `prompt_eval_count` + `eval_count`.
13. On stream close: write the `gateway.audit_log` row; decrement the budget; release the semaphore; if prompt logging is enabled, write `gateway.prompt_log`.
14. On any failure: sanitized error to the client, audit row with the status code, semaphore released.

**Streaming integrity:** token counting and the audit write happen **after** stream close, never on the hot path, so time-to-first-byte is not degraded by bookkeeping (SPEC §9).


---

## Model discovery (SPEC §4.6)

The set of usable models is **never hand-maintained**; it is extracted live from Ollama.

- A background task (started in the app lifespan, alongside the revocation listener) polls Ollama `GET /api/tags` every `MODEL_DISCOVERY_REFRESH_S` seconds.
- The parsed set (names + sanitized metadata: family, parameter size, quantization, size, modified-at) is cached in Redis under `gateway:models:discovered` with TTL `MODEL_DISCOVERY_CACHE_TTL_S`, and held in-process for hot reads on the request path.
- An initial fetch runs at startup; if Ollama is unreachable the discovered set is empty.
- **Fail-closed:** an empty or expired-and-unrefreshable discovered set means *no model resolves* and requests are denied. Discovery never opens access on failure.
- **Auto-grant:** because the effective set intersects with `discovered` (or *is* `discovered` when `allow_all_models`), a model pulled into Ollama out-of-band becomes usable to `allow_all` tenants on the next refresh, with no per-tenant config change.
- Discovery is **read-only** against Ollama and uses only the allowlisted `/api/tags` endpoint; it never triggers a model pull.

### Effective-set resolution (SPEC §4.3 step 7)

```
allow_all := key.allow_all_models ?? tenant.allow_all_models

effective := discovered                                                   if allow_all
             (key.allowed_models ?? tenant.allowed_models) ∩ discovered   otherwise
```

`/api/tags` and `/v1/models` return exactly this effective set, so the listing never reveals models outside the tenant's reach. A model that is installed-but-unpermitted and one that is not installed both return the same generic `403`, so there is no existence disclosure (SPEC §13.6).

---

## Failure modes: fail-closed (SPEC §4.4)

| Subsystem | If down | Behavior |
|---|---|---|
| Postgres (read) | Key lookup fails | `503` with retry-after; nothing proxied. |
| Postgres (write) | Audit write fails | Request still succeeds; audit row buffered in an in-memory ring (max 1000), drained on recovery; if the buffer fills, switch to deny mode. |
| Redis | Rate limit / budget unavailable | `503`, fail closed. Never "allow because we can't check." |
| Ollama | Upstream unreachable | `502` with retry-after; circuit breaker opens after 5 consecutive failures, half-open after 30 s. |
| Caddy | Not a gateway concern | n/a |

The governing rule (AGENT_PROMPT non-negotiable #1): **if a security or budgeting check cannot be performed, deny.** Never default to allow.

---

## Cache invalidation / key revocation (SPEC §4.5)

The console revokes a key by inserting into `gateway.revocations(key_id, ts, reason)`. A background task in the gateway lifespan:

- `LISTEN`s on the Postgres channel `key_revoked` (the gateway emits `NOTIFY` on its own write path; the console's INSERT fires a trigger that emits it).
- On notification, evicts the Redis cache entry for that key's prefix.

This makes revocation effectively immediate (≤ Redis RTT) with no cross-service HTTP.

---

## Observability

- **Structured logs** (structlog), JSON in production. Secrets/keys are never logged.
- **Prometheus** `/metrics` (loopback only): `gateway_requests_total{tenant,model,status}`, `gateway_tokens_total{tenant,model,direction}`, `gateway_request_duration_seconds{tenant,model}` (histogram). Labelled by `tenant`, never by `key_id` (cardinality; SPEC §13.3); per-key data lives in Postgres.
- **Audit log**: always-on request metadata. **Prompt log**: opt-in per key, TTL'd.
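
One way to enforce "secrets/keys are never logged" is a redaction step in the log pipeline. The sketch below is illustrative only and is not taken from the SPEC: `redact_secrets`, `SENSITIVE_FIELDS` and the `[REDACTED]` marker are hypothetical. The function uses the `(logger, method_name, event_dict)` signature that structlog processors take, so it could be placed in a processor chain ahead of the JSON renderer, but it is plain Python and imports nothing from structlog.

```python
import re
from typing import Any, MutableMapping

# Hypothetical field names; the real list would come from the SPEC.
SENSITIVE_FIELDS = frozenset({"authorization", "api_key", "key", "token"})
# Matches a bearer credential embedded in free text.
BEARER_RE = re.compile(r"\bBearer\s+\S+", re.IGNORECASE)


def redact_secrets(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> MutableMapping[str, Any]:
    """Blank sensitive fields and scrub bearer tokens from string values."""
    for field, value in list(event_dict.items()):
        if field.lower() in SENSITIVE_FIELDS:
            event_dict[field] = "[REDACTED]"
        elif isinstance(value, str):
            event_dict[field] = BEARER_RE.sub("Bearer [REDACTED]", value)
    return event_dict


if __name__ == "__main__":
    print(
        redact_secrets(
            None,
            "info",
            {"event": "auth failed for Bearer nk_abc123", "api_key": "nk_abc123", "tenant": "acme"},
        )
    )
```

Matching on field names alone would miss credentials interpolated into messages, which is why the sketch also scrubs string values; neither check replaces the habit of not passing secrets to the logger in the first place.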