# neuronetz-gateway — Operations Runbook

Day-2 operations for the gateway: managing tenants and keys, budgets, model policy, usage,
and the fail-closed behaviors you'll encounter. All administration is via the **bootstrap
CLI** (SPEC §11), run inside the gateway container. There are no admin HTTP endpoints in the
gateway (that's `neuronetz-console`'s job).

> Run the CLI inside the running container:
> ```bash
> docker compose exec gateway neuronetz-gateway <command> …
> ```
> In the demo stack, swap the compose file: `docker compose -f docker-compose.demo.yml exec gateway …`
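
If you run these commands often, a small shell function saves retyping the compose prefix. This is a hypothetical convenience wrapper, not something the gateway ships. The function name `nzg` and the `NZ_COMPOSE_FILE` variable are made up for this sketch.

```bash
# Hypothetical wrapper (not shipped with the gateway): runs the bootstrap CLI
# inside the gateway container. Set NZ_COMPOSE_FILE (a made-up variable) to
# target another stack, e.g. docker-compose.demo.yml.
nzg() {
  if [ -n "${NZ_COMPOSE_FILE:-}" ]; then
    docker compose -f "$NZ_COMPOSE_FILE" exec gateway neuronetz-gateway "$@"
  else
    docker compose exec gateway neuronetz-gateway "$@"
  fi
}
```

Usage: `nzg list-keys --tenant acme`, or `NZ_COMPOSE_FILE=docker-compose.demo.yml nzg list-models` for the demo stack.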

---

## Keys

### Create a key

```bash
docker compose exec gateway neuronetz-gateway create-key --tenant acme --name prod-server-1
# optional: --scopes chat,embeddings   (default: chat,embeddings)
```

The **full key is printed exactly once**, in the form `nz_<prefix><secret>`. Store it in your
secret manager immediately: the server keeps only the 12-char `prefix` and an argon2id hash,
so the full key cannot be recovered.
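
To keep the one-time key out of terminal scrollback, you can pipe `create-key` straight into a filter that keeps only the key. This is a hypothetical sketch: it assumes the full key is the longest `nz_…` token in the command's output and uses only letters, digits, `_` or `-`. Neither the output layout nor the key alphabet is specified here, so check against your build.

```bash
# Hypothetical filter: reads create-key output on stdin and prints the full key.
# ASSUMPTION: the full key is the longest token matching nz_[A-Za-z0-9_-]+
# (longer than any bare prefix that might also be printed).
extract_key() {
  grep -oE 'nz_[A-Za-z0-9_-]+' |
    awk 'length($0) > max { max = length($0); key = $0 } END { if (key != "") print key }'
}
```

Usage: `docker compose exec -T gateway neuronetz-gateway create-key --tenant acme --name prod-server-1 | extract_key`, then hand the result to your secret manager. `-T` disables pseudo-TTY allocation so the output pipes cleanly.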

### List keys (never shows full keys)

```bash
docker compose exec gateway neuronetz-gateway list-keys --tenant acme
# prints: <prefix>  status=active  name='prod-server-1'  created=…
```

### Revoke a key

```bash
docker compose exec gateway neuronetz-gateway revoke-key --prefix nz_abc12345
```

This sets the key's status to `revoked` and writes a row to the `gateway.revocations` outbox
table. A trigger on that table fires a Postgres `NOTIFY` on channel `key_revoked`; the gateway,
which listens on that channel, evicts the key's Redis cache entry, so revocation takes effect
within ~1 second (SPEC §4.5) without restarting anything. A subsequent request with that key
returns **401**.

> The console (`neuronetz-console`) revokes keys the same way — by inserting into
> `gateway.revocations`. The trigger-driven NOTIFY makes it immediate without any
> cross-service HTTP call.
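
To confirm a revocation end to end, poll the gateway with the old key until it returns 401. This is a hypothetical helper. The URL is a placeholder for any endpoint that authenticates the key. Sending the key as an `Authorization: Bearer` header is an assumption, since this runbook does not say how keys are presented.

```bash
# Hypothetical helper: repeat a request with KEY until the gateway answers
# with EXPECTED_STATUS, one attempt per second.
# ASSUMPTION: the key travels as an "Authorization: Bearer" header.
wait_for_status() {  # usage: wait_for_status URL KEY EXPECTED_STATUS [ATTEMPTS]
  local url="$1" key="$2" expected="$3" attempts="${4:-5}" i=1 code=""
  while [ "$i" -le "$attempts" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer $key" "$url")
    if [ "$code" = "$expected" ]; then
      echo "got $code after $i attempt(s)"
      return 0
    fi
    [ "$i" -lt "$attempts" ] && sleep 1
    i=$((i + 1))
  done
  echo "still $code after $attempts attempt(s)" >&2
  return 1
}
```

Usage: `wait_for_status <url> "$OLD_KEY" 401`. Given the ~1 second propagation, the default five attempts should be ample.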

### Rotate a key

There is no in-place rotation command. To rotate, create a new key → deploy it to the client →
verify traffic on the new prefix → revoke the old prefix.
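
The last step is the one that can cause an outage, so it is worth guarding. This hypothetical sketch revokes the old prefix only if the new prefix appears as `status=active` in `list-keys`, parsing the output format shown under *List keys*. It does not check traffic. Confirm that separately, for example via `key_prefix` in `gateway.audit_log`.

```bash
# Hypothetical rotation guard: revoke OLD_PREFIX only when NEW_PREFIX is an
# active key for TENANT. Parses list-keys lines of the form
# "<prefix>  status=active  name='…'  created=…".
revoke_if_replaced() {  # usage: revoke_if_replaced TENANT OLD_PREFIX NEW_PREFIX
  local tenant="$1" old="$2" new="$3"
  if [ "$old" = "$new" ]; then
    echo "refusing: old and new prefix are identical" >&2
    return 1
  fi
  # -T: no pseudo-TTY, so the list-keys output can be piped into awk.
  if docker compose exec -T gateway neuronetz-gateway list-keys --tenant "$tenant" |
    awk -v p="$new" '$1 == p && $2 == "status=active" { ok = 1 } END { exit !ok }'
  then
    docker compose exec -T gateway neuronetz-gateway revoke-key --prefix "$old"
  else
    echo "refusing: $new is not an active key for tenant $tenant" >&2
    return 1
  fi
}
```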

---

## Tenants & limits

### Create a tenant

```bash
docker compose exec gateway neuronetz-gateway create-tenant --name acme \
  --rpm 120 --tpm 200000 --concurrent 8
# add --allow-all-models to opt into using any installed model (default: off)
```

Limits resolve **key → tenant**: a key-level limit that is `NULL` (unset) falls back to the
tenant's value.
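
In shell terms the fallback is plain parameter expansion with a default. This is a minimal illustration in which an empty string stands in for `NULL`. The function name is made up, and in this sketch a key-level `0` counts as a real value, not as unset.

```bash
# Hypothetical illustration of key -> tenant limit inheritance.
# An empty KEY_VALUE plays the role of NULL and falls back to TENANT_VALUE.
effective_limit() {  # usage: effective_limit KEY_VALUE TENANT_VALUE
  printf '%s\n' "${1:-$2}"
}

# A key with no rpm of its own, under a tenant created with --rpm 120:
effective_limit "" 120    # prints 120
# A key with its own rpm of 30 overrides the tenant value:
effective_limit 30 120    # prints 30
```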

---

## Budgets

Set per-key token budgets (any combination of daily / monthly / total):

```bash
docker compose exec gateway neuronetz-gateway set-budget --key nz_abc12345 \
  --daily 1000000 --monthly 30000000 --total 500000000
```

- Budgets are enforced **fail-closed**: once the binding period (the tightest of the
  configured periods) has zero tokens remaining, requests return **429** with a descriptive
  error and a `Retry-After` header. Every response reports the binding period and its
  remaining balance via `X-Budget-Period` and `X-Budget-Tokens-Remaining` (SPEC §6.5).
- Live counters are kept in Redis; the Postgres ledger (`gateway.budget_usage`) is the source
  of truth on period rollover/reset.
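
To reason about which period `X-Budget-Period` will name, here is a hypothetical sketch. It assumes the binding period is simply the configured period with the fewest tokens left. Pass `""` for a period with no budget. When the printed balance reaches 0, the rules above say requests get a 429.

```bash
# Sketch, ASSUMING the binding period is the configured period with the fewest
# tokens remaining. Pass "" for a period that has no budget configured.
binding_period() {  # usage: binding_period DAILY_LEFT MONTHLY_LEFT TOTAL_LEFT
  local best="" best_left="" period left
  for period in daily monthly total; do
    case $period in
      daily) left="$1" ;;
      monthly) left="$2" ;;
      total) left="$3" ;;
    esac
    [ -z "$left" ] && continue
    if [ -z "$best_left" ] || [ "$left" -lt "$best_left" ]; then
      best="$period" best_left="$left"
    fi
  done
  # No budget configured at all: nothing binds.
  [ -n "$best" ] || return 1
  printf '%s %s\n' "$best" "$best_left"
}
```

For example, `binding_period 400 25000000 ""` prints `daily 400`.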

---

## Model policy

### Set an explicit allowlist (default-deny)

```bash
docker compose exec gateway neuronetz-gateway set-models --tenant acme \
  --models llama3.1:8b,mistral:7b
```

The tenant's **effective set** is `allowed_models ∩ discovered` — entries that aren't
actually installed on the backend silently never resolve. A request for a model outside the
effective set returns a generic **403** (same response as "doesn't exist" — no enumeration).

### Toggle `allow_all_models`

```bash
docker compose exec gateway neuronetz-gateway set-models --tenant acme --allow-all      # opt in
docker compose exec gateway neuronetz-gateway set-models --tenant acme --no-allow-all   # back to allowlist
```

With `allow_all_models` on, the effective set **is** the live discovered set — any model
pulled into Ollama becomes usable on the next discovery refresh, with no further config
change. This is an audited convenience; prefer explicit allowlists for untrusted tenants
(see [`THREAT_MODEL.md`](THREAT_MODEL.md)).
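
Both rules fit in a few lines, including the case where discovery returns nothing. This is a hypothetical illustration of the set logic only, since the real check runs inside the gateway against the cached discovery set. The name `effective_models` and the comma-separated input format are made up for the example.

```bash
# Sketch of the effective-set rule. Lists are comma-separated model names.
#   allow_all=true  -> effective = discovered
#   allow_all=false -> effective = allowed ∩ discovered
# An empty discovered list yields an empty effective set (deny everything).
effective_models() {  # usage: effective_models ALLOW_ALL ALLOWED DISCOVERED
  printf '%s\n%s\n' "$2" "$3" | awk -F, -v all="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i != "") allowed[$i] = 1
      next
    }
    {
      for (i = 1; i <= NF; i++)
        if ($i != "" && (all == "true" || ($i in allowed))) print $i
    }'
}
```

With `effective_models false llama3.1:8b,mistral:7b llama3.1:8b,qwen2:7b`, only `llama3.1:8b` survives, because `mistral:7b` is allowed but not installed.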

### Inspect discovery and effective sets

```bash
docker compose exec gateway neuronetz-gateway list-models                 # live-discovered models
docker compose exec gateway neuronetz-gateway list-models --tenant acme   # + that tenant's effective set
```

---

## Usage

```bash
docker compose exec gateway neuronetz-gateway show-usage --tenant acme --period day
# prints: requests=…  tokens_in=…  tokens_out=…   (period: day|month|total)
```

For per-key forensics and finer slicing, query `gateway.audit_log` directly (it records
`request_id`, `key_prefix`, `model`, `tokens_in/out`, `status`, `latency_ms`, `client_ip`).
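
For a per-key breakdown, a small generator can print the SQL for `psql` to run. This is a hypothetical sketch. It assumes the token columns are literally named `tokens_in` and `tokens_out`. The database service name, user, and database in the usage line are placeholders, because this runbook does not give them.

```bash
# Hypothetical helper: prints a per-key breakdown query for gateway.audit_log.
# ASSUMES the token columns are named tokens_in and tokens_out.
key_usage_sql() {  # usage: key_usage_sql KEY_PREFIX
  local p
  # Double any single quotes so the prefix is safe inside a SQL string literal.
  p=$(printf '%s' "$1" | sed "s/'/''/g")
  cat <<SQL
SELECT model, status,
       count(*)               AS requests,
       sum(tokens_in)         AS tokens_in,
       sum(tokens_out)        AS tokens_out,
       round(avg(latency_ms)) AS avg_latency_ms
FROM gateway.audit_log
WHERE key_prefix = '$p'
GROUP BY model, status
ORDER BY requests DESC;
SQL
}
```

Usage: `key_usage_sql nz_abc12345 | docker compose exec -T <db-service> psql -U <user> -d <database>`. `psql` reads the statement from stdin.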

---

## How model discovery refresh works (SPEC §4.6)

- A background task polls Ollama `GET /api/tags` every `MODEL_DISCOVERY_REFRESH_S` seconds and
  caches the result in Redis (`gateway:models:discovered`, TTL `MODEL_DISCOVERY_CACHE_TTL_S`)
  plus an in-process copy for hot reads.
- A model pulled into Ollama out-of-band appears in `allow_all_models` tenants' effective sets
  within one refresh interval — no config change.
- Discovery is **read-only** and uses only the allowlisted `/api/tags` endpoint; it never
  triggers a pull.
- To force a faster pickup, lower `MODEL_DISCOVERY_REFRESH_S` (the demo uses 15 s).
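
When `list-models` looks wrong, compare it with what Ollama itself reports. `GET /api/tags` returns a JSON object with a `models` array whose entries carry a `name` field. The hypothetical filter below extracts those names without needing `jq`. If `jq` is available, `jq -r '.models[].name'` is sturdier. Run the filter from somewhere that can reach `OLLAMA_BASE_URL`.

```bash
# Hypothetical filter: reads an Ollama /api/tags JSON response on stdin and
# prints one model name per line. Regex-based, so it relies on "name" not
# appearing as a key anywhere else in the payload.
ollama_model_names() {
  grep -oE '"name": *"[^"]*"' | sed -E 's/^"name": *"//; s/"$//'
}
```

Usage: `curl -s "$OLLAMA_BASE_URL/api/tags" | ollama_model_names`.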

---

## Fail-closed behaviors to expect

| Symptom | Cause | Expected behavior / action |
|---|---|---|
| `503` on every request | Redis or Postgres (read path) down | Fail-closed: rate limits, budgets, and auth can't be checked, so requests are denied. Restore the backend. |
| `502` with `Retry-After` | Ollama unreachable | Circuit breaker opens after 5 consecutive failures, half-opens after 30 s. Check the backend / `OLLAMA_BASE_URL`. |
| `403` for a model you "know" exists | Model not in the tenant's effective set, **or** discovery cache empty/expired | Check `list-models --tenant …`; verify the backend is reachable and the model is installed. Empty discovery = deny by design. |
| `429` with `Retry-After` | Rate limit or budget exhausted | Inspect headers (`X-RateLimit-*`, `X-Budget-*`); raise limits/budget or wait. |
| `401` immediately after revoke | Working as intended | Revocation propagated via NOTIFY → Redis eviction. |
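
To see which limit you hit, dump the response headers and keep only the throttling ones. This is a hypothetical filter. It matches `Retry-After` plus the `X-RateLimit-*` and `X-Budget-*` families named above, case-insensitively. It also strips the CR that ends HTTP/1.1 header lines.

```bash
# Hypothetical filter: reads raw response headers (e.g. from
# `curl -s -D - -o /dev/null …`) and keeps only the throttling-related ones.
throttle_headers() {
  tr -d '\r' | grep -iE '^(retry-after|x-ratelimit-[^:]*|x-budget-[^:]*):'
}
```

Usage: `curl -s -D - -o /dev/null <request options> <url> | throttle_headers`.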

`/readyz` returns `503` when **any** dependency (DB, Redis, Ollama) is unreachable; use it as
the load-balancer health gate. `/healthz` only checks process liveness.
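
Deploy scripts can use the same gate before sending traffic. This is a hypothetical sketch. The URL is a placeholder, since the runbook does not state the gateway's host or port. `curl -f` turns any HTTP status of 400 or above, including the `503` above, into a non-zero exit.

```bash
# Hypothetical readiness wait: polls /readyz until curl -f succeeds
# (any status below 400), sleeping INTERVAL_S seconds between attempts.
wait_ready() {  # usage: wait_ready READYZ_URL [ATTEMPTS] [INTERVAL_S]
  local url="$1" attempts="${2:-30}" interval="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    [ "$i" -lt "$attempts" ] && sleep "$interval"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s): $url" >&2
  return 1
}
```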

---

## Logs, metrics, audit

- **Logs:** structured (structlog), JSON in production, to stdout. Keys/secrets are never
  logged.
- **Metrics:** Prometheus at `/metrics` (loopback only): `gateway_requests_total`,
  `gateway_tokens_total`, `gateway_request_duration_seconds`, labelled by `tenant` and
  `model` (never `key_id`).
- **Audit log:** always-on in `gateway.audit_log`. **Prompt log** is opt-in per key and TTL'd
  (`PROMPT_LOG_DEFAULT_RETENTION_DAYS`); a sweeper enforces retention.
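
For a quick look without a Prometheus server, you can total `gateway_tokens_total` per tenant straight from the exposition text. This is a hypothetical sketch. Because `/metrics` is loopback-only, fetch it from the gateway's own host or network namespace. The port is not given here. Summing across every other label, such as `model`, is a choice made for this example.

```bash
# Hypothetical: reads Prometheus text exposition on stdin and sums every
# gateway_tokens_total sample per tenant label, ignoring other labels.
# Assumes samples carry no trailing timestamp, so the value is the last field.
tokens_by_tenant() {
  awk '
    /^gateway_tokens_total[{]/ && match($0, /tenant="[^"]*"/) {
      # RSTART/RLENGTH span tenant="..."; drop the 8-char prefix and closing quote.
      sum[substr($0, RSTART + 8, RLENGTH - 9)] += $NF
    }
    END { for (t in sum) printf "%s %.0f\n", t, sum[t] }'
}
```

Usage: `curl -s http://127.0.0.1:<port>/metrics | tokens_by_tenant | sort`.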