# neuronetz-gateway — Deployment

Production deployment is a Docker Compose stack — **gateway + Postgres + Redis + Ollama** —
that sits behind the host's existing **jwilder/nginx-proxy** stack (the same one already
serving `neuronetz.ai` / `neuro-landing`). Public traffic enters via `nginx-proxy`, which
terminates TLS, while `acme-companion` obtains and renews the Let's Encrypt certificate for
`api.neuronetz.ai`. The gateway joins the host's external `proxy` Docker network alongside
the other public-facing containers and advertises itself with `VIRTUAL_HOST` /
`VIRTUAL_PORT`. Postgres, Redis, and Ollama stay on a private internal network with no
published ports.

> ▶ Don't have jwilder-proxy on the host? See
> [§ "Alternative: TLS via Caddy sidecar"](#alternative-tls-via-caddy-sidecar) — the
> `ops/caddy/Caddyfile.example` is shipped for that case.

> For the local, no-GPU demo (mock Ollama + playground), see [`PLAYGROUND.md`](PLAYGROUND.md)
> and run `./demo.sh`. This document is the **production** path.

---

## The one rule that must never break

> ## ⛔ Ollama is NEVER exposed to the host or the internet.
>
> The `ollama` service in `docker-compose.yml` has **no `ports:` mapping** and must never
> get one. Ollama is reachable only on the internal Docker network as `ollama:11434`.
> Publishing it would re-open the exact unauthenticated exposure this whole project exists
> to close (SPEC §1, §3; AGENT_PROMPT non-negotiable #2).

The same posture applies to **Postgres**, **Redis**, and the gateway itself: the production
compose file publishes **no ports at all**. Only the host's jwilder `nginx-proxy` container
binds 80/443; the gateway is reached over the shared external `proxy` Docker network.
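This rule can be enforced mechanically before every deploy. The sketch below is a hypothetical guard (the function names are illustrative, not part of this repo): it reads the effective configuration rendered by `docker compose config`, which includes any override files, and prints every service that carries a `ports:` key. It assumes the two-space indentation that `docker compose config` emits; it is a text scan, not a YAML parser.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy guard: no service in the rendered compose file may
# publish ports. Assumes `docker compose config` output (service names indented
# two spaces, their keys four spaces).

# Print the name of every service that declares a `ports:` key.
services_with_ports() {
  awk '
    /^[^ #]/ { in_services = ($0 ~ /^services:/); next }   # top-level key
    in_services && /^  [^ #][^:]*:[ \t]*$/ {               # "  <service>:"
      svc = $1; sub(/:$/, "", svc); next
    }
    in_services && /^    ports:/ { print svc }             # "    ports:"
  ' "$1"
}

# Return non-zero (and name the offenders) if any service publishes ports.
check_no_published_ports() {
  local offenders
  offenders=$(services_with_ports "$1")
  if [ -n "$offenders" ]; then
    echo "ERROR: services with published ports:" >&2
    echo "$offenders" >&2
    return 1
  fi
  echo "OK: no published ports"
}
```

Usage: `docker compose config > rendered.yml && check_no_published_ports rendered.yml`; a non-zero exit status should block the deploy.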

---

## Prerequisites

- A host with Docker Engine and the Docker Compose v2 plugin (the commands below use
  `docker compose`).
- A jwilder-proxy stack already running on the host, attached to an external Docker
  network named `proxy`. Typically `jwilder/nginx-proxy` + `nginxproxy/acme-companion`,
  the same setup serving `neuronetz.ai` / `neuro-landing`.
- DNS: `api.neuronetz.ai` → the host's public IP.
- Ports 80 and 443 already published by the jwilder-proxy container on that host (for
  ACME HTTP-01 + serving). This compose file does **not** publish them itself.
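The prerequisites can be checked in one pass before the first `docker compose up`. A minimal sketch, assuming a Linux host with `getent` available; the function name is hypothetical. It only confirms that *some* running container publishes 80 and 443 and that the hostname resolves, so comparing the resolved address with the host's public IP is left to you.

```bash
#!/usr/bin/env bash
# Hypothetical preflight for the prerequisites above. Each check prints a
# result; the function returns non-zero if any check fails.
preflight() {
  local host=${1:-api.neuronetz.ai} failed=0 port

  # 1. The external `proxy` network created by the jwilder-proxy stack.
  if docker network inspect proxy >/dev/null 2>&1; then
    echo "ok: docker network 'proxy' exists"
  else
    echo "FAIL: docker network 'proxy' not found" >&2; failed=1
  fi

  # 2. Some running container (normally nginx-proxy) publishes 80 and 443.
  for port in 80 443; do
    if [ -n "$(docker ps --filter "publish=$port" --format '{{.Names}}')" ]; then
      echo "ok: a container publishes port $port"
    else
      echo "FAIL: nothing publishes port $port" >&2; failed=1
    fi
  done

  # 3. The public hostname resolves (check the address against the host IP).
  if getent hosts "$host" >/dev/null; then
    echo "ok: $host resolves to $(getent hosts "$host" | awk '{print $1; exit}')"
  else
    echo "FAIL: $host does not resolve" >&2; failed=1
  fi

  return "$failed"
}
```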

---

## Steps (production — jwilder-proxy)

```bash
git clone ssh://git@gitea.neuronetz.ai:222/m17hr1l/neuronetz-gateway.git
cd neuronetz-gateway

# 1. Configure. Copy the example env and change EVERY secret.
cp .env.example .env
#   - POSTGRES_PASSWORD          : a strong, unique value
#   - GATEWAY_VIRTUAL_HOST       : api.neuronetz.ai  (read by nginx-proxy)
#   - LETSENCRYPT_EMAIL          : admin@neuronetz.ai  (read by acme-companion)
#   - GATEWAY_LOG_FORMAT=json    : for production
#   - GATEWAY_TRUSTED_PROXIES    : 127.0.0.1,nginx-proxy

# 2. Bring up the stack. The gateway joins the external `proxy` network and
#    runs `alembic upgrade head` before serving.
docker compose up -d --build
#   nginx-proxy observes the new container, generates an nginx vhost for
#   api.neuronetz.ai, and acme-companion issues the cert via Let's Encrypt.
#   Cert renewals are automatic.

# 3. Bootstrap a tenant + key (CLI runs inside the gateway container).
docker compose exec gateway neuronetz-gateway create-tenant --name acme --rpm 120 --tpm 200000
docker compose exec gateway neuronetz-gateway create-key --tenant acme --name prod-server-1
#   ^ prints the full key ONCE — store it in your secret manager now.

# 4. Smoke test through public TLS.
curl https://api.neuronetz.ai/healthz
curl -N https://api.neuronetz.ai/v1/chat/completions \
  -H "Authorization: Bearer nz_…" -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","stream":true,"messages":[{"role":"user","content":"hi"}]}'
```
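Step 1 can be scripted. This sketch uses two hypothetical helpers: `gen_secret` prints 32 random bytes as hex (hex avoids characters like `/` or `@` that would need escaping if the same value also appears inside a connection URL), and `set_env_var` replaces or appends a `KEY=value` line. If your `.env` repeats the Postgres password inside `DATABASE_URL`, update that line to match.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for step 1.

# 32 random bytes rendered as 64 lowercase hex characters.
gen_secret() {
  od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
}

# Set KEY=VALUE in an env file: replace an existing KEY= line, else append.
# VALUE must not contain backslashes (awk -v would interpret them).
set_env_var() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example:
#   set_env_var .env POSTGRES_PASSWORD "$(gen_secret)"
#   set_env_var .env GATEWAY_LOG_FORMAT json
#   set_env_var .env GATEWAY_TRUSTED_PROXIES 127.0.0.1,nginx-proxy
```

Because `mktemp` creates its file with mode 0600 and `mv` keeps that mode, the rewritten `.env` ends up readable only by its owner.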

The compose file pins `container_name: neuronetz-gateway` (and `neuronetz-postgres` /
`neuronetz-redis` / `neuronetz-ollama`) for stable identification by nginx-proxy and
for ops scripts.
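Because the names are fixed, a post-deploy check can address the containers directly. A minimal sketch, assuming the four names above and the external network name `proxy`: `docker port` prints nothing for a container with no published ports, and a `docker inspect` template lists the networks each container is attached to. Only the gateway should be on `proxy`; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical post-deploy posture check using the pinned container names.
check_posture() {
  local failed=0 name nets
  for name in neuronetz-gateway neuronetz-postgres neuronetz-redis neuronetz-ollama; do
    # Any output from `docker port` means a host port is published.
    if [ -n "$(docker port "$name" 2>/dev/null)" ]; then
      echo "FAIL: $name publishes host ports" >&2; failed=1
    fi
    # One network name per line.
    nets=$(docker inspect -f '{{range $k, $v := .NetworkSettings.Networks}}{{println $k}}{{end}}' "$name" 2>/dev/null) || {
      echo "FAIL: container $name not found" >&2; failed=1; continue
    }
    if printf '%s\n' "$nets" | grep -qx proxy; then
      [ "$name" = neuronetz-gateway ] || { echo "FAIL: $name is on 'proxy'" >&2; failed=1; }
    else
      [ "$name" != neuronetz-gateway ] || { echo "FAIL: gateway is not on 'proxy'" >&2; failed=1; }
    fi
  done
  [ "$failed" -eq 0 ] && echo "OK: posture matches expectations"
  return "$failed"
}
```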

---

## Pointing at a real Ollama backend

The gateway reaches Ollama via `OLLAMA_BASE_URL`. In the bundled stack this is the in-stack
`ollama` service: `OLLAMA_BASE_URL=http://ollama:11434`.

To use an **existing/external** Ollama host instead:

1. Remove the `ollama` service from `docker-compose.yml` (or leave it; it just won't be used).
2. Set `OLLAMA_BASE_URL` to the backend address reachable from the gateway container, e.g.
   `http://10.0.0.5:11434` or an internal DNS name.
3. Ensure that backend is itself **not** exposed to the internet — the gateway is the only
   thing that should ever reach it. Use a private network / firewall rule, not a public port.
4. Pull the models you want available on that backend. They appear in tenants' effective sets
   automatically on the next discovery refresh (SPEC §4.6); tenants with `allow_all_models`
   need no gateway configuration change.
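Step 3 is worth verifying from a machine outside the private network, such as a laptop on another connection. A minimal sketch using bash's `/dev/tcp` pseudo-device and coreutils `timeout` (both assumed available); the function name is hypothetical. The probe succeeds only when no TCP connection to the backend port can be opened within three seconds. `203.0.113.10` in the example is a documentation address standing in for the backend's public IP.

```bash
#!/usr/bin/env bash
# Hypothetical exposure probe: exit 0 when HOST:PORT is NOT reachable over TCP
# within 3 seconds; exit 1 if a connection opens.
assert_unreachable() {
  local host=$1 port=${2:-11434}
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "EXPOSED: $host:$port accepts TCP connections" >&2
    return 1
  fi
  echo "ok: $host:$port is not reachable from here"
}

# Example (run from OUTSIDE the private network):
#   assert_unreachable 203.0.113.10 11434
```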

Discovery polls `OLLAMA_BASE_URL/api/tags` every `MODEL_DISCOVERY_REFRESH_S` seconds. If the
backend is unreachable, the discovered set is empty and requests **fail closed**.
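To see what discovery will see, query the same endpoint over the same network path, that is, from inside the gateway container. A sketch, assuming the gateway image ships a `python` interpreter (it runs Alembic) and has `OLLAMA_BASE_URL` in its environment; the function names are hypothetical. The name extraction is a deliberately naive `grep` over the `"name"` fields of the `/api/tags` JSON, not a JSON parser. An empty result is the case to watch for, since requests then fail closed.

```bash
#!/usr/bin/env bash
# Hypothetical discovery check.

# Fetch /api/tags from inside the gateway container (same path as discovery).
# -T disables TTY allocation so the output can be piped.
fetch_tags() {
  docker compose exec -T gateway python -c \
    'import os, urllib.request; print(urllib.request.urlopen(os.environ["OLLAMA_BASE_URL"] + "/api/tags", timeout=5).read().decode())'
}

# Naively extract model names from an /api/tags JSON document on stdin.
model_names() {
  grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Example:
#   fetch_tags | model_names
```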

---

## Environment reference (SPEC §7)

All configuration is via environment variables, validated by Pydantic Settings on boot. Boot
**fails loudly** on invalid config. See [`.env.example`](../.env.example) for a copyable file.

### Service
| Var | Default | Notes |
|---|---|---|
| `GATEWAY_BIND_HOST` | `0.0.0.0` | Bind-all inside the container. |
| `GATEWAY_BIND_PORT` | `8080` | Internal port; never published directly in prod. |
| `GATEWAY_LOG_LEVEL` | `INFO` | |
| `GATEWAY_LOG_FORMAT` | `json` | `json` in prod, `console` for local dev. |
| `GATEWAY_REQUEST_ID_HEADER` | `X-Request-ID` | |
| `GATEWAY_TRUSTED_PROXIES` | `127.0.0.1,nginx-proxy` | Sources trusted for `X-Forwarded-For`. Set to your front-proxy's container name / IP. |
| `GATEWAY_VIRTUAL_HOST` | `api.neuronetz.ai` | Read by jwilder `nginx-proxy` and `acme-companion`. |
| `LETSENCRYPT_EMAIL` | `admin@neuronetz.ai` | Read by `acme-companion`. |

### Upstream (Ollama)
| Var | Default | Notes |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Internal address of the backend. |
| `OLLAMA_CONNECT_TIMEOUT_S` | `5` | |
| `OLLAMA_READ_TIMEOUT_S` | `600` | Long, for slow generations. |
| `OLLAMA_MAX_CONNECTIONS` | `64` | httpx pool size. |

### Model discovery (§4.6)
| Var | Default | Notes |
|---|---|---|
| `MODEL_DISCOVERY_REFRESH_S` | `60` | How often to re-query `/api/tags`. |
| `MODEL_DISCOVERY_CACHE_TTL_S` | `120` | Redis TTL for the discovered set. |

### Database
| Var | Default | Notes |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://…` | asyncpg driver. |
| `DATABASE_POOL_SIZE` | `10` | |
| `DATABASE_POOL_OVERFLOW` | `20` | |

### Redis
| Var | Default | Notes |
|---|---|---|
| `REDIS_URL` | `redis://redis:6379/0` | |
| `REDIS_KEY_CACHE_TTL_S` | `60` | Resolved-key cache TTL. |

### Limits (defaults; per-tenant/key DB overrides win)
| Var | Default | Notes |
|---|---|---|
| `DEFAULT_RPM` | `60` | |
| `DEFAULT_TPM` | `100000` | |
| `DEFAULT_CONCURRENT` | `8` | |
| `MAX_REQUEST_BODY_BYTES` | `262144` | 256 KiB request cap. |
| `MAX_NUM_PREDICT` | `4096` | Hard cap on requested completion tokens. |

### Security
| Var | Default | Notes |
|---|---|---|
| `ARGON2_TIME_COST` | `3` | |
| `ARGON2_MEMORY_COST_KIB` | `65536` | 64 MiB. |
| `ARGON2_PARALLELISM` | `4` | |
| `AUTH_FAILURE_RATE_LIMIT_PER_IP_PER_MIN` | `20` | Throttles auth brute-force per source IP. |

### Audit
| Var | Default | Notes |
|---|---|---|
| `AUDIT_BUFFER_SIZE` | `1000` | Ring buffer; full ⇒ deny mode. |
| `PROMPT_LOG_DEFAULT_RETENTION_DAYS` | `30` | |
| `AUDIT_LOG_DEFAULT_RETENTION_DAYS` | `365` | |
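Pydantic Settings rejects bad values at boot; a quick lint of `.env` before `docker compose up` catches the common slips earlier. A sketch under stated assumptions: a plain `KEY=value` file without quoting or `export` prefixes, checks for only a handful of the variables above, and a 16-character minimum for `POSTGRES_PASSWORD` that is an arbitrary illustrative threshold, not a project rule. It does not replace the gateway's own validation, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical .env lint for a subset of the variables documented above.
lint_env() {
  local file=$1 failed=0 key val
  # `|| [ -n "$key" ]` also processes a last line without a trailing newline.
  while IFS='=' read -r key val || [ -n "$key" ]; do
    case "$key" in ''|'#'*) continue ;; esac
    case "$key" in
      GATEWAY_LOG_FORMAT)
        case "$val" in
          json|console) ;;
          *) echo "bad $key=$val (expected json|console)" >&2; failed=1 ;;
        esac ;;
      GATEWAY_BIND_PORT|DEFAULT_RPM|DEFAULT_TPM|DEFAULT_CONCURRENT|MAX_REQUEST_BODY_BYTES|MAX_NUM_PREDICT|AUDIT_BUFFER_SIZE)
        case "$val" in
          ''|*[!0-9]*) echo "bad $key=$val (expected integer)" >&2; failed=1 ;;
        esac ;;
      POSTGRES_PASSWORD)
        [ "${#val}" -ge 16 ] || { echo "weak $key (< 16 chars)" >&2; failed=1; } ;;
    esac
  done < "$file"
  return "$failed"
}
```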

---

## TLS & security headers

In the canonical (jwilder-proxy) setup, TLS termination and security headers belong on
the host's `nginx-proxy` container, not in this repo. Use nginx-proxy's standard per-vhost
custom-config mechanism, a file named exactly after the virtual host in `/etc/nginx/vhost.d/`
(here `/etc/nginx/vhost.d/api.neuronetz.ai` inside the proxy container), to add HSTS and the
other headers:

```nginx
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
add_header X-Content-Type-Options    "nosniff"                                       always;
add_header X-Frame-Options           "DENY"                                          always;
add_header Referrer-Policy           "no-referrer"                                   always;
```
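Installing that file and confirming the headers reach clients can be scripted. A sketch, assuming `vhost.d` is bind-mounted from a host directory you pass in (the example path is a placeholder) and that the proxy container is named `nginx-proxy`, as the `GATEWAY_TRUSTED_PROXIES` default suggests. `nginx -t` validates the configuration before `nginx -s reload` applies it; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the nginx-proxy vhost.d security headers.

# Write the per-vhost include into DIR (the host side of the vhost.d mount).
write_vhost_headers() {
  local dir=$1 host=${2:-api.neuronetz.ai}
  cat > "$dir/$host" <<'EOF'
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
add_header X-Content-Type-Options    "nosniff"                                       always;
add_header X-Frame-Options           "DENY"                                          always;
add_header Referrer-Policy           "no-referrer"                                   always;
EOF
}

# Validate, then reload nginx inside the proxy container.
reload_proxy() {
  docker exec nginx-proxy nginx -t && docker exec nginx-proxy nginx -s reload
}

# Read HTTP response headers on stdin; fail if any expected header is missing.
has_security_headers() {
  local headers h
  headers=$(tr -d '\r' | tr 'A-Z' 'a-z')
  for h in strict-transport-security x-content-type-options x-frame-options referrer-policy; do
    printf '%s\n' "$headers" | grep -q "^$h:" || { echo "missing: $h" >&2; return 1; }
  done
}

# Example (/srv/nginx-proxy/vhost.d is a placeholder path):
#   write_vhost_headers /srv/nginx-proxy/vhost.d && reload_proxy
#   curl -s -o /dev/null -D - https://api.neuronetz.ai/healthz | has_security_headers
```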

If you prefer to terminate TLS in this repo (no jwilder-proxy on the host), see the
section below.

<a id="alternative-tls-via-caddy-sidecar"></a>
## Alternative: TLS via Caddy sidecar

`ops/caddy/Caddyfile.example` is provided for hosts without jwilder-proxy. It sets HSTS,
the security headers above, strips the `Server` header, and obtains a Let's Encrypt
cert. To use it, add a `caddy` service to your local copy of `docker-compose.yml`
(binding host 80/443), drop the gateway's `VIRTUAL_HOST` / `LETSENCRYPT_HOST` env vars,
remove the `proxy` external-network requirement, and set `GATEWAY_TRUSTED_PROXIES` to the
`caddy` container's name or IP so forwarded client addresses are honored. The Caddyfile
itself is self-documenting; edit the site address and ACME `email` before deploying.
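The last step can be guarded before deploying. A sketch, assuming your edited copy lives next to the example as `ops/caddy/Caddyfile` (a hypothetical name; use whatever path your `caddy` service mounts) and that the official `caddy:2` image is used for validation; the function name is hypothetical. A file identical to the example is rejected, because it means the site address and `email` were never edited.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight for the Caddy alternative.
check_caddyfile() {
  local dir=${1:-ops/caddy} abs
  # An unedited copy still carries the example's site address and email.
  if cmp -s "$dir/Caddyfile.example" "$dir/Caddyfile"; then
    echo "FAIL: $dir/Caddyfile is unedited; set the site address and email" >&2
    return 1
  fi
  abs=$(cd "$dir" && pwd) || return 1
  # Load and check the config in a throwaway container without serving traffic.
  docker run --rm -v "$abs:/etc/caddy:ro" caddy:2 \
    caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile
}
```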

---

## Non-Compose (systemd)

A systemd unit is provided for hosts that run the image directly (`ops/systemd/`). The
gateway still requires reachable Postgres, Redis, and Ollama, and the same environment
variables. TLS in that topology is whatever fronts the host (Caddy, nginx, a load balancer) —
**Ollama still must not be publicly reachable.**
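Installing the unit follows the usual systemd sequence. A sketch, assuming you pass the unit file from `ops/systemd/` (its exact file name is not given here, so the example uses a placeholder) and that the unit already declares how it receives the environment variables; the function name is hypothetical. `DEST` defaults to `/etc/systemd/system`.

```bash
#!/usr/bin/env bash
# Hypothetical installer for a unit file shipped in ops/systemd/.
install_unit() {
  local unit=$1 dest=${DEST:-/etc/systemd/system} name
  name=$(basename "$unit")
  install -m 0644 "$unit" "$dest/$name" || return 1   # copy with sane perms
  systemctl daemon-reload || return 1                  # pick up the new unit
  systemctl enable --now "$name"                       # start now and on boot
}

# Example (as root; <unit-file> is a placeholder):
#   install_unit ops/systemd/<unit-file>.service
#   systemctl status <unit-file>.service
```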

---

## Upgrades & migrations

The gateway runs `alembic upgrade head` on container start, so a normal
`docker compose up -d --build` after pulling a new version applies pending migrations. For
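zero-downtime upgrades, run migrations as a one-off
(`docker compose run --rm gateway alembic upgrade head`) before rolling the service. This
only works if the pending migrations are backward-compatible with the version still
serving traffic, because the old code runs against the new schema until the service is
replaced.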
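The one-off migration sequence, as a sketch: build the new image, migrate with it, then replace only the gateway container. `--no-deps` keeps Compose from recreating Postgres, Redis, and Ollama. The function name is hypothetical, and the sequence assumes the pending migrations are backward-compatible, because the old code keeps serving against the new schema until the last step.

```bash
#!/usr/bin/env bash
# Hypothetical upgrade sequence: migrate first, then replace the gateway.
# Stops at the first failing step so a failed migration never rolls the service.
upgrade_gateway() {
  git pull --ff-only                                     || return 1
  docker compose build gateway                           || return 1
  docker compose run --rm gateway alembic upgrade head   || return 1
  docker compose up -d --no-deps gateway                 || return 1
  docker compose ps gateway
}
```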