# neuronetz-gateway — Deployment

Production deployment is a Docker Compose stack — **gateway + Postgres + Redis + Ollama** —
that sits behind the host's existing **jwilder/nginx-proxy** stack (the same one already
serving `neuronetz.ai` / `neuro-landing`).

Public traffic enters via `nginx-proxy` and `acme-companion`, which terminate TLS and
obtain/renew the Let's Encrypt certificate for `api.neuronetz.ai`. The gateway joins the
host's external `proxy` Docker network alongside the other public-facing containers and
advertises itself with `VIRTUAL_HOST` / `VIRTUAL_PORT`. Postgres, Redis, and Ollama stay on
a private internal network with no published ports.

> ▶ Don't have jwilder-proxy on the host? See
> [§ "Alternative: TLS via Caddy sidecar"](#alternative-tls-via-caddy-sidecar) — the
> `ops/caddy/Caddyfile.example` is shipped for that case.

> For the local, no-GPU demo (mock Ollama + playground), see [`PLAYGROUND.md`](PLAYGROUND.md)
> and run `./demo.sh`. This document is the **production** path.

---

## The one rule that must never break

> ## ⛔ Ollama is NEVER exposed to the host or the internet.
>
> The `ollama` service in `docker-compose.yml` has **no `ports:` mapping** and must never
> get one. Ollama is reachable only on the internal Docker network as `ollama:11434`.
> Publishing it would re-open the exact unauthenticated exposure this whole project exists
> to close (SPEC §1, §3; AGENT_PROMPT non-negotiable #2).

The same posture applies to **Postgres**, **Redis**, and the gateway itself in the
production compose file — **no published ports anywhere in this compose file**. Only the
host's jwilder `nginx-proxy` container binds 80/443; the gateway is reached via the shared
external `proxy` Docker network.

---

## Prerequisites

- A host with Docker + Docker Compose.
- A jwilder-proxy stack already running on the host, attached to an external Docker network named `proxy`.
  Typically `jwilder/nginx-proxy` + `nginxproxy/acme-companion`, the same setup serving
  `neuronetz.ai` / `neuro-landing`.
- DNS: `api.neuronetz.ai` → the host's public IP.
- Ports 80 and 443 already published by the jwilder-proxy container on that host (for ACME
  HTTP-01 + serving). This compose file does **not** publish them itself.

---

## Steps (production — jwilder-proxy)

```bash
git clone ssh://git@gitea.neuronetz.ai:222/m17hr1l/neuronetz-gateway.git
cd neuronetz-gateway

# 1. Configure. Copy the example env and change EVERY secret.
cp .env.example .env
#    - POSTGRES_PASSWORD       : a strong, unique value
#    - GATEWAY_VIRTUAL_HOST    : api.neuronetz.ai   (read by nginx-proxy)
#    - LETSENCRYPT_EMAIL       : admin@neuronetz.ai (read by acme-companion)
#    - GATEWAY_LOG_FORMAT=json : for production
#    - GATEWAY_TRUSTED_PROXIES : 127.0.0.1,nginx-proxy

# 2. Bring up the stack. The gateway joins the external `proxy` network and
#    runs `alembic upgrade head` before serving.
docker compose up -d --build
#    nginx-proxy observes the new container, generates an nginx vhost for
#    api.neuronetz.ai, and acme-companion issues the cert via Let's Encrypt.
#    Cert renewals are automatic.

# 3. Bootstrap a tenant + key (CLI runs inside the gateway container).
docker compose exec gateway neuronetz-gateway create-tenant --name acme --rpm 120 --tpm 200000
docker compose exec gateway neuronetz-gateway create-key --tenant acme --name prod-server-1
#    ^ prints the full key ONCE — store it in your secret manager now.

# 4. Smoke test through public TLS.
curl https://api.neuronetz.ai/healthz
curl -N https://api.neuronetz.ai/v1/chat/completions \
  -H "Authorization: Bearer nz_…" -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","stream":true,"messages":[{"role":"user","content":"hi"}]}'
```

The compose file pins `container_name: neuronetz-gateway` (and `neuronetz-postgres` /
`neuronetz-redis` / `neuronetz-ollama`) for stable identification by nginx-proxy and for
ops scripts.

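
Step 1 asks you to change every secret before bringing the stack up. A minimal pre-flight sketch of that check follows. The function names `env_value` and `check_env_file` are hypothetical helpers and not part of the repo; the key list mirrors the comments in step 1, and the sketch assumes plain `KEY=value` lines without quoting.

```bash
#!/usr/bin/env bash
# Pre-flight check for .env (illustrative sketch, not shipped with the repo).
# ASSUMPTION: the required keys are the ones named in step 1 of the steps above.
REQUIRED_KEYS="POSTGRES_PASSWORD GATEWAY_VIRTUAL_HOST LETSENCRYPT_EMAIL GATEWAY_LOG_FORMAT GATEWAY_TRUSTED_PROXIES"

# Print the value assigned to KEY in FILE (last assignment wins), or an empty line.
env_value() {
  awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$1"
}

# Return 0 only if every required key is non-empty in ENV_FILE and
# POSTGRES_PASSWORD differs from the value found in EXAMPLE_FILE.
check_env_file() {
  local env_file="$1" example_file="$2" key value status=0
  for key in $REQUIRED_KEYS; do
    value="$(env_value "$env_file" "$key")"
    if [ -z "$value" ]; then
      echo "missing or empty: $key" >&2
      status=1
    fi
  done
  if [ "$(env_value "$env_file" POSTGRES_PASSWORD)" = "$(env_value "$example_file" POSTGRES_PASSWORD)" ]; then
    echo "POSTGRES_PASSWORD is unchanged from the example" >&2
    status=1
  fi
  return "$status"
}
```

One way to use it is to gate the bring-up on the check: `check_env_file .env .env.example && docker compose up -d --build`. The gateway's own Pydantic validation on boot remains the authoritative check; this only catches an unedited file earlier.
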

---

## Pointing at a real Ollama backend

The gateway reaches Ollama via `OLLAMA_BASE_URL`. In the bundled stack this is the in-stack
`ollama` service: `OLLAMA_BASE_URL=http://ollama:11434`.

To use an **existing/external** Ollama host instead:

1. Remove the `ollama` service from `docker-compose.yml` (or leave it; it just won't be used).
2. Set `OLLAMA_BASE_URL` to the backend address reachable from the gateway container, e.g.
   `http://10.0.0.5:11434` or an internal DNS name.
3. Ensure that backend is itself **not** exposed to the internet — the gateway is the only
   thing that should ever reach it. Use a private network / firewall rule, not a public port.
4. Pull the models you want available on that backend. They appear in tenants' effective
   sets automatically on the next discovery refresh (SPEC §4.6) — no gateway config change
   for `allow_all_models` tenants.

Discovery polls `OLLAMA_BASE_URL/api/tags` every `MODEL_DISCOVERY_REFRESH_S` seconds. If the
backend is unreachable, the discovered set is empty and requests **fail closed**.

---

## Environment reference (SPEC §7)

All configuration is via environment variables, validated by Pydantic Settings on boot.
Boot **fails loudly** on invalid config. See [`.env.example`](../.env.example) for a
copyable file.

### Service

| Var | Default | Notes |
|---|---|---|
| `GATEWAY_BIND_HOST` | `0.0.0.0` | Bind-all inside the container. |
| `GATEWAY_BIND_PORT` | `8080` | Internal port; never published directly in prod. |
| `GATEWAY_LOG_LEVEL` | `INFO` | |
| `GATEWAY_LOG_FORMAT` | `json` | `json` in prod, `console` for local dev. |
| `GATEWAY_REQUEST_ID_HEADER` | `X-Request-ID` | |
| `GATEWAY_TRUSTED_PROXIES` | `127.0.0.1,nginx-proxy` | Sources trusted for `X-Forwarded-For`. Set to your front-proxy's container name / IP. |
| `GATEWAY_VIRTUAL_HOST` | `api.neuronetz.ai` | Read by jwilder `nginx-proxy` and `acme-companion`. |
| `LETSENCRYPT_EMAIL` | `admin@neuronetz.ai` | Read by `acme-companion`. |

### Upstream (Ollama)

| Var | Default | Notes |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Internal address of the backend. |
| `OLLAMA_CONNECT_TIMEOUT_S` | `5` | |
| `OLLAMA_READ_TIMEOUT_S` | `600` | Long, for slow generations. |
| `OLLAMA_MAX_CONNECTIONS` | `64` | httpx pool size. |

### Model discovery (§4.6)

| Var | Default | Notes |
|---|---|---|
| `MODEL_DISCOVERY_REFRESH_S` | `60` | How often to re-query `/api/tags`. |
| `MODEL_DISCOVERY_CACHE_TTL_S` | `120` | Redis TTL for the discovered set. |

### Database

| Var | Default | Notes |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://…` | asyncpg driver. |
| `DATABASE_POOL_SIZE` | `10` | |
| `DATABASE_POOL_OVERFLOW` | `20` | |

### Redis

| Var | Default | Notes |
|---|---|---|
| `REDIS_URL` | `redis://redis:6379/0` | |
| `REDIS_KEY_CACHE_TTL_S` | `60` | Resolved-key cache TTL. |

### Limits (defaults; per-tenant/key DB overrides win)

| Var | Default | Notes |
|---|---|---|
| `DEFAULT_RPM` | `60` | |
| `DEFAULT_TPM` | `100000` | |
| `DEFAULT_CONCURRENT` | `8` | |
| `MAX_REQUEST_BODY_BYTES` | `262144` | 256 KiB request cap. |
| `MAX_NUM_PREDICT` | `4096` | Hard cap on requested completion tokens. |

### Security

| Var | Default | Notes |
|---|---|---|
| `ARGON2_TIME_COST` | `3` | |
| `ARGON2_MEMORY_COST_KIB` | `65536` | 64 MiB. |
| `ARGON2_PARALLELISM` | `4` | |
| `AUTH_FAILURE_RATE_LIMIT_PER_IP_PER_MIN` | `20` | Throttles auth brute-force per source IP. |

### Audit

| Var | Default | Notes |
|---|---|---|
| `AUDIT_BUFFER_SIZE` | `1000` | Ring buffer; full ⇒ deny mode. |
| `PROMPT_LOG_DEFAULT_RETENTION_DAYS` | `30` | |
| `AUDIT_LOG_DEFAULT_RETENTION_DAYS` | `365` | |

---

## TLS & security headers

In the canonical (jwilder-proxy) setup, TLS termination and security headers belong on the host's `nginx-proxy` container, not in this repo.
Use the standard nginx-proxy custom-config mechanism (`/etc/nginx/vhost.d/api.neuronetz.ai`)
to add HSTS and the rest:

```
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "DENY" always;
add_header Referrer-Policy "no-referrer" always;
```

If you prefer to terminate TLS in this repo (no jwilder-proxy on the host), see the section
below.

## Alternative: TLS via Caddy sidecar

`ops/caddy/Caddyfile.example` is provided for hosts without jwilder-proxy. It sets HSTS,
the security headers above, strips the `Server` header, and obtains a Let's Encrypt cert.
To use it, add a `caddy` service to your local copy of `docker-compose.yml` (binding host
80/443), drop the gateway's `VIRTUAL_HOST` / `LETSENCRYPT_HOST` env vars, and remove the
`proxy` external-network requirement. The Caddyfile itself is self-documenting; edit the
site address and ACME `email` before deploying.

---

## Non-Compose (systemd)

A systemd unit is provided for hosts that run the image directly (`ops/systemd/`). The
gateway still requires reachable Postgres, Redis, and Ollama, and the same environment
variables. TLS in that topology is whatever fronts the host (Caddy, nginx, a load balancer)
— **Ollama still must not be publicly reachable.**

---

## Upgrades & migrations

The gateway runs `alembic upgrade head` on container start, so a normal
`docker compose up -d --build` after pulling a new version applies pending migrations. For
zero-downtime upgrades, run migrations as a one-off
(`docker compose run --rm gateway alembic upgrade head`) before rolling the service.
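
The migrate-then-roll order described above can be sketched as a small script. The `run` and `upgrade` helpers and the `DRY_RUN` / `HEALTH_URL` variables are hypothetical names for illustration, and the final health probe simply reuses the `/healthz` smoke test from the deployment steps. Recreating the container with `docker compose up -d gateway` can still cause a brief gap, so treat this as the ordering only, not as a complete zero-downtime rollout.

```bash
#!/usr/bin/env bash
# Migrate-then-roll upgrade (illustrative sketch, not shipped with the repo).
# ASSUMPTION: HEALTH_URL defaults to the public /healthz endpoint used in the smoke test.
HEALTH_URL="${HEALTH_URL:-https://api.neuronetz.ai/healthz}"

# Echo the command; execute it only when DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Stop at the first failing step so a failed migration never rolls the service.
upgrade() {
  run git pull --ff-only &&
  run docker compose build gateway &&
  run docker compose run --rm gateway alembic upgrade head &&
  run docker compose up -d gateway &&
  run curl --fail --silent --show-error --retry 5 --retry-delay 2 "$HEALTH_URL"
}
```

Running `DRY_RUN=1 upgrade` prints the five commands without executing them, which is a cheap way to review the sequence before a real rollout. Because the migration has already been applied by the one-off container, the `alembic upgrade head` that the gateway runs on start should find nothing left to do.
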