# neuronetz-gateway — Deployment

Production deployment is a single Docker Compose stack:
**Caddy + gateway + Postgres + Redis + Ollama**. Caddy is the only public-facing
component; it terminates TLS via Let's Encrypt for `api.neuronetz.ai` and reverse-proxies
to the internal-only gateway.

> For the local, no-GPU demo (mock Ollama + playground), see [`PLAYGROUND.md`](PLAYGROUND.md)
> and run `./demo.sh`. This document is the **production** path.

---

## The one rule that must never break

> ## ⛔ Ollama is NEVER exposed to the host or the internet.
>
> The `ollama` service in `docker-compose.yml` has **no `ports:` mapping** and must never
> get one. Ollama is reachable only on the internal Docker network as `ollama:11434`.
> Publishing it would re-open the exact unauthenticated exposure this whole project exists
> to close (SPEC §1, §3; AGENT_PROMPT non-negotiable #2).

The same posture applies to **Postgres** and **Redis** in the production compose file — no
published ports. Only **Caddy** binds host ports (80/443, 443/udp for HTTP/3).

---

## Prerequisites

- A host with Docker + Docker Compose.
- DNS: `api.neuronetz.ai` → the host's public IP (for Let's Encrypt).
- Ports 80 and 443 reachable from the internet (ACME HTTP/TLS challenge + serving).

---

## Steps

```bash
git clone neuronetz-gateway && cd neuronetz-gateway

# 1. Configure. Copy the example env and change EVERY secret.
cp .env.example .env
#    - POSTGRES_PASSWORD: a strong, unique value
#    - DATABASE_URL: must match the POSTGRES_* values
#    - GATEWAY_LOG_FORMAT=json for production

# 2. Configure Caddy for your domain + ACME email.
cp ops/caddy/Caddyfile.example ops/caddy/Caddyfile   # then edit the site + email
# (docker-compose.yml mounts Caddyfile.example by default; point it at your edited file
#  or edit in place.)

# 3. Bring up the full stack. The gateway runs `alembic upgrade head`, then serves.
docker compose up -d --build

# 4. Bootstrap a tenant + key (CLI runs inside the gateway container).
docker compose exec gateway neuronetz-gateway create-tenant --name acme --rpm 120 --tpm 200000
docker compose exec gateway neuronetz-gateway create-key --tenant acme --name prod-server-1
#   ^ prints the full key ONCE — store it in your secret manager now.

# 5. Smoke test (through Caddy / TLS).
curl https://api.neuronetz.ai/healthz
curl -N https://api.neuronetz.ai/v1/chat/completions \
  -H "Authorization: Bearer nz_…" -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","stream":true,"messages":[{"role":"user","content":"hi"}]}'
```

Caddy obtains and renews the certificate automatically. For local testing without a public
domain, use the `localhost { tls internal … }` block documented in `Caddyfile.example`
(trust Caddy's local CA or pass `-k` to curl).

---

## Pointing at a real Ollama backend

The gateway reaches Ollama via `OLLAMA_BASE_URL`. In the bundled stack this is the in-stack
`ollama` service: `OLLAMA_BASE_URL=http://ollama:11434`.

To use an **existing/external** Ollama host instead:

1. Remove the `ollama` service from `docker-compose.yml` (or leave it; it just won't be used).
2. Set `OLLAMA_BASE_URL` to the backend address reachable from the gateway container, e.g.
   `http://10.0.0.5:11434` or an internal DNS name.
3. Ensure that backend is itself **not** exposed to the internet — the gateway is the only
   thing that should ever reach it. Use a private network / firewall rule, not a public port.
4. Pull the models you want available on that backend. They appear in tenants' effective
   sets automatically on the next discovery refresh (SPEC §4.6) — no gateway config change
   for `allow_all_models` tenants.

Discovery polls `OLLAMA_BASE_URL/api/tags` every `MODEL_DISCOVERY_REFRESH_S` seconds. If the
backend is unreachable, the discovered set is empty and requests **fail closed**.

---

## Environment reference (SPEC §7)

All configuration is via environment variables, validated by Pydantic Settings on boot.
Boot **fails loudly** on invalid config.
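
As an illustration of what failing loudly at boot means, here is a stdlib-only sketch. It is not the gateway's actual settings class (which, per the text above, uses Pydantic Settings): the `Settings` dataclass and `load_settings` helper are hypothetical, only three variables from this reference are checked, and treating `json` and `console` as the only valid log formats is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical subset of the gateway configuration (illustration only)."""
    ollama_base_url: str
    default_rpm: int
    log_format: str


def load_settings(env=None) -> Settings:
    """Read three variables, collect every problem, and raise if any were found."""
    env = os.environ if env is None else env
    errors = []

    # Defaults below mirror the documented defaults for these variables.
    base_url = env.get("OLLAMA_BASE_URL", "http://ollama:11434")
    if not base_url.startswith(("http://", "https://")):
        errors.append(f"OLLAMA_BASE_URL must be an http(s) URL, got {base_url!r}")

    raw_rpm = env.get("DEFAULT_RPM", "60")
    try:
        rpm = int(raw_rpm)
    except ValueError:
        rpm = 0  # non-numeric input is reported by the check below
    if rpm <= 0:
        errors.append(f"DEFAULT_RPM must be a positive integer, got {raw_rpm!r}")

    # Assumption: only these two formats are accepted.
    log_format = env.get("GATEWAY_LOG_FORMAT", "json")
    if log_format not in ("json", "console"):
        errors.append(f"GATEWAY_LOG_FORMAT must be 'json' or 'console', got {log_format!r}")

    if errors:
        # Refuse to start rather than run with a partially valid configuration.
        raise ValueError("invalid configuration:\n  " + "\n  ".join(errors))
    return Settings(base_url, rpm, log_format)


if __name__ == "__main__":
    print(load_settings())
```

Collecting all errors before raising means a misconfigured `.env` can be fixed in one pass instead of one variable per restart.
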
See [`.env.example`](../.env.example) for a copyable file.

### Service

| Var | Default | Notes |
|---|---|---|
| `GATEWAY_BIND_HOST` | `0.0.0.0` | Bind-all inside the container. |
| `GATEWAY_BIND_PORT` | `8080` | Internal port; never published directly in prod. |
| `GATEWAY_LOG_LEVEL` | `INFO` | |
| `GATEWAY_LOG_FORMAT` | `json` | `json` in prod, `console` for local dev. |
| `GATEWAY_REQUEST_ID_HEADER` | `X-Request-ID` | |
| `GATEWAY_TRUSTED_PROXIES` | `127.0.0.1,caddy` | Sources trusted for `X-Forwarded-For`. |

### Upstream (Ollama)

| Var | Default | Notes |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Internal address of the backend. |
| `OLLAMA_CONNECT_TIMEOUT_S` | `5` | |
| `OLLAMA_READ_TIMEOUT_S` | `600` | Long, for slow generations. |
| `OLLAMA_MAX_CONNECTIONS` | `64` | httpx pool size. |

### Model discovery (§4.6)

| Var | Default | Notes |
|---|---|---|
| `MODEL_DISCOVERY_REFRESH_S` | `60` | How often to re-query `/api/tags`. |
| `MODEL_DISCOVERY_CACHE_TTL_S` | `120` | Redis TTL for the discovered set. |

### Database

| Var | Default | Notes |
|---|---|---|
| `DATABASE_URL` | `postgresql+asyncpg://…` | asyncpg driver. |
| `DATABASE_POOL_SIZE` | `10` | |
| `DATABASE_POOL_OVERFLOW` | `20` | |

### Redis

| Var | Default | Notes |
|---|---|---|
| `REDIS_URL` | `redis://redis:6379/0` | |
| `REDIS_KEY_CACHE_TTL_S` | `60` | Resolved-key cache TTL. |

### Limits (defaults; per-tenant/key DB overrides win)

| Var | Default | Notes |
|---|---|---|
| `DEFAULT_RPM` | `60` | |
| `DEFAULT_TPM` | `100000` | |
| `DEFAULT_CONCURRENT` | `8` | |
| `MAX_REQUEST_BODY_BYTES` | `262144` | 256 KiB request cap. |
| `MAX_NUM_PREDICT` | `4096` | Hard cap on requested completion tokens. |

### Security

| Var | Default | Notes |
|---|---|---|
| `ARGON2_TIME_COST` | `3` | |
| `ARGON2_MEMORY_COST_KIB` | `65536` | 64 MiB. |
| `ARGON2_PARALLELISM` | `4` | |
| `AUTH_FAILURE_RATE_LIMIT_PER_IP_PER_MIN` | `20` | Throttles auth brute-force per source IP. |

### Audit

| Var | Default | Notes |
|---|---|---|
| `AUDIT_BUFFER_SIZE` | `1000` | Ring buffer; full ⇒ deny mode. |
| `PROMPT_LOG_DEFAULT_RETENTION_DAYS` | `30` | |
| `AUDIT_LOG_DEFAULT_RETENTION_DAYS` | `365` | |

---

## TLS & security headers (Caddy)

`ops/caddy/Caddyfile.example` already sets:

- **HSTS** `max-age=63072000; includeSubDomains; preload`
- `X-Content-Type-Options: nosniff`
- `X-Frame-Options: DENY`
- `Referrer-Policy: no-referrer`
- strips `Server` and `X-Powered-By`

Edit the site address and ACME `email` before deploying.

---

## Non-Compose (systemd)

A systemd unit is provided for hosts that run the image directly (`ops/systemd/`). The
gateway still requires reachable Postgres, Redis, and Ollama, and the same environment
variables. TLS in that topology is whatever fronts the host (Caddy, nginx, a load
balancer) — **Ollama still must not be publicly reachable.**

---

## Upgrades & migrations

The gateway runs `alembic upgrade head` on container start, so a normal
`docker compose up -d --build` after pulling a new version applies pending migrations.
For zero-downtime upgrades, run migrations as a one-off
(`docker compose run --rm gateway alembic upgrade head`) before rolling the service.
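
The migrate-then-roll order described above can be scripted. The sketch below is a minimal illustration and not part of the repository: the `upgrade_commands` and `run_upgrade` helpers are hypothetical, and the explicit `docker compose build` step and the `--no-deps` flag are assumptions, added so that the one-off container runs the new image and only the gateway container is recreated.

```python
import subprocess


def upgrade_commands(service: str = "gateway") -> list[list[str]]:
    """Return the upgrade steps in order: build, migrate, then recreate the service."""
    return [
        # Assumption: build first so the one-off container carries the new migrations.
        ["docker", "compose", "build", service],
        # From this document: apply pending migrations as a one-off.
        ["docker", "compose", "run", "--rm", service, "alembic", "upgrade", "head"],
        # Assumption: recreate only this service and leave its dependencies running.
        ["docker", "compose", "up", "-d", "--no-deps", service],
    ]


def run_upgrade(dry_run: bool = True) -> list[str]:
    """Run each step, stopping at the first failure; with dry_run, only print them."""
    executed = []
    for cmd in upgrade_commands():
        line = " ".join(cmd)
        print(("DRY RUN: " if dry_run else "+ ") + line)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit, so a failed
            # migration prevents the service from being replaced.
            subprocess.run(cmd, check=True)
        executed.append(line)
    return executed


if __name__ == "__main__":
    run_upgrade(dry_run=True)
```

Running it with `dry_run=True` (the default) only prints the commands, which makes it easy to review the sequence before executing it against a live stack.
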