m17hr1l/neuronetz-gateway

Stephan Berbig b2ec32c852

deploy: target jwilder-proxy production stack

Production deployment now matches the host setup that already runs
neuronetz.ai / neuro-landing: the gateway sits behind the jwilder
nginx-proxy + acme-companion already on the host, instead of bundling
its own Caddy sidecar.

- docker-compose.yml: drop the Caddy service entirely. The gateway joins
  an external `proxy` Docker network (the same one neuronetz-web /
  neuronetz-www use) and advertises itself with VIRTUAL_HOST /
  VIRTUAL_PORT / LETSENCRYPT_HOST / LETSENCRYPT_EMAIL. nginx-proxy
  routes TLS-terminated traffic to it on the shared network;
  acme-companion handles Let's Encrypt issuance + renewal for
  api.neuronetz.ai automatically. NO host ports are published in this
  compose file anywhere — gateway, postgres, redis, ollama all stay
  unreachable from the host. Pinned container_names
  (neuronetz-gateway / -postgres / -redis / -ollama) for stable
  identification by nginx-proxy and ops scripts.
- .env.example: add GATEWAY_VIRTUAL_HOST + LETSENCRYPT_EMAIL; flip the
  default GATEWAY_TRUSTED_PROXIES to `127.0.0.1,nginx-proxy`.
- docs/DEPLOYMENT.md: the canonical path is now jwilder-proxy.
  Reorganized prerequisites + steps around it; documented adding HSTS
  and the other security headers via the nginx-proxy custom-config
  mechanism (/etc/nginx/vhost.d/<host>). The Caddy sidecar lives on as
  a documented alternative for hosts without jwilder-proxy
  (ops/caddy/Caddyfile.example is kept).

The Ollama-never-exposed non-negotiable is unchanged.

2026-05-26 20:55:20 +02:00

neuronetz-gateway — Deployment

Production deployment is a Docker Compose stack of gateway, Postgres, Redis, and Ollama that sits behind the host's existing jwilder/nginx-proxy stack (the same one already serving neuronetz.ai / neuro-landing). Public traffic enters via nginx-proxy, which terminates TLS; acme-companion obtains and renews the Let's Encrypt certificate for api.neuronetz.ai. The gateway joins the host's external proxy Docker network alongside the other public-facing containers and advertises itself with VIRTUAL_HOST / VIRTUAL_PORT. Postgres, Redis, and Ollama stay on a private internal network with no published ports.

▶ Don't have jwilder-proxy on the host? See § "Alternative: TLS via Caddy sidecar" — the ops/caddy/Caddyfile.example is shipped for that case.

For the local, no-GPU demo (mock Ollama + playground), see PLAYGROUND.md and run ./demo.sh. This document is the production path.

The one rule that must never break

⛔ Ollama is NEVER exposed to the host or the internet.

The ollama service in docker-compose.yml has no ports: mapping and must never get one. Ollama is reachable only on the internal Docker network as ollama:11434. Publishing it would re-open the exact unauthenticated exposure this whole project exists to close (SPEC §1, §3; AGENT_PROMPT non-negotiable #2).

The same posture applies to Postgres, Redis, and the gateway itself: the production compose file publishes no ports at all. Only the host's jwilder nginx-proxy container binds 80/443; the gateway is reached via the shared external proxy Docker network.
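The no-published-ports rule can be checked mechanically before each deploy. The following is a minimal sketch, not part of the repo: the function names are hypothetical, and it assumes the Docker Compose CLI is available so that `docker compose config --format json` can render the effective configuration.

```python
import json
import subprocess
from typing import Any


def services_with_published_ports(compose: dict[str, Any]) -> list[str]:
    """Return the names of services that declare a non-empty `ports:` mapping."""
    services = compose.get("services") or {}
    return sorted(
        name for name, spec in services.items() if (spec or {}).get("ports")
    )


def load_rendered_compose() -> dict[str, Any]:
    """Render the effective compose config as JSON via the Docker Compose CLI."""
    result = subprocess.run(
        ["docker", "compose", "config", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def main() -> None:
    """Exit non-zero if any service (Ollama included) publishes a host port."""
    offenders = services_with_published_ports(load_rendered_compose())
    if offenders:
        raise SystemExit(f"published ports found on: {', '.join(offenders)}")
    print("ok: no service publishes a host port")
```

Calling `main()` from the repository root (for example in a pre-deploy hook) would fail the deploy if a `ports:` mapping ever creeps into any service.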

Prerequisites

- A host with Docker + Docker Compose.
- A jwilder-proxy stack already running on the host, attached to an external Docker network named proxy. Typically jwilder/nginx-proxy + nginxproxy/acme-companion, the same setup serving neuronetz.ai / neuro-landing.
- DNS: api.neuronetz.ai → the host's public IP.
- Ports 80 and 443 already published by the jwilder-proxy container on that host (for ACME HTTP-01 + serving). This compose file does not publish them itself.

Steps (production — jwilder-proxy)

git clone ssh://git@gitea.neuronetz.ai:222/m17hr1l/neuronetz-gateway.git
cd neuronetz-gateway

# 1. Configure. Copy the example env and change EVERY secret.
cp .env.example .env
#   - POSTGRES_PASSWORD          : a strong, unique value
#   - GATEWAY_VIRTUAL_HOST       : api.neuronetz.ai  (read by nginx-proxy)
#   - LETSENCRYPT_EMAIL          : admin@neuronetz.ai  (read by acme-companion)
#   - GATEWAY_LOG_FORMAT=json    : for production
#   - GATEWAY_TRUSTED_PROXIES    : 127.0.0.1,nginx-proxy

# 2. Bring up the stack. The gateway joins the external `proxy` network and
#    runs `alembic upgrade head` before serving.
docker compose up -d --build
#   nginx-proxy observes the new container, generates an nginx vhost for
#   api.neuronetz.ai, and acme-companion issues the cert via Let's Encrypt.
#   Cert renewals are automatic.

# 3. Bootstrap a tenant + key (CLI runs inside the gateway container).
docker compose exec gateway neuronetz-gateway create-tenant --name acme --rpm 120 --tpm 200000
docker compose exec gateway neuronetz-gateway create-key --tenant acme --name prod-server-1
#   ^ prints the full key ONCE — store it in your secret manager now.

# 4. Smoke test through public TLS.
curl https://api.neuronetz.ai/healthz
curl -N https://api.neuronetz.ai/v1/chat/completions \
  -H "Authorization: Bearer nz_…" -H "Content-Type: application/json" \
  -d '{"model":"llama3.1:8b","stream":true,"messages":[{"role":"user","content":"hi"}]}'

The compose file pins container_name: neuronetz-gateway (and neuronetz-postgres / neuronetz-redis / neuronetz-ollama) for stable identification by nginx-proxy and for ops scripts.
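The streaming smoke test in step 4 can also be scripted. The sketch below only parses the stream; it assumes the gateway emits OpenAI-style server-sent events (`data: {...}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`), which the `/v1/chat/completions` path suggests but this document does not confirm. The function name is hypothetical.

```python
import json
from collections.abc import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Feed it the decoded response lines from any HTTP client; joining the yielded fragments gives the full completion text.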

Pointing at a real Ollama backend

The gateway reaches Ollama via OLLAMA_BASE_URL. In the bundled stack this is the in-stack ollama service: OLLAMA_BASE_URL=http://ollama:11434.

To use an existing/external Ollama host instead:

1. Remove the ollama service from docker-compose.yml (or leave it; it just won't be used).
2. Set OLLAMA_BASE_URL to the backend address reachable from the gateway container, e.g. http://10.0.0.5:11434 or an internal DNS name.
3. Ensure that backend is itself not exposed to the internet: the gateway is the only thing that should ever reach it. Use a private network / firewall rule, not a public port.
4. Pull the models you want available on that backend. They appear in tenants' effective sets automatically on the next discovery refresh (SPEC §4.6), with no gateway config change for allow_all_models tenants.

Discovery polls OLLAMA_BASE_URL/api/tags every MODEL_DISCOVERY_REFRESH_S seconds. If the backend is unreachable, the discovered set is empty and requests fail closed.
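The fail-closed behaviour described above can be pictured with a small sketch. This is an illustration under assumptions, not the gateway's actual code: `DiscoveredModels` and `fetch_tags` are hypothetical names, `fetch_tags` stands in for the `/api/tags` query, and the Redis cache layer (`MODEL_DISCOVERY_CACHE_TTL_S`) is left out.

```python
from collections.abc import Callable, Iterable


class DiscoveredModels:
    """Holds the last discovered model set; empty whenever discovery fails."""

    def __init__(self, fetch_tags: Callable[[], Iterable[str]]) -> None:
        # Hypothetical callable wrapping GET {OLLAMA_BASE_URL}/api/tags.
        self._fetch_tags = fetch_tags
        self._models: frozenset[str] = frozenset()

    def refresh(self) -> None:
        """Re-query the backend; meant to run every MODEL_DISCOVERY_REFRESH_S."""
        try:
            self._models = frozenset(self._fetch_tags())
        except Exception:
            # Fail closed: an unreachable backend means no model is servable.
            self._models = frozenset()

    def is_servable(self, model: str) -> bool:
        return model in self._models
```

The point of the design is that a failed refresh never leaves a stale allow-list behind: once the backend disappears, every model check returns False until a later refresh succeeds.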

Environment reference (SPEC §7)

All configuration is via environment variables, validated by Pydantic Settings on boot. Boot fails loudly on invalid config. See .env.example for a copyable file.
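To illustrate what "boot fails loudly" means in practice, here is a standard-library stand-in for the validation step. It is deliberately not Pydantic Settings (which the gateway uses); the class and function names are hypothetical, only three variables are covered, and the "must be positive" rule is an assumption.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitSettings:
    default_rpm: int
    default_tpm: int
    max_request_body_bytes: int


def _positive_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read one variable, falling back to the documented default."""
    raw = env.get(name, str(default))
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:  # assumption: non-positive limits are treated as invalid
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def load_limit_settings(env: Mapping[str, str]) -> LimitSettings:
    """Validate a few limits from an environment mapping; raise on bad input."""
    return LimitSettings(
        default_rpm=_positive_int(env, "DEFAULT_RPM", 60),
        default_tpm=_positive_int(env, "DEFAULT_TPM", 100000),
        max_request_body_bytes=_positive_int(env, "MAX_REQUEST_BODY_BYTES", 262144),
    )
```

Passing `os.environ` at startup and letting the `ValueError` propagate stops the process before it serves a request with a misread limit.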

Service

Var	Default	Notes
`GATEWAY_BIND_HOST`	`0.0.0.0`	Bind-all inside the container.
`GATEWAY_BIND_PORT`	`8080`	Internal port; never published directly in prod.
`GATEWAY_LOG_LEVEL`	`INFO`
`GATEWAY_LOG_FORMAT`	`json`	`json` in prod, `console` for local dev.
`GATEWAY_REQUEST_ID_HEADER`	`X-Request-ID`
`GATEWAY_TRUSTED_PROXIES`	`127.0.0.1,nginx-proxy`	Sources trusted for `X-Forwarded-For`. Set to your front-proxy's container name / IP.
`GATEWAY_VIRTUAL_HOST`	`api.neuronetz.ai`	Read by jwilder `nginx-proxy` and `acme-companion`.
`LETSENCRYPT_EMAIL`	`admin@neuronetz.ai`	Read by `acme-companion`.

Upstream (Ollama)

Var	Default	Notes
`OLLAMA_BASE_URL`	`http://ollama:11434`	Internal address of the backend.
`OLLAMA_CONNECT_TIMEOUT_S`	`5`
`OLLAMA_READ_TIMEOUT_S`	`600`	Long, for slow generations.
`OLLAMA_MAX_CONNECTIONS`	`64`	httpx pool size.

Model discovery (§4.6)

Var	Default	Notes
`MODEL_DISCOVERY_REFRESH_S`	`60`	How often to re-query `/api/tags`.
`MODEL_DISCOVERY_CACHE_TTL_S`	`120`	Redis TTL for the discovered set.

Database

Var	Default	Notes
`DATABASE_URL`	`postgresql+asyncpg://…`	asyncpg driver.
`DATABASE_POOL_SIZE`	`10`
`DATABASE_POOL_OVERFLOW`	`20`

Redis

Var	Default	Notes
`REDIS_URL`	`redis://redis:6379/0`
`REDIS_KEY_CACHE_TTL_S`	`60`	Resolved-key cache TTL.

Limits (defaults; per-tenant/key DB overrides win)

Var	Default	Notes
`DEFAULT_RPM`	`60`
`DEFAULT_TPM`	`100000`
`DEFAULT_CONCURRENT`	`8`
`MAX_REQUEST_BODY_BYTES`	`262144`	256 KiB request cap.
`MAX_NUM_PREDICT`	`4096`	Hard cap on requested completion tokens.
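The heading says per-tenant/key database overrides win over these defaults. A minimal sketch of that resolution follows; the function name is hypothetical, and the precedence of a key override over a tenant override is an assumption, since this document does not state the order between the two.

```python
from typing import Optional


def effective_limit(default: int, tenant: Optional[int], key: Optional[int]) -> int:
    """Resolve one limit; None means "no override stored at this level"."""
    if key is not None:
        return key  # assumption: the most specific override wins
    if tenant is not None:
        return tenant
    return default  # e.g. DEFAULT_RPM from the environment
```

For example, a tenant created with `--rpm 120` would resolve to 120 rather than the `DEFAULT_RPM` of 60, unless one of its keys carries its own value.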

Security

Var	Default	Notes
`ARGON2_TIME_COST`	`3`
`ARGON2_MEMORY_COST_KIB`	`65536`	64 MiB.
`ARGON2_PARALLELISM`	`4`
`AUTH_FAILURE_RATE_LIMIT_PER_IP_PER_MIN`	`20`	Throttles auth brute-force per source IP.
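As an illustration of `AUTH_FAILURE_RATE_LIMIT_PER_IP_PER_MIN`, here is an in-memory fixed-window counter. It is a sketch under assumptions: the class name is hypothetical, the windowing strategy is a guess, and a real deployment would more likely keep these counters in Redis with an expiry.

```python
import time
from collections import defaultdict
from typing import Callable


class AuthFailureLimiter:
    """Fixed one-minute window counter of failed auth attempts per source IP."""

    def __init__(
        self,
        limit_per_min: int = 20,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit_per_min
        self._clock = clock
        # Keyed by (ip, window index). Old windows are never evicted here,
        # which is acceptable for a sketch but not for production use.
        self._counts: dict[tuple[str, int], int] = defaultdict(int)

    def _window(self) -> int:
        return int(self._clock() // 60)

    def record_failure(self, ip: str) -> None:
        self._counts[(ip, self._window())] += 1

    def is_blocked(self, ip: str) -> bool:
        return self._counts.get((ip, self._window()), 0) >= self._limit
```

The injectable `clock` exists only to make the window behaviour easy to exercise in tests.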

Audit

Var	Default	Notes
`AUDIT_BUFFER_SIZE`	`1000`	Ring buffer; full ⇒ deny mode.
`PROMPT_LOG_DEFAULT_RETENTION_DAYS`	`30`
`AUDIT_LOG_DEFAULT_RETENTION_DAYS`	`365`
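One reading of "full ⇒ deny mode" is that the gateway refuses requests rather than silently dropping audit events when the buffer cannot drain. The sketch below illustrates that reading only; the class and method names are hypothetical and the actual semantics are defined by the gateway, not by this example.

```python
from collections import deque


class AuditBuffer:
    """Bounded audit queue; when full, callers deny instead of dropping events."""

    def __init__(self, capacity: int = 1000) -> None:
        self._capacity = capacity  # corresponds to AUDIT_BUFFER_SIZE
        self._events: deque[dict] = deque()

    @property
    def deny_mode(self) -> bool:
        return len(self._events) >= self._capacity

    def try_append(self, event: dict) -> bool:
        """Queue an event; return False (deny the request) if the buffer is full."""
        if self.deny_mode:
            return False
        self._events.append(event)
        return True

    def drain(self, max_items: int) -> list[dict]:
        """Remove up to max_items events, e.g. after writing them to the database."""
        count = min(max_items, len(self._events))
        return [self._events.popleft() for _ in range(count)]
```

Under this reading, an audit-store outage turns into refused requests instead of unaudited ones, and service resumes as soon as a drain frees space.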

TLS & security headers

In the canonical (jwilder-proxy) setup, TLS termination and security headers belong on the host's nginx-proxy container, not in this repo. Use the standard nginx-proxy custom-config mechanism (/etc/nginx/vhost.d/api.neuronetz.ai) to add HSTS and the rest:

add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
add_header X-Content-Type-Options    "nosniff"                                       always;
add_header X-Frame-Options           "DENY"                                          always;
add_header Referrer-Policy           "no-referrer"                                   always;

If you prefer to terminate TLS in this repo (no jwilder-proxy on the host), see the section below.
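After adding the vhost.d snippet, it is worth confirming that the headers actually reach clients. A minimal sketch of such a check follows; the names are hypothetical and the expected values simply mirror the four `add_header` lines above.

```python
from collections.abc import Mapping

# Expected values copied from the nginx-proxy vhost.d snippet in this section.
EXPECTED_HEADERS = {
    "strict-transport-security": "max-age=63072000; includeSubDomains; preload",
    "x-content-type-options": "nosniff",
    "x-frame-options": "DENY",
    "referrer-policy": "no-referrer",
}


def header_problems(response_headers: Mapping[str, str]) -> list[str]:
    """Return header names that are missing or differ from the expected values."""
    seen = {name.lower(): value.strip() for name, value in response_headers.items()}
    return [name for name, want in EXPECTED_HEADERS.items() if seen.get(name) != want]
```

It can be fed the headers of any response from the public endpoint, for instance `header_problems(dict(resp.headers.items()))` on a `urllib.request.urlopen` response for `https://api.neuronetz.ai/healthz`; an empty list means all four headers are present with the expected values.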

Alternative: TLS via Caddy sidecar

ops/caddy/Caddyfile.example is provided for hosts without jwilder-proxy. It sets HSTS and the security headers above, strips the Server header, and obtains a Let's Encrypt cert. To use it, add a caddy service to your local copy of docker-compose.yml (binding host 80/443), drop the gateway's VIRTUAL_HOST / LETSENCRYPT_HOST env vars, and remove the proxy external-network requirement. The Caddyfile itself is self-documenting; edit the site address and ACME email before deploying.

Non-Compose (systemd)

A systemd unit is provided for hosts that run the image directly (ops/systemd/). The gateway still requires reachable Postgres, Redis, and Ollama, and the same environment variables. TLS in that topology is whatever fronts the host (Caddy, nginx, a load balancer) — Ollama still must not be publicly reachable.

Upgrades & migrations

The gateway runs alembic upgrade head on container start, so a normal docker compose up -d --build after pulling a new version applies pending migrations. For zero-downtime upgrades, run migrations as a one-off (docker compose run --rm gateway alembic upgrade head) before rolling the service.
