Two production-hardening changes triggered by real issues found on the
first prod attempt against neuronetz-ai-01.
1. Upstream auth (the production Ollama is fronted by an auth proxy):
- New config: OLLAMA_AUTH_TOKEN (pydantic SecretStr, so the value is
masked in repr/str and stays out of logs and error messages), plus
OLLAMA_AUTH_HEADER (default "Authorization") and OLLAMA_AUTH_SCHEME
(default "Bearer") for stacks that expect a non-standard header such
as X-API-Key.
- lifespan._build_upstream_headers() injects the configured header into
the single shared httpx client used by both the proxy hot path AND
the discovery poller, so /api/tags + /api/chat both authenticate
against the upstream automatically.
- New CLI: `neuronetz-gateway probe-ollama`. It uses the same client
config to GET /api/version and /api/tags, reports success, transport
error, or HTTP status, lists the first few discovered models, and
exits 1 on any failure. The token itself is never printed (only
whether one was attached). Lets ops verify upstream reachability
before letting real traffic through.
- docker-compose.yml passes OLLAMA_AUTH_TOKEN/HEADER/SCHEME through;
.env.example documents them with a leave-blank-for-internal-Ollama
default.
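
A minimal sketch of the header-building step described above, using only
the standard library. `build_upstream_headers` and `describe_auth` are
hypothetical stand-ins, not the code in lifespan._build_upstream_headers();
the real function reads its values from settings, with the token held as a
SecretStr. Treating an empty scheme as "send the bare token" (for
X-API-Key style headers) is an assumption of this sketch.

```python
from __future__ import annotations


def build_upstream_headers(
    token: str | None,
    header: str = "Authorization",
    scheme: str = "Bearer",
) -> dict[str, str]:
    """Return the auth header to attach to every upstream request.

    Hypothetical helper: mirrors OLLAMA_AUTH_TOKEN / OLLAMA_AUTH_HEADER /
    OLLAMA_AUTH_SCHEME, but takes plain strings instead of settings.
    """
    if not token:
        # Blank token: internal Ollama with no auth proxy, attach nothing.
        return {}
    # Assumption: an empty scheme means the header carries the bare token.
    value = f"{scheme} {token}" if scheme else token
    return {header: value}


def describe_auth(headers: dict[str, str]) -> str:
    """Log-safe summary: names the attached header, never its value."""
    if not headers:
        return "auth: none"
    return "auth: " + ", ".join(sorted(headers)) + " attached"
```

The resulting dict would be handed to the shared client once at startup
(httpx.AsyncClient accepts a `headers=` argument), so the proxy path and
the discovery poller pick it up without per-request code.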
2. Volume adoption (don't lose existing model data on re-deploy):
- docker-compose.yml now pins explicit (not project-prefixed) Docker
volume names for both postgres_data and ollama_data, configurable via
POSTGRES_DATA_VOLUME and OLLAMA_DATA_VOLUME. Defaults preserve the
previous per-project names so existing deployments aren't disturbed.
- This addresses the scenario where deploying this compose under a new
project directory created fresh, empty volumes alongside an existing
`neuro-ollama_ollama-data` volume containing pre-pulled models (incl.
deepseek-r1:14b, qwen2.5:14b, gemma3:12b, ...). Setting
OLLAMA_DATA_VOLUME=neuro-ollama_ollama-data in .env tells the new
stack to mount the existing volume in place — no copy, no downtime.
- .env.example documents the override, using the production host's
actual volume name as the example.
Both changes are ruff + mypy --strict clean.
Production deployment now matches the host setup that already runs
neuronetz.ai / neuro-landing: the gateway sits behind the jwilder
nginx-proxy + acme-companion already on the host, instead of bundling
its own Caddy sidecar.
- docker-compose.yml: drop the Caddy service entirely. The gateway joins
an external `proxy` Docker network (the same one neuronetz-web /
neuronetz-www use) and advertises itself with VIRTUAL_HOST /
VIRTUAL_PORT / LETSENCRYPT_HOST / LETSENCRYPT_EMAIL. nginx-proxy
routes TLS-terminated traffic to it on the shared network;
acme-companion handles Let's Encrypt issuance + renewal for
api.neuronetz.ai automatically. No host ports are published anywhere
in this compose file: gateway, postgres, redis, and ollama are
reachable only over Docker networks, never through a host port.
Pinned container_names (neuronetz-gateway / -postgres / -redis /
-ollama) for stable identification by nginx-proxy and ops scripts.
- .env.example: add GATEWAY_VIRTUAL_HOST + LETSENCRYPT_EMAIL; flip the
default GATEWAY_TRUSTED_PROXIES to `127.0.0.1,nginx-proxy`.
- docs/DEPLOYMENT.md: the canonical path is now jwilder-proxy.
Reorganized prerequisites + steps around it; documented adding HSTS
and the other security headers via the nginx-proxy custom-config
mechanism (/etc/nginx/vhost.d/<host>). The Caddy sidecar lives on as
a documented alternative for hosts without jwilder-proxy
(ops/caddy/Caddyfile.example is kept).
The non-negotiable rule that Ollama is never exposed is unchanged.
Initial project structure for neuronetz-gateway per scope-docs/SPEC.md:
- Python 3.12 / FastAPI / SQLAlchemy 2.0 (async) / Redis / Postgres stack
managed by uv. Multi-stage non-root Dockerfile, prod + dev compose files
(ollama service is NEVER published in either), Caddyfile + systemd unit,
justfile, GitHub Actions CI (ruff, mypy --strict, pytest, bandit, pip-audit).
- Pydantic-Settings config covering every env var from SPEC §7, including the
MODEL_DISCOVERY_* keys for the dynamic-discovery feature (§4.6).
- Alembic 0001_initial creates the full gateway schema (8 tables, 3 enums,
notify_key_revoked() trigger), incl. allow_all_models on tenant_limits and
key_limits for the per-tenant auto-grant toggle.
- Working /healthz, /readyz (fail-closed when deps unreachable), and a
Prometheus /metrics stub. Sanitizing error handlers that attach X-Request-ID
to every response and never leak upstream internals.
- SPEC + AGENT_PROMPT included under scope-docs/ (source of truth).
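
For illustration, the fail-closed /readyz behaviour mentioned above could
look roughly like the sketch below. It is standard-library only and is not
the gateway's actual handler: the `readiness` function, the `Check` type,
and the dependency names are hypothetical, and treating an empty check
list as "not ready" is an assumption of this sketch.

```python
import asyncio
from collections.abc import Awaitable, Callable

# A dependency check: an async callable that returns True when healthy.
Check = Callable[[], Awaitable[bool]]


async def readiness(
    checks: dict[str, Check], timeout: float = 1.0
) -> tuple[int, dict[str, str]]:
    """Return (HTTP status, per-dependency state); any doubt means 503."""
    results: dict[str, str] = {}
    for name, check in checks.items():
        try:
            ok = await asyncio.wait_for(check(), timeout)
        except Exception:
            # Timeouts and connection errors count as "down". The exception
            # text is deliberately dropped so internals never reach callers.
            ok = False
        results[name] = "ok" if ok else "down"
    # Fail closed: ready only if at least one check ran and all passed.
    ready = bool(results) and all(v == "ok" for v in results.values())
    return (200 if ready else 503), results
```

A route handler would call this with one check per dependency (for
example Postgres and Redis pings) and return the status code together
with the sanitized state map.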