deploy: upstream Ollama auth token + adoptable data volumes

Two production-hardening changes, both triggered by real issues found on
the first production deployment attempt against neuronetz-ai-01.

1. Upstream auth (the production Ollama is fronted by an auth proxy):

   - New config: OLLAMA_AUTH_TOKEN (pydantic SecretStr — never appears in
     repr/logs/errors), plus OLLAMA_AUTH_HEADER (default "Authorization")
     and OLLAMA_AUTH_SCHEME (default "Bearer") for stacks that expect a
     non-standard header like X-API-Key.
   - lifespan._build_upstream_headers() injects the configured header into
     the single shared httpx client used by both the proxy hot path AND
     the discovery poller, so /api/tags + /api/chat both authenticate
     against the upstream automatically.
   - New CLI: `neuronetz-gateway probe-ollama` — uses the same client
     config to GET /api/version and /api/tags, reports success/transport-
     error/HTTP-status, lists the first few discovered models, exits 1 on
     any failure. The token itself is never printed (only whether one
     was attached). Lets ops verify upstream reachability before letting
     real traffic through.
   - docker-compose.yml passes OLLAMA_AUTH_TOKEN/HEADER/SCHEME through;
     .env.example documents them with a leave-blank-for-internal-Ollama
     default.

2. Volume adoption (don't lose existing model data on re-deploy):

   - docker-compose.yml now pins explicit Docker volume NAMES for both
     postgres_data and ollama_data, configurable via POSTGRES_DATA_VOLUME
     and OLLAMA_DATA_VOLUME. Defaults preserve the previous per-project
     names so existing deployments aren't disturbed.
   - This addresses the scenario where deploying this compose under a new
     project directory created fresh, empty volumes alongside an existing
     `neuro-ollama_ollama-data` volume containing pre-pulled models (incl.
     deepseek-r1:14b, qwen2.5:14b, gemma3:12b, ...). Setting
     OLLAMA_DATA_VOLUME=neuro-ollama_ollama-data in .env tells the new
     stack to mount the existing volume in place — no copy, no downtime.
   - .env.example documents the override, using the production host's
     actual volume name as the example.

Both changes are ruff + mypy --strict clean.
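
For readers without the full diff, the header rule from item 1 can be sketched as below. This is a minimal illustration, not the code in lifespan._build_upstream_headers(): the function name `build_upstream_headers`, its plain-string parameters, and the case-insensitive header comparison are assumptions made here. The real helper presumably reads the values from Settings and unwraps the SecretStr with `get_secret_value()`.

```python
from __future__ import annotations


def build_upstream_headers(
    token: str | None,
    header: str = "Authorization",
    scheme: str = "Bearer",
) -> dict[str, str]:
    """Return the auth header to attach to every upstream request, if any.

    Hypothetical stand-in for the gateway's helper; takes plain strings
    instead of a Settings object so the sketch stays stdlib-only.
    """
    # Empty or unset token: send no auth header at all (internal Ollama).
    if not token:
        return {}
    # The scheme prefix only applies to the standard Authorization header.
    # Comparing case-insensitively is an assumption of this sketch.
    if header.lower() == "authorization" and scheme:
        return {header: f"{scheme} {token}"}
    # Non-standard headers such as X-API-Key carry the bare token.
    return {header: token}


if __name__ == "__main__":
    print(build_upstream_headers("s3cr3t"))
    print(build_upstream_headers("s3cr3t", header="X-API-Key"))
    print(build_upstream_headers(None))
```

Passing the resulting dict as the default headers of the single shared client is what would let both the proxy hot path and the discovery poller authenticate without per-request code.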
Stephan Berbig
2026-05-27 18:59:09 +02:00
parent b2ec32c852
commit 662fbfb442
5 changed files with 162 additions and 3 deletions


@@ -8,7 +8,7 @@ from __future__ import annotations
 from functools import lru_cache
-from pydantic import Field
+from pydantic import Field, SecretStr
 from pydantic_settings import BaseSettings, SettingsConfigDict
@@ -35,6 +35,16 @@ class Settings(BaseSettings):
     ollama_connect_timeout_s: int = Field(default=5)
     ollama_read_timeout_s: int = Field(default=600)
     ollama_max_connections: int = Field(default=64)
+    # Optional Bearer token sent to the upstream Ollama on EVERY request from the
+    # gateway (proxy hot path + the discovery poller). Use SecretStr so the value
+    # never appears in repr(), logs, or error messages. Empty/unset = no header.
+    ollama_auth_token: SecretStr | None = Field(default=None)
+    # If you front Ollama with an auth proxy that expects a non-standard header
+    # name (e.g. ``X-API-Key`` instead of ``Authorization``), override here.
+    # The scheme prefix (``Bearer ``) is dropped automatically when the header
+    # isn't ``Authorization``.
+    ollama_auth_header: str = Field(default="Authorization")
+    ollama_auth_scheme: str = Field(default="Bearer")
     # --- Model discovery (SPEC §4.6) ---
     model_discovery_refresh_s: int = Field(default=60)