deploy: upstream Ollama auth token + adoptable data volumes

Two production-hardening changes triggered by real issues found on the
first prod attempt against neuronetz-ai-01.

1. Upstream auth (the production Ollama is fronted by an auth proxy):
   - New config: OLLAMA_AUTH_TOKEN (pydantic SecretStr, so it never appears
     in repr/logs/errors), plus OLLAMA_AUTH_HEADER (default "Authorization")
     and OLLAMA_AUTH_SCHEME (default "Bearer") for stacks that expect a
     non-standard header like X-API-Key.
   - lifespan._build_upstream_headers() injects the configured header into
     the single shared httpx client used by both the proxy hot path AND
     the discovery poller, so /api/tags + /api/chat both authenticate
     against the upstream automatically.
   - New CLI: `neuronetz-gateway probe-ollama` uses the same client config
     to GET /api/version and /api/tags, reports
     success/transport-error/HTTP-status, lists the first few discovered
     models, and exits 1 on any failure. The token itself is never printed
     (only whether one was attached). Lets ops verify upstream reachability
     before letting real traffic through.
   - docker-compose.yml passes OLLAMA_AUTH_TOKEN/HEADER/SCHEME through;
     .env.example documents them with a leave-blank-for-internal-Ollama
     default.
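
The probe behaviour described above can be sketched roughly as follows. This is not the code behind `neuronetz-gateway probe-ollama`; it is a minimal illustration under assumptions. The names `Fetch`, `ProbeResult`, `probe_path`, and `probe` are hypothetical, the HTTP call is abstracted behind an injected callable so the sketch runs without a network, and the model listing step is left out.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical fetch signature: takes a request path, returns the HTTP status
# code, and raises OSError (e.g. ConnectionError) on a transport failure.
Fetch = Callable[[str], int]


@dataclass
class ProbeResult:
    path: str
    outcome: str  # "ok", "transport-error", or "http-status"
    detail: str


def probe_path(fetch: Fetch, path: str) -> ProbeResult:
    """Classify one GET as success, transport error, or non-2xx status."""
    try:
        status = fetch(path)
    except OSError as exc:
        return ProbeResult(path, "transport-error", type(exc).__name__)
    if 200 <= status < 300:
        return ProbeResult(path, "ok", str(status))
    return ProbeResult(path, "http-status", str(status))


def probe(fetch: Fetch, token_attached: bool) -> int:
    """Probe both endpoints and return a process exit code (0 or 1)."""
    results = [probe_path(fetch, p) for p in ("/api/version", "/api/tags")]
    # Report only whether a token was attached, never the token itself.
    print(f"auth token attached: {token_attached}")
    for r in results:
        print(f"{r.path}: {r.outcome} ({r.detail})")
    return 0 if all(r.outcome == "ok" for r in results) else 1


def healthy(path: str) -> int:
    return 200


def unauthorized_tags(path: str) -> int:
    return 401 if path == "/api/tags" else 200


def unreachable(path: str) -> int:
    raise ConnectionError("upstream not reachable")


exit_ok = probe(healthy, token_attached=True)
exit_http = probe(unauthorized_tags, token_attached=False)
exit_transport = probe(unreachable, token_attached=True)
```

Injecting the fetch callable keeps the classification logic testable in isolation; a real command would presumably wrap the shared httpx client instead.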

2. Volume adoption (don't lose existing model data on re-deploy):
   - docker-compose.yml now pins absolute Docker volume NAMES for both
     postgres_data and ollama_data, configurable via POSTGRES_DATA_VOLUME
     and OLLAMA_DATA_VOLUME. Defaults preserve the previous per-project
     names so existing deployments aren't disturbed.
   - This addresses the scenario where deploying this compose under a new
     project directory created fresh, empty volumes alongside an existing
     `neuro-ollama_ollama-data` volume containing pre-pulled models (incl.
     deepseek-r1:14b, qwen2.5:14b, gemma3:12b, ...). Setting
     OLLAMA_DATA_VOLUME=neuro-ollama_ollama-data in .env tells the new
     stack to mount the existing volume in place: no copy, no downtime.
   - .env.example documents the override with the exact host's volume name
     as an example.

Both changes are ruff + mypy --strict clean.
@@ -32,6 +32,26 @@ if TYPE_CHECKING:
 _log = get_logger("lifespan")
 
 
+def _build_upstream_headers(settings: Settings) -> dict[str, str]:
+    """Compose default headers for the upstream Ollama client.
+
+    If ``OLLAMA_AUTH_TOKEN`` is set, attach the configured auth header. The
+    scheme prefix (``Bearer``) is included only when the header is the standard
+    ``Authorization``; for custom headers like ``X-API-Key`` the raw token is
+    sent. The SecretStr is unwrapped only here, never logged.
+    """
+    headers: dict[str, str] = {"User-Agent": "neuronetz-gateway"}
+    if settings.ollama_auth_token is not None:
+        raw = settings.ollama_auth_token.get_secret_value().strip()
+        if raw:
+            header = settings.ollama_auth_header
+            if header.lower() == "authorization":
+                headers[header] = f"{settings.ollama_auth_scheme} {raw}".strip()
+            else:
+                headers[header] = raw
+    return headers
+
+
 def _build_http_client(settings: Settings) -> httpx.AsyncClient:
     """Construct the shared httpx client used to reach Ollama."""
     timeout = httpx.Timeout(
@@ -41,7 +61,12 @@ def _build_http_client(settings: Settings) -> httpx.AsyncClient:
         pool=settings.ollama_connect_timeout_s,
     )
     limits = httpx.Limits(max_connections=settings.ollama_max_connections)
-    return httpx.AsyncClient(base_url=settings.ollama_base_url, timeout=timeout, limits=limits)
+    return httpx.AsyncClient(
+        base_url=settings.ollama_base_url,
+        timeout=timeout,
+        limits=limits,
+        headers=_build_upstream_headers(settings),
+    )
 
 
 @asynccontextmanager
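
To make the branching in `_build_upstream_headers` concrete, here is a self-contained sketch that exercises the same logic. `FakeSettings` is a hypothetical stand-in for the project's `Settings`, and the token is a plain `str` rather than a pydantic `SecretStr`, so the `get_secret_value()` call is dropped. The token value `s3cret` is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSettings:
    """Hypothetical stand-in for Settings; the token is a plain str here."""

    ollama_auth_token: Optional[str] = None
    ollama_auth_header: str = "Authorization"
    ollama_auth_scheme: str = "Bearer"


def build_upstream_headers(settings: FakeSettings) -> dict[str, str]:
    # Same branching as the diff's function, minus the SecretStr unwrap.
    headers: dict[str, str] = {"User-Agent": "neuronetz-gateway"}
    if settings.ollama_auth_token is not None:
        raw = settings.ollama_auth_token.strip()
        if raw:
            header = settings.ollama_auth_header
            if header.lower() == "authorization":
                # Standard header: prefix the scheme, e.g. "Bearer <token>".
                headers[header] = f"{settings.ollama_auth_scheme} {raw}".strip()
            else:
                # Custom header such as X-API-Key: send the raw token.
                headers[header] = raw
    return headers


no_token = build_upstream_headers(FakeSettings())
bearer = build_upstream_headers(FakeSettings(ollama_auth_token="s3cret"))
api_key = build_upstream_headers(
    FakeSettings(ollama_auth_token="s3cret", ollama_auth_header="X-API-Key")
)
blank = build_upstream_headers(FakeSettings(ollama_auth_token="   "))
no_scheme = build_upstream_headers(
    FakeSettings(ollama_auth_token="s3cret", ollama_auth_scheme="")
)
```

Under these assumptions, an unset or whitespace-only token yields only the `User-Agent` header, the default configuration yields `Authorization: Bearer s3cret`, a custom header carries the bare token, and an empty scheme falls back to the bare token because of the trailing `.strip()`.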