27b7012ec9a24577b9a86f2e0f1fc738e9205a29
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
653e03bf29 |
proxy: multi-backend Ollama aggregation with per-model routing + failover
The gateway can now aggregate models across SEVERAL Ollama backends and route each request to the correct one. Opt-in via OLLAMA_BACKENDS in .env — single-backend deployments are unaffected (effective_backends() synthesizes a single "default" backend from the legacy OLLAMA_BASE_URL / OLLAMA_AUTH_TOKEN fields when the list is empty). Behavior: - Discovery polls EVERY configured backend in parallel each tick; the cache stores per-backend model lists plus a model → backends priority list (config order = priority order). - /api/tags and /v1/models surface the DEDUPLICATED UNION of all backends' models. - A request's model is looked up in the priority list and proxied to the FIRST backend that hosts it. If that backend errors on the request, the pipeline transparently fails over to the next backend that has the same model (the streaming-failover probes the first chunk before releasing the response, so we never serve partial bytes from a dead backend). - No existence disclosure: a model not hosted by any backend yields the same generic 403 as "model not allowed" (SPEC §13.6 preserved). Components: - config.py: new BackendSpec model + ollama_backends list field + an effective_backends() helper. - proxy/router.py (new): BackendRouter (clients_for_with_failover), build_http_clients() builds one httpx client per backend with its own auth headers, build_backend_headers() exposes the per-backend header composition for the CLI probe. - proxy/discovery.py: DiscoveryCache.set_per_backend() + backends_for(), refresh_all_backends() polls all in parallel, discovery_loop_multi() replaces the single-backend loop in production; the legacy single- backend functions are kept for the dependency-override tests. - proxy/pipeline.py: Pipeline accepts an optional router; the four proxy methods now retry against each candidate backend in priority order on transport error. - lifespan.py: constructs the per-backend client dict, stores the router on app.state, launches discovery_loop_multi. - deps.py: get_backend_router provider + BackendRouterDep type alias; get_pipeline passes the router into Pipeline. - cli/manage.py: probe-ollama iterates every backend and reports per- backend status; list-models groups its output by backend and prints the union count + Redis cache size for sanity. - .env.example + docker-compose.yml: document and pass through OLLAMA_BACKENDS with a real example. Verified: ruff check (clean), mypy --strict src/ + tests/ (clean, 66 source files), pytest (60 passed + 39 skipped — same baseline as before this change; integration tests are Docker-socket-gated). |
||
|
|
662fbfb442 |
deploy: upstream Ollama auth token + adoptable data volumes
Two production-hardening changes triggered by real issues found on the
first prod attempt against neuronetz-ai-01.
1. Upstream auth (the production Ollama is fronted by an auth proxy):
- New config: OLLAMA_AUTH_TOKEN (pydantic SecretStr — never appears in
repr/logs/errors), plus OLLAMA_AUTH_HEADER (default "Authorization")
and OLLAMA_AUTH_SCHEME (default "Bearer") for stacks that expect a
non-standard header like X-API-Key.
- lifespan._build_upstream_headers() injects the configured header into
the single shared httpx client used by both the proxy hot path AND
the discovery poller, so /api/tags + /api/chat both authenticate
against the upstream automatically.
- New CLI: `neuronetz-gateway probe-ollama` — uses the same client
config to GET /api/version and /api/tags, reports success/transport-
error/HTTP-status, lists the first few discovered models, exits 1 on
any failure. The token itself is never printed (only whether one
was attached). Lets ops verify upstream reachability before letting
real traffic through.
- docker-compose.yml passes OLLAMA_AUTH_TOKEN/HEADER/SCHEME through;
.env.example documents them with a leave-blank-for-internal-Ollama
default.
2. Volume adoption (don't lose existing model data on re-deploy):
- docker-compose.yml now pins absolute Docker volume NAMES for both
postgres_data and ollama_data, configurable via POSTGRES_DATA_VOLUME
and OLLAMA_DATA_VOLUME. Defaults preserve the previous per-project
names so existing deployments aren't disturbed.
- This addresses the scenario where deploying this compose under a new
project directory created fresh, empty volumes alongside an existing
`neuro-ollama_ollama-data` volume containing pre-pulled models (incl.
deepseek-r1:14b, qwen2.5:14b, gemma3:12b, ...). Setting
OLLAMA_DATA_VOLUME=neuro-ollama_ollama-data in .env tells the new
stack to mount the existing volume in place — no copy, no downtime.
- .env.example documents the override with the exact host's volume name
as an example.
Both changes are ruff + mypy --strict clean.
|
||
|
|
b2ec32c852 |
deploy: target jwilder-proxy production stack
Production deployment now matches the host setup that already runs neuronetz.ai / neuro-landing: the gateway sits behind the jwilder nginx-proxy + acme-companion already on the host, instead of bundling its own Caddy sidecar. - docker-compose.yml: drop the Caddy service entirely. The gateway joins an external `proxy` Docker network (the same one neuronetz-web / neuronetz-www use) and advertises itself with VIRTUAL_HOST / VIRTUAL_PORT / LETSENCRYPT_HOST / LETSENCRYPT_EMAIL. nginx-proxy routes TLS-terminated traffic to it on the shared network; acme-companion handles Let's Encrypt issuance + renewal for api.neuronetz.ai automatically. NO host ports are published in this compose file anywhere — gateway, postgres, redis, ollama all stay unreachable from the host. Pinned container_names (neuronetz-gateway / -postgres / -redis / -ollama) for stable identification by nginx-proxy and ops scripts. - .env.example: add GATEWAY_VIRTUAL_HOST + LETSENCRYPT_EMAIL; flip the default GATEWAY_TRUSTED_PROXIES to `127.0.0.1,nginx-proxy`. - docs/DEPLOYMENT.md: the canonical path is now jwilder-proxy. Reorganized prerequisites + steps around it; documented adding HSTS and the other security headers via the nginx-proxy custom-config mechanism (/etc/nginx/vhost.d/<host>). The Caddy sidecar lives on as a documented alternative for hosts without jwilder-proxy (ops/caddy/Caddyfile.example is kept). The Ollama-never-exposed non-negotiable is unchanged. |
||
|
|
d79f17b3bb |
scaffold: project skeleton, schema, healthz/readyz, CI
Initial project structure for neuronetz-gateway per scope-docs/SPEC.md: - Python 3.12 / FastAPI / SQLAlchemy 2.0 (async) / Redis / Postgres stack managed by uv. Multi-stage non-root Dockerfile, prod + dev compose files (ollama service is NEVER published in either), Caddyfile + systemd unit, justfile, GitHub Actions CI (ruff, mypy --strict, pytest, bandit, pip-audit). - Pydantic-Settings config covering every env var from SPEC §7, including the MODEL_DISCOVERY_* keys for the dynamic-discovery feature (§4.6). - Alembic 0001_initial creates the full gateway schema (8 tables, 3 enums, notify_key_revoked() trigger), incl. allow_all_models on tenant_limits and key_limits for the per-tenant auto-grant toggle. - Working /healthz, /readyz (fail-closed when deps unreachable), and a Prometheus /metrics stub. Sanitizing error handlers that attach X-Request-ID to every response and never leak upstream internals. - SPEC + AGENT_PROMPT included under scope-docs/ (source of truth). |