neuronetz-gateway/docker-compose.yml
Stephan Berbig 653e03bf29
proxy: multi-backend Ollama aggregation with per-model routing + failover
The gateway can now aggregate models across SEVERAL Ollama backends and
route each request to the correct one. Opt-in via OLLAMA_BACKENDS in .env
— single-backend deployments are unaffected (effective_backends()
synthesizes a single "default" backend from the legacy OLLAMA_BASE_URL /
OLLAMA_AUTH_TOKEN fields when the list is empty).
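
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in config.py: the field names on `BackendSpec` and `Settings` are assumptions, and the real models presumably use the project's settings framework rather than plain dataclasses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BackendSpec:
    """Hypothetical shape of one backend entry (field names are assumed)."""
    name: str
    base_url: str
    auth_token: str = ""


@dataclass
class Settings:
    # Legacy single-backend fields (mirroring OLLAMA_BASE_URL / OLLAMA_AUTH_TOKEN).
    ollama_base_url: str = "http://ollama:11434"
    ollama_auth_token: str = ""
    # Opt-in multi-backend list (mirroring OLLAMA_BACKENDS); empty means "not configured".
    ollama_backends: list[BackendSpec] = field(default_factory=list)

    def effective_backends(self) -> list[BackendSpec]:
        # An explicit list wins and keeps config order, which is also priority order.
        if self.ollama_backends:
            return list(self.ollama_backends)
        # Otherwise synthesize a single "default" backend from the legacy fields.
        return [BackendSpec("default", self.ollama_base_url, self.ollama_auth_token)]
```

With this shape, callers only ever iterate `effective_backends()` and never need to branch on whether the deployment is single- or multi-backend.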

Behavior:
- Discovery polls EVERY configured backend in parallel each tick; the
  cache stores per-backend model lists plus a model → backends priority
  list (config order = priority order).
- /api/tags and /v1/models surface the DEDUPLICATED UNION of all
  backends' models.
- A request's model is looked up in the priority list and proxied to the
  FIRST backend that hosts it. If that backend errors on the request, the
  pipeline transparently fails over to the next backend that hosts the
  same model. The streaming failover path probes the first chunk before
  releasing the response, so we never serve partial bytes from a dead
  backend.
- No existence disclosure: a model not hosted by any backend yields the
  same generic 403 as "model not allowed" (SPEC §13.6 preserved).
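
The discovery and routing behavior listed above can be sketched as follows. The function names, the `allowed` set, and the `ModelNotAllowed` exception are illustrative assumptions, not the names used in proxy/discovery.py or proxy/router.py.

```python
def build_priority_map(per_backend: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each model to the backends hosting it, in config (priority) order."""
    priority: dict[str, list[str]] = {}
    # Dicts preserve insertion order, so iterating follows config order.
    for backend, models in per_backend.items():
        for model in models:
            hosts = priority.setdefault(model, [])
            if backend not in hosts:
                hosts.append(backend)
    return priority


def union_models(per_backend: dict[str, list[str]]) -> list[str]:
    """Deduplicated union of all backends' models, in first-seen order."""
    return list(build_priority_map(per_backend))


class ModelNotAllowed(Exception):
    """Hypothetical error that the HTTP layer would render as a generic 403."""
    status_code = 403


def candidates_for(model: str, priority: dict[str, list[str]], allowed: set[str]) -> list[str]:
    """Return the backends to try, in order, or raise one generic error."""
    hosts = priority.get(model, [])
    # "Not permitted" and "not hosted anywhere" are deliberately indistinguishable.
    if model not in allowed or not hosts:
        raise ModelNotAllowed("model not allowed")
    return hosts
```

For example, with `{"gpu-a": ["llama3:8b", "mistral:7b"], "gpu-b": ["llama3:8b", "qwen2:7b"]}`, a request for `llama3:8b` would try `gpu-a` first and `gpu-b` second, while a request for an unknown model raises the same error as a disallowed one.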

Components:
- config.py: new BackendSpec model + ollama_backends list field + an
  effective_backends() helper.
- proxy/router.py (new): BackendRouter (clients_for_with_failover);
  build_http_clients() builds one httpx client per backend with its own
  auth headers; build_backend_headers() exposes the per-backend header
  composition for the CLI probe.
- proxy/discovery.py: DiscoveryCache.set_per_backend() + backends_for();
  refresh_all_backends() polls all backends in parallel;
  discovery_loop_multi() replaces the single-backend loop in production.
  The legacy single-backend functions are kept for the
  dependency-override tests.
- proxy/pipeline.py: Pipeline accepts an optional router; the four proxy
  methods now retry against each candidate backend in priority order on
  transport error.
- lifespan.py: constructs the per-backend client dict, stores the router
  on app.state, launches discovery_loop_multi.
- deps.py: get_backend_router provider + BackendRouterDep type alias;
  get_pipeline passes the router into Pipeline.
- cli/manage.py: probe-ollama iterates every backend and reports per-
  backend status; list-models groups its output by backend and prints
  the union count + Redis cache size for sanity.
- .env.example + docker-compose.yml: document and pass through
  OLLAMA_BACKENDS with a real example.
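
The retry-in-priority-order logic, combined with the first-chunk probe mentioned under Behavior, might look roughly like this. It is a sketch under stated assumptions: plain async generators stand in for the per-backend httpx streams, and `BackendError`, `open_with_failover`, and the fake backends are hypothetical names, not the ones in proxy/pipeline.py.

```python
import asyncio
from collections.abc import AsyncIterator, Callable


class BackendError(Exception):
    """Stand-in for a transport-level failure (hypothetical name)."""


async def open_with_failover(
    openers: list[Callable[[], AsyncIterator[bytes]]],
) -> AsyncIterator[bytes]:
    """Try each candidate in priority order; commit only after a first chunk."""
    last_error = None
    for open_stream in openers:
        stream = open_stream()
        try:
            # Probe: nothing is released to the client until this succeeds.
            first = await stream.__anext__()
        except StopAsyncIteration:
            first = None  # the backend answered with an empty body
        except BackendError as exc:
            last_error = exc
            continue  # fail over to the next backend hosting the model

        async def replay(first=first, stream=stream) -> AsyncIterator[bytes]:
            # Re-emit the probed chunk, then pass the rest through unchanged.
            if first is not None:
                yield first
            async for chunk in stream:
                yield chunk

        return replay()
    raise last_error or BackendError("no candidate backends")


async def dead_backend() -> AsyncIterator[bytes]:
    raise BackendError("connection refused")
    yield b""  # unreachable; only makes this function an async generator


async def healthy_backend() -> AsyncIterator[bytes]:
    yield b'{"response":"hel'
    yield b'lo"}'


async def collect(openers) -> list[bytes]:
    stream = await open_with_failover(openers)
    return [chunk async for chunk in stream]


if __name__ == "__main__":
    print(asyncio.run(collect([dead_backend, healthy_backend])))
```

One consequence of this design is that failover is only possible before the first chunk is released; in this sketch, an error that occurs mid-stream propagates to the caller instead of being retried.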

Verified: ruff check (clean), mypy --strict src/ + tests/ (clean,
66 source files), pytest (60 passed + 39 skipped — same baseline as
before this change; integration tests are Docker-socket-gated).
2026-05-27 22:30:26 +02:00

services:
  gateway:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: neuronetz-gateway
    restart: unless-stopped
    # NOTE: deliberately NO `ports:` — the gateway is reached only via the
    # jwilder nginx-proxy on the shared external `proxy` network.
    expose:
      - "8080"
    environment:
      # jwilder/nginx-proxy + acme-companion routing (matches neuro-landing).
      VIRTUAL_HOST: ${GATEWAY_VIRTUAL_HOST:-api.neuronetz.ai}
      VIRTUAL_PORT: "8080"
      LETSENCRYPT_HOST: ${GATEWAY_VIRTUAL_HOST:-api.neuronetz.ai}
      LETSENCRYPT_EMAIL: ${LETSENCRYPT_EMAIL:-admin@neuronetz.ai}
      GATEWAY_BIND_HOST: 0.0.0.0
      GATEWAY_BIND_PORT: "8080"
      GATEWAY_LOG_LEVEL: ${GATEWAY_LOG_LEVEL:-INFO}
      GATEWAY_LOG_FORMAT: ${GATEWAY_LOG_FORMAT:-json}
      GATEWAY_REQUEST_ID_HEADER: ${GATEWAY_REQUEST_ID_HEADER:-X-Request-ID}
      # nginx-proxy forwards from the `proxy` network — trust its IP space.
      GATEWAY_TRUSTED_PROXIES: ${GATEWAY_TRUSTED_PROXIES:-127.0.0.1,nginx-proxy}
      DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-gateway}:${POSTGRES_PASSWORD:-changeme}@postgres:5432/${POSTGRES_DB:-neuronetz}
      DATABASE_POOL_SIZE: ${DATABASE_POOL_SIZE:-10}
      DATABASE_POOL_OVERFLOW: ${DATABASE_POOL_OVERFLOW:-20}
      REDIS_URL: redis://redis:6379/0
      REDIS_KEY_CACHE_TTL_S: ${REDIS_KEY_CACHE_TTL_S:-60}
      OLLAMA_BASE_URL: ${OLLAMA_BASE_URL:-http://ollama:11434}
      OLLAMA_CONNECT_TIMEOUT_S: ${OLLAMA_CONNECT_TIMEOUT_S:-5}
      OLLAMA_READ_TIMEOUT_S: ${OLLAMA_READ_TIMEOUT_S:-600}
      OLLAMA_MAX_CONNECTIONS: ${OLLAMA_MAX_CONNECTIONS:-64}
      # Optional Bearer token for an externally-fronted Ollama (default empty:
      # the in-stack ollama service needs no auth on the private network).
      OLLAMA_AUTH_TOKEN: ${OLLAMA_AUTH_TOKEN:-}
      OLLAMA_AUTH_HEADER: ${OLLAMA_AUTH_HEADER:-Authorization}
      OLLAMA_AUTH_SCHEME: ${OLLAMA_AUTH_SCHEME:-Bearer}
      # Multi-backend (opt-in JSON list). See .env.example for the schema.
      OLLAMA_BACKENDS: ${OLLAMA_BACKENDS:-}
      MODEL_DISCOVERY_REFRESH_S: ${MODEL_DISCOVERY_REFRESH_S:-60}
      MODEL_DISCOVERY_CACHE_TTL_S: ${MODEL_DISCOVERY_CACHE_TTL_S:-120}
      DEFAULT_RPM: ${DEFAULT_RPM:-60}
      DEFAULT_TPM: ${DEFAULT_TPM:-100000}
      DEFAULT_CONCURRENT: ${DEFAULT_CONCURRENT:-8}
      MAX_REQUEST_BODY_BYTES: ${MAX_REQUEST_BODY_BYTES:-262144}
      MAX_NUM_PREDICT: ${MAX_NUM_PREDICT:-4096}
      ARGON2_TIME_COST: ${ARGON2_TIME_COST:-3}
      ARGON2_MEMORY_COST_KIB: ${ARGON2_MEMORY_COST_KIB:-65536}
      ARGON2_PARALLELISM: ${ARGON2_PARALLELISM:-4}
      AUTH_FAILURE_RATE_LIMIT_PER_IP_PER_MIN: ${AUTH_FAILURE_RATE_LIMIT_PER_IP_PER_MIN:-20}
      AUDIT_BUFFER_SIZE: ${AUDIT_BUFFER_SIZE:-1000}
      PROMPT_LOG_DEFAULT_RETENTION_DAYS: ${PROMPT_LOG_DEFAULT_RETENTION_DAYS:-30}
      AUDIT_LOG_DEFAULT_RETENTION_DAYS: ${AUDIT_LOG_DEFAULT_RETENTION_DAYS:-365}
      # Playground + auto-docs OFF by default in prod.
      PLAYGROUND_ENABLED: ${PLAYGROUND_ENABLED:-false}
      DOCS_ENABLED: ${DOCS_ENABLED:-false}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      ollama:
        condition: service_started
    # Apply migrations, then start the server.
    command: ["sh", "-c", "alembic upgrade head && exec python -m neuronetz_gateway"]
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://127.0.0.1:8080/healthz"]
      interval: 15s
      timeout: 3s
      retries: 5
      start_period: 30s
    networks:
      - proxy     # for nginx-proxy / acme-companion (TLS-fronted public traffic)
      - internal  # for postgres / redis / ollama (private)

  postgres:
    image: postgres:16-alpine
    container_name: neuronetz-postgres
    restart: unless-stopped
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-gateway}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
      POSTGRES_DB: ${POSTGRES_DB:-neuronetz}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-gateway} -d ${POSTGRES_DB:-neuronetz}"]
      interval: 5s
      timeout: 3s
      retries: 10
    networks:
      - internal

  redis:
    image: redis:7-alpine
    container_name: neuronetz-redis
    restart: unless-stopped
    command: ["redis-server", "--save", "", "--appendonly", "no"]
    # No `ports:` — Redis is internal-only.
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 10
    networks:
      - internal

  ollama:
    image: ollama/ollama:latest
    container_name: neuronetz-ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    networks:
      - internal

networks:
  # External network managed by the host's jwilder-proxy stack
  # (the same network neuronetz-web / neuronetz-www are attached to).
  proxy:
    external: true
  # Private network for inter-service traffic; not reachable from the host.
  internal:
    driver: bridge

volumes:
  # Pin absolute volume NAMES so the stack can ADOPT an existing volume that was
  # created by a previous deployment under a different compose project. Without
  # an explicit `name:`, compose namespaces volumes by project (directory) name,
  # so a rename or re-clone silently creates fresh, empty volumes alongside the
  # old data. We hit that the first time this stack was deployed (the original
  # models lived in `neuro-ollama_ollama-data` and a fresh `neuro-gateway_
  # ollama_data` was created next to them, leaving the models orphaned).
  #
  # Override via .env if your existing volumes are named differently:
  #   POSTGRES_DATA_VOLUME=neuro-api_postgres-data
  #   OLLAMA_DATA_VOLUME=neuro-ollama_ollama-data
  postgres_data:
    name: ${POSTGRES_DATA_VOLUME:-neuro-gateway_postgres_data}
  ollama_data:
    name: ${OLLAMA_DATA_VOLUME:-neuro-gateway_ollama_data}
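
The compose file passes OLLAMA_BACKENDS through as an opt-in JSON list whose schema lives in .env.example, which is not shown here. As a hedged illustration of how such a value could be parsed on the gateway side, the sketch below assumes each entry carries `name`, `base_url`, and an optional `auth_token`; these keys and the `parse_backends` helper are hypothetical and may not match the real schema.

```python
import json


def parse_backends(raw: str) -> list[dict[str, str]]:
    """Parse an OLLAMA_BACKENDS-style value; empty means single-backend mode."""
    raw = raw.strip()
    if not raw:
        # The compose default `${OLLAMA_BACKENDS:-}` yields an empty string.
        return []
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("OLLAMA_BACKENDS must be a JSON list")
    backends: list[dict[str, str]] = []
    for index, item in enumerate(data):
        # Assumed required keys; the real schema is defined in .env.example.
        if not isinstance(item, dict) or "name" not in item or "base_url" not in item:
            raise ValueError(f"backend #{index} needs 'name' and 'base_url'")
        backends.append(
            {
                "name": str(item["name"]),
                "base_url": str(item["base_url"]),
                "auth_token": str(item.get("auth_token", "")),
            }
        )
    return backends  # list order is preserved, so it can double as priority order
```

Rejecting a malformed value at startup, rather than silently falling back to the single default backend, keeps a typo in .env from quietly routing all traffic to one host.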