Commit Graph

5 Commits

Author SHA1 Message Date
Stephan Berbig
64f1ebc484 cli: graceful fallback when --write-env can't write the host-mounted .env
Some checks failed
CI / ruff (push) Has been cancelled
CI / mypy --strict (push) Has been cancelled
CI / pytest (push) Has been cancelled
CI / bandit (push) Has been cancelled
CI / pip-audit (push) Has been cancelled
The CLI runs inside the gateway container as the non-root `gateway` user
(uid 10001). The .env file is typically host-mounted at /app/.env, owned
by the host user — so the container process can't write the
.env.tmp file `update_env_file()` creates for the atomic rename. That
surfaced as a raw PermissionError traceback from
`docker exec neuronetz-gateway neuronetz-gateway remove-backend …`.

Now both add-backend and remove-backend catch PermissionError / OSError
on the write, print a tidy red message, and tell the user the two ways
forward: re-run with `docker exec -u root …` (in-container root keeps
the same FS, just bypasses ownership), or paste the printed
OLLAMA_BACKENDS line manually. Exit 1 either way so scripts notice.
2026-05-27 23:07:48 +02:00
Stephan Berbig
c9e11c3486 cli: add add-backend / remove-backend / list-backends commands
Some checks failed
CI / ruff (push) Has been cancelled
CI / mypy --strict (push) Has been cancelled
CI / pytest (push) Has been cancelled
CI / bandit (push) Has been cancelled
CI / pip-audit (push) Has been cancelled
So nobody ever has to hand-write the OLLAMA_BACKENDS JSON again.

  # add a backend, probe it, print the resulting .env line:
  neuronetz-gateway add-backend embedded     http://ollama:11434
  neuronetz-gateway add-backend neuro-ollama http://neuro-ollama:11434 --token ABC

  # update one (e.g. rotate token):
  neuronetz-gateway add-backend neuro-ollama http://neuro-ollama:11434 --token XYZ --replace

  # remove:
  neuronetz-gateway remove-backend neuro-ollama

  # peek (tokens redacted):
  neuronetz-gateway list-backends

  # write directly to a .env file (atomic temp-file + rename):
  neuronetz-gateway add-backend foo http://foo:11434 --token T --write-env /app/.env

  # show what would change without doing it:
  neuronetz-gateway add-backend foo http://foo:11434 --token T --dry-run

What each command does:

- `add-backend NAME URL` (+ optional --token / --header / --scheme / --replace
  / --no-validate / --write-env / --dry-run): builds a new backend list (current
  list parsed from OLLAMA_BACKENDS env, or synthesized from the single-backend
  fallback if unset), validates the new backend by probing /api/tags with the
  same headers the gateway will use at runtime (`build_backend_headers`), then
  prints the resulting OLLAMA_BACKENDS=... line ready to paste — or writes it
  in place if --write-env is given. Refuses to overwrite an existing name
  unless --replace is passed.
- `remove-backend NAME` (+ --write-env / --dry-run): mirror of add-backend for
  removal.
- `list-backends`: shows the configured backends with tokens redacted to
  "***" via `redacted_dump`. Useful sanity check after editing .env.

All the JSON manipulation is in a new pure-helpers module
`cli/backends.py` (parse / serialize / add_or_replace / remove /
update_env_file). The Typer commands in `cli/manage.py` are thin shells
on top — the logic is unit-tested directly without spinning up Typer or
the network. The token is unwrapped from SecretStr exactly once at the
serialization boundary (`to_dict`) and never logged.

New tests (16): full coverage of the helpers — round-trip serialize/parse,
duplicate-name rejection, replace-in-place order preservation, remove
on unknown name, redaction, atomic env-file rewrite (insert / replace /
idempotent re-apply / create-when-missing).

ruff (incl. the per-file ignore add for tests' S105/S106 — placeholder
"tok123"-style strings are inputs, not credentials) + mypy --strict (68
source files) clean. pytest: 76 passed + 39 skipped (the 16 new tests +
no regressions on the existing 60).
2026-05-27 22:59:53 +02:00
Stephan Berbig
653e03bf29 proxy: multi-backend Ollama aggregation with per-model routing + failover
Some checks failed
CI / ruff (push) Has been cancelled
CI / mypy --strict (push) Has been cancelled
CI / pytest (push) Has been cancelled
CI / bandit (push) Has been cancelled
CI / pip-audit (push) Has been cancelled
The gateway can now aggregate models across SEVERAL Ollama backends and
route each request to the correct one. Opt-in via OLLAMA_BACKENDS in .env
— single-backend deployments are unaffected (effective_backends()
synthesizes a single "default" backend from the legacy OLLAMA_BASE_URL /
OLLAMA_AUTH_TOKEN fields when the list is empty).

Behavior:
- Discovery polls EVERY configured backend in parallel each tick; the
  cache stores per-backend model lists plus a model → backends priority
  list (config order = priority order).
- /api/tags and /v1/models surface the DEDUPLICATED UNION of all
  backends' models.
- A request's model is looked up in the priority list and proxied to the
  FIRST backend that hosts it. If that backend errors on the request, the
  pipeline transparently fails over to the next backend that has the
  same model (the streaming-failover probes the first chunk before
  releasing the response, so we never serve partial bytes from a dead
  backend).
- No existence disclosure: a model not hosted by any backend yields the
  same generic 403 as "model not allowed" (SPEC §13.6 preserved).

Components:
- config.py: new BackendSpec model + ollama_backends list field + an
  effective_backends() helper.
- proxy/router.py (new): BackendRouter (clients_for_with_failover),
  build_http_clients() builds one httpx client per backend with its own
  auth headers, build_backend_headers() exposes the per-backend header
  composition for the CLI probe.
- proxy/discovery.py: DiscoveryCache.set_per_backend() + backends_for(),
  refresh_all_backends() polls all in parallel, discovery_loop_multi()
  replaces the single-backend loop in production; the legacy single-
  backend functions are kept for the dependency-override tests.
- proxy/pipeline.py: Pipeline accepts an optional router; the four proxy
  methods now retry against each candidate backend in priority order on
  transport error.
- lifespan.py: constructs the per-backend client dict, stores the router
  on app.state, launches discovery_loop_multi.
- deps.py: get_backend_router provider + BackendRouterDep type alias;
  get_pipeline passes the router into Pipeline.
- cli/manage.py: probe-ollama iterates every backend and reports per-
  backend status; list-models groups its output by backend and prints
  the union count + Redis cache size for sanity.
- .env.example + docker-compose.yml: document and pass through
  OLLAMA_BACKENDS with a real example.

Verified: ruff check (clean), mypy --strict src/ + tests/ (clean,
66 source files), pytest (60 passed + 39 skipped — same baseline as
before this change; integration tests are Docker-socket-gated).
2026-05-27 22:30:26 +02:00
Stephan Berbig
662fbfb442 deploy: upstream Ollama auth token + adoptable data volumes
Some checks failed
CI / ruff (push) Has been cancelled
CI / mypy --strict (push) Has been cancelled
CI / pytest (push) Has been cancelled
CI / bandit (push) Has been cancelled
CI / pip-audit (push) Has been cancelled
Two production-hardening changes triggered by real issues found on the
first prod attempt against neuronetz-ai-01.

1. Upstream auth (the production Ollama is fronted by an auth proxy):

   - New config: OLLAMA_AUTH_TOKEN (pydantic SecretStr — never appears in
     repr/logs/errors), plus OLLAMA_AUTH_HEADER (default "Authorization")
     and OLLAMA_AUTH_SCHEME (default "Bearer") for stacks that expect a
     non-standard header like X-API-Key.
   - lifespan._build_upstream_headers() injects the configured header into
     the single shared httpx client used by both the proxy hot path AND
     the discovery poller, so /api/tags + /api/chat both authenticate
     against the upstream automatically.
   - New CLI: `neuronetz-gateway probe-ollama` — uses the same client
     config to GET /api/version and /api/tags, reports success/transport-
     error/HTTP-status, lists the first few discovered models, exits 1 on
     any failure. The token itself is never printed (only whether one
     was attached). Lets ops verify upstream reachability before letting
     real traffic through.
   - docker-compose.yml passes OLLAMA_AUTH_TOKEN/HEADER/SCHEME through;
     .env.example documents them with a leave-blank-for-internal-Ollama
     default.

2. Volume adoption (don't lose existing model data on re-deploy):

   - docker-compose.yml now pins absolute Docker volume NAMES for both
     postgres_data and ollama_data, configurable via POSTGRES_DATA_VOLUME
     and OLLAMA_DATA_VOLUME. Defaults preserve the previous per-project
     names so existing deployments aren't disturbed.
   - This addresses the scenario where deploying this compose under a new
     project directory created fresh, empty volumes alongside an existing
     `neuro-ollama_ollama-data` volume containing pre-pulled models (incl.
     deepseek-r1:14b, qwen2.5:14b, gemma3:12b, ...). Setting
     OLLAMA_DATA_VOLUME=neuro-ollama_ollama-data in .env tells the new
     stack to mount the existing volume in place — no copy, no downtime.
   - .env.example documents the override with the exact host's volume name
     as an example.

Both changes are ruff + mypy --strict clean.
2026-05-27 18:59:09 +02:00
Stephan Berbig
6431b2f72c auth + cli: argon2id keys, bearer middleware, bootstrap commands
- argon2id hash/verify/needs_rehash; constant-time path; parameters from config.
- Key format nz_<prefix><secret> (12-char stored prefix incl. nz_, 32-char
  random secret); the full key is generated with secrets, hashed argon2id, and
  printed exactly once at creation — never persisted, never logged.
- Bearer auth middleware: extract → resolve prefix → Redis cache (TTL from
  REDIS_KEY_CACHE_TTL_S) → DB → argon2 verify → cache the resolved Principal.
  Fail-closed; uniform sanitized 401 with X-Request-ID; per-IP auth-failure
  counter to slow brute force. Exempt paths: /healthz /readyz /metrics /, and
  /playground when enabled.
- Bootstrap CLI (Typer) per SPEC §11: create-tenant (with --allow-all-models),
  create-key, list-keys, revoke-key, set-budget, set-models (--models or
  --allow-all / --no-allow-all), show-usage, list-models.
- Async repositories for tenants, api_keys, key_limits, budget_usage,
  revocations, audit_log — including the join+inheritance flatten that
  produces a Principal with effective rpm/tpm/concurrent/allowed_models/
  allow_all_models for the auth cache.
2026-05-26 20:52:33 +02:00