neuronetz-gateway

Author	SHA1	Message	Date
Stephan Berbig	27b7012ec9	scripts: wire-multi-backend.sh - one-shot multi-backend bootstrap Drops the multi-step paste-by-paste setup down to one command. Detects every running ollama/ollama container on the host, classifies them (embedded vs extras based on which Docker network they share with the gateway), checks each one's auth config, rewrites OLLAMA_BACKENDS in .env atomically, recreates the gateway, and probes everything. What it does, in order: 1. preflight: docker on PATH, .env present, gateway container running 2. `docker ps --filter ancestor=ollama/ollama` → every running Ollama 3. for each one: - find the network it shares with the gateway; attach to `proxy` if it's on no shared network - read OLLAMA_AUTH and OLLAMA_AUTH_TOKEN from `docker inspect` - FAIL LOUD if auth=true with empty token (with the exact fix) - classify: on an `internal` net = the gateway's embedded backend; otherwise an extra 4. build the OLLAMA_BACKENDS JSON: embedded first (so it gets routing priority), then extras in discovery order, with per-backend tokens where present 5. write .env atomically (host-side temp + rename, so no in-container perm issues to worry about) 6. `docker compose up -d gateway` to pick up the new env 7. wait for /healthz, then run probe-ollama and list-backends so the operator sees the end state immediately Token is redacted before echoing the resulting OLLAMA_BACKENDS line, so the script can run safely with terminal logging on. Idempotent: re-running produces the same OLLAMA_BACKENDS line. No half-states possible: every error path exits before .env is touched.	2026-05-27 23:15:59 +02:00
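Step 4 and the redaction note in the commit above can be sketched in a few lines of Python. This is only an illustration: the script itself is a shell script, and the dict keys (`name`, `url`, `token`, `network`) and both function names are assumptions made for the example, not the project's schema.

```python
import json


def order_backends(discovered):
    """Embedded backends first, then extras, each group in discovery order."""
    # Assumption: a backend counts as embedded when its shared network is "internal".
    embedded = [b for b in discovered if b["network"] == "internal"]
    extras = [b for b in discovered if b["network"] != "internal"]
    ordered = []
    for b in embedded + extras:
        entry = {"name": b["name"], "url": b["url"]}
        if b.get("token"):  # attach a token only where one is configured
            entry["token"] = b["token"]
        ordered.append(entry)
    return ordered


def redacted_line(backends):
    """Render the .env line with every token replaced by '***'."""
    safe = [{**b, "token": "***"} if "token" in b else dict(b) for b in backends]
    return "OLLAMA_BACKENDS=" + json.dumps(safe, separators=(",", ":"))


found = [
    {"name": "neuro-ollama", "url": "http://neuro-ollama:11434",
     "network": "proxy", "token": "s3cret"},
    {"name": "embedded", "url": "http://ollama:11434", "network": "internal"},
]
backends = order_backends(found)
print(redacted_line(backends))
```

Because the ordering depends only on the discovered containers, running it twice on the same input yields the same line, which is the idempotence property the commit describes.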
Stephan Berbig	64f1ebc484	cli: graceful fallback when --write-env can't write the host-mounted .env The CLI runs inside the gateway container as the non-root `gateway` user (uid 10001). The .env file is typically host-mounted at /app/.env, owned by the host user, so the container process can't write the .env.tmp file `update_env_file()` creates for the atomic rename. That surfaced as a raw PermissionError traceback from `docker exec neuronetz-gateway neuronetz-gateway remove-backend …`. Now both add-backend and remove-backend catch PermissionError / OSError on the write, print a tidy red message, and tell the user the two ways forward: re-run with `docker exec -u root …` (in-container root keeps the same FS, just bypasses ownership), or paste the printed OLLAMA_BACKENDS line manually. Exit 1 either way so scripts notice.	2026-05-27 23:07:48 +02:00
Stephan Berbig	c9e11c3486	cli: add `add-backend` / `remove-backend` / `list-backends` commands So nobody ever has to hand-write the OLLAMA_BACKENDS JSON again. # add a backend, probe it, print the resulting .env line: neuronetz-gateway add-backend embedded http://ollama:11434 neuronetz-gateway add-backend neuro-ollama http://neuro-ollama:11434 --token ABC # update one (e.g. rotate token): neuronetz-gateway add-backend neuro-ollama http://neuro-ollama:11434 --token XYZ --replace # remove: neuronetz-gateway remove-backend neuro-ollama # peek (tokens redacted): neuronetz-gateway list-backends # write directly to a .env file (atomic temp-file + rename): neuronetz-gateway add-backend foo http://foo:11434 --token T --write-env /app/.env # show what would change without doing it: neuronetz-gateway add-backend foo http://foo:11434 --token T --dry-run What each command does: - `add-backend NAME URL` (+ optional --token / --header / --scheme / --replace / --no-validate / --write-env / --dry-run): builds a new backend list (current list parsed from OLLAMA_BACKENDS env, or synthesized from the single-backend fallback if unset), validates the new backend by probing /api/tags with the same headers the gateway will use at runtime (`build_backend_headers`), then prints the resulting OLLAMA_BACKENDS=... line ready to paste, or writes it in place if --write-env is given. Refuses to overwrite an existing name unless --replace is passed. - `remove-backend NAME` (+ --write-env / --dry-run): mirror of add-backend for removal. - `list-backends`: shows the configured backends with tokens redacted to "***" via `redacted_dump`. Useful sanity check after editing .env.
All the JSON manipulation is in a new pure-helpers module `cli/backends.py` (parse / serialize / add_or_replace / remove / update_env_file). The Typer commands in `cli/manage.py` are thin shells on top — the logic is unit-tested directly without spinning up Typer or the network. The token is unwrapped from SecretStr exactly once at the serialization boundary (`to_dict`) and never logged. New tests (16): full coverage of the helpers — round-trip serialize/parse, duplicate-name rejection, replace-in-place order preservation, remove on unknown name, redaction, atomic env-file rewrite (insert / replace / idempotent re-apply / create-when-missing). ruff (incl. the per-file ignore add for tests' S105/S106 — placeholder "tok123"-style strings are inputs, not credentials) + mypy --strict (68 source files) clean. pytest: 76 passed + 39 skipped (the 16 new tests + no regressions on the existing 60).	2026-05-27 22:59:53 +02:00
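The atomic env-file rewrite that the tests above cover (insert, replace, idempotent re-apply, create-when-missing) might look roughly like this. It is a sketch under assumptions: `update_env_line` is a hypothetical name and signature, not the project's `update_env_file()`, and the `.tmp` suffix only mirrors the `.env.tmp` file mentioned elsewhere in this log.

```python
import os
from pathlib import Path


def update_env_line(path: Path, key: str, value: str) -> None:
    """Set KEY=value in an env file via a temp file plus an atomic rename."""
    new_line = f"{key}={value}"
    lines = path.read_text().splitlines() if path.exists() else []
    out = []
    replaced = False
    for line in lines:
        if line.startswith(f"{key}="):
            if not replaced:  # replace the first match in place, drop duplicates
                out.append(new_line)
                replaced = True
        else:
            out.append(line)
    if not replaced:  # key absent (or file missing): append it
        out.append(new_line)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("\n".join(out) + "\n")
    # os.replace swaps the file in one step on the same filesystem,
    # so a reader never sees a half-written .env.
    os.replace(tmp, path)
```

Writing the temp file next to the target keeps both on one filesystem, which is what makes the rename atomic. It is also why the write needs permission on the directory, not just on the file.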
Stephan Berbig	b8a0692aa1	compose: declare ollama_data as external to silence adoption warning Every `docker compose up` was printing: WARN volume "neuro-api_neuro-ollama-data" already exists but was created for project "neuro-api" (expected "neuro-gateway"). Use `external: true` to use an existing volume. That's exactly the situation we're in by design: the ollama volume is owned by a NEIGHBORING compose stack (neuro-api or neuro-ollama, depending on the host) and our gateway intentionally adopts it. The warning fires because compose was managing the volume under our project namespace even though the on-disk volume belongs to a different one. Declaring `external: true` on `ollama_data` (and only that volume; `postgres_data` stays compose-managed, since it belongs to this stack) tells compose: "this volume is foreign, just attach it as-is, don't namespace-check it." Warning gone, behavior identical. Trade-off documented in the comment: `external: true` requires the volume to exist before `up`. For fresh deployments where no foreign Ollama volume exists, run `docker volume create <name>` first (or set OLLAMA_DATA_VOLUME to a name you've already created).	2026-05-27 22:34:45 +02:00
Stephan Berbig	653e03bf29	proxy: multi-backend Ollama aggregation with per-model routing + failover The gateway can now aggregate models across SEVERAL Ollama backends and route each request to the correct one. Opt-in via OLLAMA_BACKENDS in .env; single-backend deployments are unaffected (effective_backends() synthesizes a single "default" backend from the legacy OLLAMA_BASE_URL / OLLAMA_AUTH_TOKEN fields when the list is empty). Behavior: - Discovery polls EVERY configured backend in parallel each tick; the cache stores per-backend model lists plus a model → backends priority list (config order = priority order). - /api/tags and /v1/models surface the DEDUPLICATED UNION of all backends' models. - A request's model is looked up in the priority list and proxied to the FIRST backend that hosts it. If that backend errors on the request, the pipeline transparently fails over to the next backend that has the same model (the streaming-failover probes the first chunk before releasing the response, so we never serve partial bytes from a dead backend). - No existence disclosure: a model not hosted by any backend yields the same generic 403 as "model not allowed" (SPEC §13.6 preserved). Components: - config.py: new BackendSpec model + ollama_backends list field + an effective_backends() helper. - proxy/router.py (new): BackendRouter (clients_for_with_failover), build_http_clients() builds one httpx client per backend with its own auth headers, build_backend_headers() exposes the per-backend header composition for the CLI probe.
- proxy/discovery.py: DiscoveryCache.set_per_backend() + backends_for(), refresh_all_backends() polls all in parallel, discovery_loop_multi() replaces the single-backend loop in production; the legacy single-backend functions are kept for the dependency-override tests. - proxy/pipeline.py: Pipeline accepts an optional router; the four proxy methods now retry against each candidate backend in priority order on transport error. - lifespan.py: constructs the per-backend client dict, stores the router on app.state, launches discovery_loop_multi. - deps.py: get_backend_router provider + BackendRouterDep type alias; get_pipeline passes the router into Pipeline. - cli/manage.py: probe-ollama iterates every backend and reports per-backend status; list-models groups its output by backend and prints the union count + Redis cache size for sanity. - .env.example + docker-compose.yml: document and pass through OLLAMA_BACKENDS with a real example. Verified: ruff check (clean), mypy --strict src/ + tests/ (clean, 66 source files), pytest (60 passed + 39 skipped, the same baseline as before this change; integration tests are Docker-socket-gated).	2026-05-27 22:30:26 +02:00
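The routing behavior in the commit above (a model → backends priority list in config order, a deduplicated union, failover on transport error) can be illustrated with a small, synchronous sketch. The function names are hypothetical, `ConnectionError` stands in for an httpx transport error, and `LookupError` stands in for the generic 403. The real pipeline is async and also probes the first streamed chunk, which this sketch leaves out.

```python
from typing import Callable


def build_priority_map(per_backend: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each model to the backends hosting it, in config (dict) order."""
    priority: dict[str, list[str]] = {}
    for backend, models in per_backend.items():
        for model in models:
            hosts = priority.setdefault(model, [])
            if backend not in hosts:
                hosts.append(backend)
    return priority


def call_with_failover(model: str, priority: dict[str, list[str]],
                       send: Callable[[str], str]) -> str:
    """Try each candidate backend in priority order; move on after a transport error."""
    candidates = priority.get(model, [])
    if not candidates:
        # Stand-in for the generic 403: same outcome as "model not allowed".
        raise LookupError("forbidden")
    last_error: Exception = ConnectionError("no backend tried")
    for backend in candidates:
        try:
            return send(backend)
        except ConnectionError as exc:
            last_error = exc
    raise last_error


per_backend = {"embedded": ["llama3", "qwen"], "extra": ["qwen", "gemma"]}
priority = build_priority_map(per_backend)
union = list(priority)  # deduplicated union, first-seen order
```

Keeping the priority list keyed by model means the failover candidates are already filtered to backends that host the requested model, so no backend is tried for a model it does not have.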
Stephan Kasdorf	5044a44a17	cleanup, and important settings from the system admin, HELLO	2026-05-27 20:14:09 +02:00
Stephan Berbig	662fbfb442	deploy: upstream Ollama auth token + adoptable data volumes Two production-hardening changes triggered by real issues found on the first prod attempt against neuronetz-ai-01. 1. Upstream auth (the production Ollama is fronted by an auth proxy): - New config: OLLAMA_AUTH_TOKEN (pydantic SecretStr, which never appears in repr/logs/errors), plus OLLAMA_AUTH_HEADER (default "Authorization") and OLLAMA_AUTH_SCHEME (default "Bearer") for stacks that expect a non-standard header like X-API-Key. - lifespan._build_upstream_headers() injects the configured header into the single shared httpx client used by both the proxy hot path AND the discovery poller, so /api/tags + /api/chat both authenticate against the upstream automatically. - New CLI: `neuronetz-gateway probe-ollama`, which uses the same client config to GET /api/version and /api/tags, reports success/transport-error/HTTP-status, lists the first few discovered models, exits 1 on any failure. The token itself is never printed (only whether one was attached). Lets ops verify upstream reachability before letting real traffic through. - docker-compose.yml passes OLLAMA_AUTH_TOKEN/HEADER/SCHEME through; .env.example documents them with a leave-blank-for-internal-Ollama default. 2. Volume adoption (don't lose existing model data on re-deploy): - docker-compose.yml now pins absolute Docker volume NAMES for both postgres_data and ollama_data, configurable via POSTGRES_DATA_VOLUME and OLLAMA_DATA_VOLUME. Defaults preserve the previous per-project names so existing deployments aren't disturbed.
- This addresses the scenario where deploying this compose under a new project directory created fresh, empty volumes alongside an existing `neuro-ollama_ollama-data` volume containing pre-pulled models (incl. deepseek-r1:14b, qwen2.5:14b, gemma3:12b, ...). Setting OLLAMA_DATA_VOLUME=neuro-ollama_ollama-data in .env tells the new stack to mount the existing volume in place — no copy, no downtime. - .env.example documents the override with the exact host's volume name as an example. Both changes are ruff + mypy --strict clean.	2026-05-27 18:59:09 +02:00
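For the upstream-auth half of the commit above, the header composition could be sketched as follows. The function name is hypothetical (it is not the project's `lifespan._build_upstream_headers()`), and sending the bare token when the scheme is empty is an assumption about how a header such as X-API-Key would be configured.

```python
from typing import Optional


def sketch_upstream_headers(token: Optional[str],
                            header: str = "Authorization",
                            scheme: str = "Bearer") -> dict[str, str]:
    """Compose the auth header for upstream Ollama calls; no token means no header."""
    if not token:
        return {}  # internal Ollama without an auth proxy: send nothing
    # Assumption: an empty scheme means "send the raw token" (e.g. for X-API-Key).
    value = f"{scheme} {token}" if scheme else token
    return {header: value}
```

Building the dict once and passing it as default headers to the shared HTTP client is what lets the proxy path and the discovery poller authenticate the same way.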
Stephan Berbig	b2ec32c852	deploy: target jwilder-proxy production stack Production deployment now matches the host setup that already runs neuronetz.ai / neuro-landing: the gateway sits behind the jwilder nginx-proxy + acme-companion already on the host, instead of bundling its own Caddy sidecar. - docker-compose.yml: drop the Caddy service entirely. The gateway joins an external `proxy` Docker network (the same one neuronetz-web / neuronetz-www use) and advertises itself with VIRTUAL_HOST / VIRTUAL_PORT / LETSENCRYPT_HOST / LETSENCRYPT_EMAIL. nginx-proxy routes TLS-terminated traffic to it on the shared network; acme-companion handles Let's Encrypt issuance + renewal for api.neuronetz.ai automatically. NO host ports are published in this compose file anywhere: gateway, postgres, redis, ollama all stay unreachable from the host. Pinned container_names (neuronetz-gateway / -postgres / -redis / -ollama) for stable identification by nginx-proxy and ops scripts. - .env.example: add GATEWAY_VIRTUAL_HOST + LETSENCRYPT_EMAIL; flip the default GATEWAY_TRUSTED_PROXIES to `127.0.0.1,nginx-proxy`. - docs/DEPLOYMENT.md: the canonical path is now jwilder-proxy. Reorganized prerequisites + steps around it; documented adding HSTS and the other security headers via the nginx-proxy custom-config mechanism (/etc/nginx/vhost.d/<host>). The Caddy sidecar lives on as a documented alternative for hosts without jwilder-proxy (ops/caddy/Caddyfile.example is kept). The Ollama-never-exposed non-negotiable is unchanged.	2026-05-26 20:55:20 +02:00
Stephan Berbig	b47a09db91	demo + playground + docs One-command demo so the gateway can be exercised end-to-end without a GPU or a real model download: - demo/mock-ollama/ — tiny FastAPI service emulating Ollama (/api/tags, /api/chat + /api/generate NDJSON streaming with realistic prompt_eval_count and eval_count on the final frame, /api/embed, /api/show, /api/version). Non-root multi-stage Dockerfile, never published (internal network only). - docker-compose.demo.yml — postgres + redis + mock-ollama + gateway, with PLAYGROUND_ENABLED=true and ./playground mounted read-only at /app/playground. Mirrors the prod posture (mock-ollama not exposed). - demo.sh — brings the stack up, waits on /healthz, creates a demo tenant with allow_all_models and a fresh API key via the bootstrap CLI inside the container, then prints the key, the playground URL, and five ready-to-paste curl commands (SSE chat, NDJSON chat, /v1/models, a 401, a 403 /api/pull). ./demo.sh --down tears everything back down with volumes. - playground/index.html — single-file dark-themed UI served same-origin by the gateway at /playground (CORS-free). Per-endpoint About card with method/ auth/streaming badges, a real description, sample request body, sample response, and a footer note. Live SSE/NDJSON rendering of the response. A live, copyable curl box that mirrors exactly what Run sends. Run + Refresh are visibly gated until an API key is in the field; the Base URL is force-pinned to location.origin three times to defeat browser autofill. - docs/ — API.md (full endpoint reference with curl, streaming formats, error model, SPEC §6.5 response headers), ARCHITECTURE.md (incl. §4.6 discovery + the request lifecycle), DEPLOYMENT.md (Ollama-never-exposed rule, pointing at a real Ollama backend, env reference), THREAT_MODEL.md (SPEC §3 table + the allow_all_models opt-in notes), OPERATIONS.md (key/budget/model/usage runbook + fail-closed table), PLAYGROUND.md. mkdocs.yml (Material theme) wires them together.	
2026-05-26 20:52:33 +02:00
Stephan Berbig	844b02aade	tests: unit + integration suite (99 tests; ruff + mypy --strict clean) Real test bodies (not stubs), driven against an in-process httpx.ASGITransport override of the gateway's get_ollama_client dependency pointing at tests/integration/mock_ollama.py. Unit (target 100% on auth/, ratelimit/, budget/): - argon2id roundtrip, wrong-key, garbage encoding, needs_rehash on param change - key format/uniqueness/prefix extraction - token counter (prompt_eval_count + eval_count, embeddings, missing-counts) - translate (OpenAI <-> Ollama for chat/completion/embeddings, streaming chunks, /v1/models list shape) - allowlist (hard-blocks, effective-set semantics across allow_all/inheritance/ empty-discovered) - discovery (parse, cache roundtrip with TTL, fail-closed, tolerates redis=None) - sliding window (allow/block/reset/per-key vs per-tenant/cost-weighted) Integration (testcontainers postgres + redis + in-process mock Ollama): - auth flow (no/malformed/wrong key all return identical sanitized 401) - proxy stream (NDJSON roundtrip, audit row's token counts match, hard-blocked endpoints uniformly 403) - openai_compat (SSE chunks, data: [DONE], non-stream shape, /v1/models) - model_discovery (allow_all sees all, default-deny sees allowed ∩ discovered, /v1/models filtered, unpermitted-but-installed = nonexistent = 403, empty cache denies even allow_all) - rate_limit (429 + Retry-After + headers; Redis down ⇒ 503, never 200) - budget (decrement + headers; pre-burned counter blocks next request) - revocation (INSERT into gateway.revocations → NOTIFY → cache evicted → 401 ≤ 1s) Includes a known-issue xfail flagging a bug in ratelimit/sliding_window.py: the per-hit ZSET member uses id(object()) which returns the same id on consecutive calls, causing same-millisecond hits to overwrite instead of stacking. To be fixed in a follow-up commit.	2026-05-26 20:52:33 +02:00
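The known issue flagged at the end of the commit above comes down to sorted-set semantics: adding a member that already exists updates its score instead of creating a second entry. The sketch below mimics that with a plain dict and shows one possible way to keep same-millisecond hits distinct. The follow-up fix is not described in this log, so the `uuid4` suffix is an assumption, not the project's solution.

```python
import uuid


def record_hit(window: dict[str, int], now_ms: int, member: str) -> None:
    """Mimic ZADD: re-adding an existing member overwrites its score."""
    window[member] = now_ms


def unique_member(now_ms: int) -> str:
    """One possible per-hit member: timestamp plus a random suffix."""
    return f"{now_ms}-{uuid.uuid4().hex}"


colliding: dict[str, int] = {}
stacking: dict[str, int] = {}
for _ in range(3):
    # Same member every time, as happens when the "unique" id repeats.
    record_hit(colliding, 1000, "1000-same-id")
    # Fresh member per hit, so hits in the same millisecond stack.
    record_hit(stacking, 1000, unique_member(1000))

print(len(colliding), len(stacking))
```

With a repeating member the window undercounts, so a limiter would admit more requests than its configured rate.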
Stephan Berbig	6a92bc8ce9	proxy: streaming, discovery, OpenAI-compat, rate-limit, budget, audit The hot path. A single Pipeline class owns enforcement so the eight non-negotiables can be reviewed in one place. - Native /api/chat, /api/generate (NDJSON streaming + non-stream), /api/tags, /api/show (system-prompt + template stripped), /api/embed(dings), /api/version (returns gateway version, not Ollama's). Endpoint catch-all returns the same generic 403 for hard-blocked and unknown /api/* paths so attackers cannot enumerate which mutating endpoints exist. - OpenAI-compat /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models with SSE (`data: {...}` + final `data: [DONE]`); preserves streaming end-to-end. - Model discovery (SPEC §4.6): background poller against Ollama /api/tags; Redis + in-process cache (TTL = MODEL_DISCOVERY_CACHE_TTL_S, refresh = MODEL_DISCOVERY_REFRESH_S); fail-closed when the discovered set is empty. - Effective-set resolution in proxy/allowlist.py: allow_all = key.allow_all_models ?? tenant.allow_all_models effective = discovered if allow_all else (key.allowed_models ?? tenant.allowed_models) ∩ discovered A non-effective model returns the same generic 403 whether it's installed- but-unpermitted or doesn't exist at all (no enumeration leak). - Sliding-window rate limit (Redis Lua, single round-trip) for per-key + per-tenant RPM and per-key TPM. Redis-INCR/DECR concurrency semaphore with TTL guard. Token-budget counters per (key, period) with a Postgres ledger for reconciliation across resets. Headers per SPEC §6.5 on every response; 429 carries Retry-After; Redis outage → 503 (fail closed, never 200). - Token counting from the FINAL stream object (NDJSON `done` or the SSE chunk carrying `usage`); the audit row is written AFTER stream close so TTFB is never degraded by bookkeeping. - Audit writer: asyncio.Queue + bounded ring buffer; deny-mode flip on overflow. Optional prompt log per key (TTL'd). 
- Revocation listener: asyncpg LISTEN on key_revoked → evict the Redis cache entry within ~1s of the console writing to gateway.revocations. - Prometheus counters/histograms labeled by tenant only (per SPEC §13.3).	2026-05-26 20:52:33 +02:00
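The effective-set rule quoted in the commit above (key-level settings fall back to tenant-level ones, then everything is intersected with the discovered set) translates fairly directly into Python. The `Limits` dataclass and the function name are illustrative assumptions; `None` plays the role of "not set" in the `??` fallback.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    """Hypothetical stand-in for a key_limits or tenant_limits row."""
    allow_all_models: Optional[bool] = None
    allowed_models: Optional[frozenset[str]] = None


def effective_models(key: Limits, tenant: Limits, discovered: set[str]) -> set[str]:
    """allow_all and allowed_models inherit from the tenant when unset on the key."""
    allow_all = (key.allow_all_models
                 if key.allow_all_models is not None
                 else tenant.allow_all_models)
    if allow_all:
        return set(discovered)
    allowed = (key.allowed_models
               if key.allowed_models is not None
               else tenant.allowed_models)
    # Default-deny: only models that are both permitted and actually discovered.
    return set(allowed or ()) & set(discovered)
```

Because the result is always a subset of the discovered set, an empty discovery cache yields an empty effective set even for allow-all keys, which matches the fail-closed behavior described above.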
Stephan Berbig	6431b2f72c	auth + cli: argon2id keys, bearer middleware, bootstrap commands - argon2id hash/verify/needs_rehash; constant-time path; parameters from config. - Key format nz_<prefix><secret> (12-char stored prefix incl. nz_, 32-char random secret); the full key is generated with secrets, hashed argon2id, and printed exactly once at creation — never persisted, never logged. - Bearer auth middleware: extract → resolve prefix → Redis cache (TTL from REDIS_KEY_CACHE_TTL_S) → DB → argon2 verify → cache the resolved Principal. Fail-closed; uniform sanitized 401 with X-Request-ID; per-IP auth-failure counter to slow brute force. Exempt paths: /healthz /readyz /metrics /, and /playground when enabled. - Bootstrap CLI (Typer) per SPEC §11: create-tenant (with --allow-all-models), create-key, list-keys, revoke-key, set-budget, set-models (--models or --allow-all / --no-allow-all), show-usage, list-models. - Async repositories for tenants, api_keys, key_limits, budget_usage, revocations, audit_log — including the join+inheritance flatten that produces a Principal with effective rpm/tpm/concurrent/allowed_models/ allow_all_models for the auth cache.	2026-05-26 20:52:33 +02:00
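The key format described in the commit above (a 12-character stored prefix that includes `nz_`, followed by a 32-character random secret, generated with `secrets`) can be sketched like this. The alphabet and both function names are assumptions, and the argon2id hashing step is left out to keep the example free of third-party dependencies.

```python
import secrets
import string

# Assumption: the character set is not stated in the log; alphanumerics are used here.
ALPHABET = string.ascii_letters + string.digits
PREFIX_LEN = 12   # stored prefix length, including the leading "nz_"
SECRET_LEN = 32


def generate_key() -> tuple[str, str]:
    """Return (stored_prefix, full_key); the full key should be shown once and never stored."""
    random_part = "".join(secrets.choice(ALPHABET) for _ in range(PREFIX_LEN - len("nz_")))
    prefix = "nz_" + random_part
    secret = "".join(secrets.choice(ALPHABET) for _ in range(SECRET_LEN))
    return prefix, prefix + secret


def extract_prefix(full_key: str) -> str:
    """Recover the lookup prefix from a presented key."""
    return full_key[:PREFIX_LEN]


prefix, full_key = generate_key()
```

The prefix is what allows a cache or database lookup before the expensive hash verification runs, since the secret part is never stored in plain form.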
Stephan Berbig	d79f17b3bb	scaffold: project skeleton, schema, healthz/readyz, CI Initial project structure for neuronetz-gateway per scope-docs/SPEC.md: - Python 3.12 / FastAPI / SQLAlchemy 2.0 (async) / Redis / Postgres stack managed by uv. Multi-stage non-root Dockerfile, prod + dev compose files (ollama service is NEVER published in either), Caddyfile + systemd unit, justfile, GitHub Actions CI (ruff, mypy --strict, pytest, bandit, pip-audit). - Pydantic-Settings config covering every env var from SPEC §7, including the MODEL_DISCOVERY_* keys for the dynamic-discovery feature (§4.6). - Alembic 0001_initial creates the full gateway schema (8 tables, 3 enums, notify_key_revoked() trigger), incl. allow_all_models on tenant_limits and key_limits for the per-tenant auto-grant toggle. - Working /healthz, /readyz (fail-closed when deps unreachable), and a Prometheus /metrics stub. Sanitizing error handlers that attach X-Request-ID to every response and never leak upstream internals. - SPEC + AGENT_PROMPT included under scope-docs/ (source of truth).	2026-05-26 20:50:35 +02:00

13 Commits