Commit Graph

4 Commits

Author SHA1 Message Date
Stephan Berbig
844b02aade tests: unit + integration suite (99 tests; ruff + mypy --strict clean)
Real test bodies (not stubs), run in-process through httpx.ASGITransport, with
the gateway's get_ollama_client dependency overridden to point at
tests/integration/mock_ollama.py.

Unit (target: 100% coverage of auth/, ratelimit/, budget/):
- argon2id roundtrip, wrong-key, garbage encoding, needs_rehash on param change
- key format/uniqueness/prefix extraction
- token counter (prompt_eval_count + eval_count, embeddings, missing-counts)
- translate (OpenAI <-> Ollama for chat/completion/embeddings, streaming chunks,
  /v1/models list shape)
- allowlist (hard-blocks, effective-set semantics across allow_all/inheritance/
  empty-discovered)
- discovery (parse, cache roundtrip with TTL, fail-closed, tolerates redis=None)
- sliding window (allow/block/reset/per-key vs per-tenant/cost-weighted)

Integration (testcontainers postgres + redis + in-process mock Ollama):
- auth flow (no/malformed/wrong key all return identical sanitized 401)
- proxy stream (NDJSON roundtrip, audit row's token counts match, hard-blocked
  endpoints uniformly 403)
- openai_compat (SSE chunks, data: [DONE], non-stream shape, /v1/models)
- model_discovery (allow_all sees all, default-deny sees allowed ∩ discovered,
  /v1/models filtered, unpermitted-but-installed = nonexistent = 403,
  empty cache denies even allow_all)
- rate_limit (429 + Retry-After + headers; Redis down ⇒ 503, never 200)
- budget (decrement + headers; pre-burned counter blocks next request)
- revocation (INSERT into gateway.revocations → NOTIFY → cache evicted → 401 ≤ 1s)

Includes a known-issue xfail flagging a bug in ratelimit/sliding_window.py:
the per-hit ZSET member uses id(object()), which can return the same value on
consecutive calls (the temporary object is freed immediately, and objects with
non-overlapping lifetimes may share an id), so same-millisecond hits overwrite
each other instead of stacking. To be fixed in a follow-up commit.
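A minimal sketch of the failure mode and one possible fix. It models the ZSET as a Python dict from member to score, since Redis ZADD on an existing member only updates that member's score instead of adding a second entry. The `hit_member` helper and its `<ms>-<uuid hex>` member format are hypothetical, not necessarily what the follow-up commit will do.

```python
import uuid


def zadd(zset: dict[str, int], score: int, member: str) -> None:
    """Model of Redis ZADD: an existing member keeps one entry; only its score changes."""
    zset[member] = score


def hit_member(now_ms: int) -> str:
    """Hypothetical unique member: timestamp plus a random UUID, so no two hits collide."""
    return f"{now_ms}-{uuid.uuid4().hex}"


def record_hits(members: list[str], now_ms: int) -> int:
    """Add one hit per member at the same millisecond; return the resulting window size."""
    zset: dict[str, int] = {}
    for member in members:
        zadd(zset, now_ms, member)
    return len(zset)


now = 1_700_000_000_000
# A repeated member (what a reused id(object()) value produces) collapses three hits into one.
collapsed = record_hits(["140234"] * 3, now)
# Unique members stack as intended.
stacked = record_hits([hit_member(now) for _ in range(3)], now)
print(collapsed, stacked)  # 1 3
```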
2026-05-26 20:52:33 +02:00
Stephan Berbig
6a92bc8ce9 proxy: streaming, discovery, OpenAI-compat, rate-limit, budget, audit
The hot path. A single Pipeline class owns enforcement so the eight
non-negotiables can be reviewed in one place.

- Native /api/chat, /api/generate (NDJSON streaming + non-stream), /api/tags,
  /api/show (system prompt + template stripped), /api/embed and /api/embeddings,
  /api/version (returns the gateway version, not Ollama's). A catch-all route
  returns the same generic 403 for hard-blocked and unknown /api/* paths, so
  attackers cannot enumerate which mutating endpoints exist.
- OpenAI-compat /v1/chat/completions, /v1/completions, /v1/embeddings,
  /v1/models with SSE (`data: {...}` + final `data: [DONE]`); preserves
  streaming end-to-end.
- Model discovery (SPEC §4.6): background poller against Ollama /api/tags;
  Redis + in-process cache (TTL = MODEL_DISCOVERY_CACHE_TTL_S, refresh =
  MODEL_DISCOVERY_REFRESH_S); fail-closed when the discovered set is empty.
- Effective-set resolution in proxy/allowlist.py:
    allow_all = key.allow_all_models ?? tenant.allow_all_models
    effective = discovered if allow_all
                else (key.allowed_models ?? tenant.allowed_models) ∩ discovered
  A non-effective model returns the same generic 403 whether it's installed-
  but-unpermitted or doesn't exist at all (no enumeration leak).
- Sliding-window rate limit (Redis Lua, single round-trip) for per-key +
  per-tenant RPM and per-key TPM. Redis-INCR/DECR concurrency semaphore with
  TTL guard. Token-budget counters per (key, period) with a Postgres ledger
  for reconciliation across resets. Headers per SPEC §6.5 on every response;
  429 carries Retry-After; Redis outage → 503 (fail closed, never 200).
- Token counting from the FINAL stream object (NDJSON `done` or the SSE chunk
  carrying `usage`); the audit row is written AFTER stream close so TTFB is
  never degraded by bookkeeping.
- Audit writer: asyncio.Queue + bounded ring buffer; deny-mode flip on overflow.
  Optional prompt log per key (TTL'd).
- Revocation listener: asyncpg LISTEN on key_revoked → evict the Redis cache
  entry within ~1s of the console writing to gateway.revocations.
- Prometheus counters/histograms labeled by tenant only (per SPEC §13.3).
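The SSE framing in the OpenAI-compat bullet can be sketched as a generator over chunk dicts that have already been translated from Ollama's NDJSON. `sse_frames` is a hypothetical helper; each event is one `data:` line terminated by a blank line, and the stream ends with the `data: [DONE]` sentinel.

```python
import json
from collections.abc import Iterable, Iterator
from typing import Any


def sse_frames(chunks: Iterable[dict[str, Any]]) -> Iterator[bytes]:
    """Frame each chunk as a Server-Sent Event, then emit the OpenAI terminator."""
    for chunk in chunks:
        # One event per chunk: a single "data:" line followed by a blank line.
        yield f"data: {json.dumps(chunk, separators=(',', ':'))}\n\n".encode()
    yield b"data: [DONE]\n\n"


frames = list(sse_frames([{"choices": [{"delta": {"content": "Hi"}}]}]))
print(frames[-1])  # b'data: [DONE]\n\n'
```

Because it is a generator, each frame can be flushed to the client as soon as the upstream chunk arrives, which is what keeps streaming end-to-end.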
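A single-process model of the sliding-window log behind the per-key/per-tenant RPM and per-key TPM limits. It assumes the Lua script follows the usual trim, sum, then add pattern in one atomic step; here a deque of `(timestamp_ms, cost)` pairs stands in for the ZSET. `SlidingWindow`, the `cost` weighting, and the Retry-After arithmetic are illustrative assumptions.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SlidingWindow:
    """In-memory model of a sliding-window log (one instance per key or per tenant)."""

    limit: int  # max total cost inside the window: requests for RPM, tokens for TPM
    window_ms: int = 60_000
    hits: deque[tuple[int, int]] = field(default_factory=deque)  # (timestamp_ms, cost), oldest first

    def try_acquire(self, now_ms: int, cost: int = 1) -> tuple[bool, int]:
        """Return (allowed, retry_after_s); blocked attempts are not recorded."""
        # Trim: drop hits that have slid out of the window.
        while self.hits and self.hits[0][0] <= now_ms - self.window_ms:
            self.hits.popleft()
        used = sum(c for _, c in self.hits)
        if used + cost <= self.limit:
            self.hits.append((now_ms, cost))
            return True, 0
        # Blocked: find when enough old cost expires for this hit to fit.
        freed = 0
        for ts, c in self.hits:
            freed += c
            if used - freed + cost <= self.limit:
                wait_ms = ts + self.window_ms - now_ms
                return False, max(1, math.ceil(wait_ms / 1000))
        # The cost alone exceeds the limit, so waiting never helps; report a full window.
        return False, math.ceil(self.window_ms / 1000)


rpm = SlidingWindow(limit=2)
print(rpm.try_acquire(0), rpm.try_acquire(1_000), rpm.try_acquire(2_000))
# (True, 0) (True, 0) (False, 58)
```

One instance per key and one per tenant gives the two RPM scopes; passing the request's token count as `cost` gives a TPM window. A Redis-backed version would let connection errors propagate so the pipeline can answer 503 instead of allowing the request.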
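A sketch of reading usage from the final NDJSON object, assuming Ollama's streaming format in which the last line carries `"done": true` together with `prompt_eval_count` and `eval_count`. `count_tokens_from_ndjson` is a hypothetical name, and treating absent counts as 0 is an assumption of this sketch.

```python
import json
from collections.abc import Iterable
from typing import Any


def count_tokens_from_ndjson(lines: Iterable[bytes]) -> tuple[int, int]:
    """Return (prompt_tokens, completion_tokens) taken from the final `done` object."""
    final: dict[str, Any] | None = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        obj = json.loads(line)
        if obj.get("done"):
            final = obj  # intermediate chunks carry no usage; only the done object counts
    if final is None:
        return 0, 0  # stream ended without a done object
    return int(final.get("prompt_eval_count", 0)), int(final.get("eval_count", 0))


stream = [
    b'{"message":{"content":"Hi"},"done":false}\n',
    b'{"done":true,"prompt_eval_count":12,"eval_count":5}\n',
]
print(count_tokens_from_ndjson(stream))  # (12, 5)
```

In the proxy, each line would be forwarded to the client before being inspected, and the audit write would run only after the last line, so counting never delays the first byte.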
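A minimal sketch of the audit overflow rule, reading "deny-mode flip" as: once an audit event cannot be queued, the gateway stops admitting requests rather than serving them unaudited. `AuditWriter`, `submit`, `drain`, `deny_mode`, and the use of `asyncio.Queue(maxsize=...)` as the bounded buffer are assumptions; how the gateway leaves deny mode is not described in the commit and is not modelled here.

```python
import asyncio
from typing import Any


class AuditWriter:
    def __init__(self, maxsize: int) -> None:
        self.queue: asyncio.Queue[dict[str, Any]] = asyncio.Queue(maxsize=maxsize)
        self.deny_mode = False  # when True, the request pipeline should refuse new requests

    def submit(self, event: dict[str, Any]) -> bool:
        """Enqueue without blocking the response path; flip to deny mode if the buffer is full."""
        try:
            self.queue.put_nowait(event)
            return True
        except asyncio.QueueFull:
            self.deny_mode = True
            return False

    async def drain(self, sink: list[dict[str, Any]]) -> None:
        """Move every queued event into `sink` (stand-in for a batched DB insert)."""
        while not self.queue.empty():
            sink.append(self.queue.get_nowait())
            self.queue.task_done()
```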
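asyncpg delivers NOTIFY payloads to a callback registered with `Connection.add_listener(channel, callback)`, invoked as `callback(connection, pid, channel, payload)`. A sketch of the eviction callback for the revocation listener, assuming the trigger's payload is the revoked key's prefix and the auth cache is keyed as `auth:<prefix>`; both are assumptions, as are the names `make_revocation_listener` and `AuthCache`.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any, Protocol


class AuthCache(Protocol):
    # Matches the shape of redis.asyncio.Redis.delete(*names).
    async def delete(self, *names: str) -> Any: ...


def make_revocation_listener(cache: AuthCache) -> Callable[[Any, int, str, str], Awaitable[None]]:
    async def on_key_revoked(connection: Any, pid: int, channel: str, payload: str) -> None:
        # Assumption: payload is the key prefix; the "auth:<prefix>" layout is hypothetical.
        await cache.delete(f"auth:{payload}")

    return on_key_revoked


# Wiring (requires a live asyncpg connection):
#   await conn.add_listener("key_revoked", make_revocation_listener(redis_client))
```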
2026-05-26 20:52:33 +02:00
Stephan Berbig
6431b2f72c auth + cli: argon2id keys, bearer middleware, bootstrap commands
- argon2id hash/verify/needs_rehash; constant-time path; parameters from config.
- Key format nz_<prefix><secret> (12-char stored prefix incl. nz_, 32-char
  random secret); the full key is generated with secrets, hashed argon2id, and
  printed exactly once at creation — never persisted, never logged.
- Bearer auth middleware: extract → resolve prefix → Redis cache (TTL from
  REDIS_KEY_CACHE_TTL_S) → DB → argon2 verify → cache the resolved Principal.
  Fail-closed; uniform sanitized 401 with X-Request-ID; per-IP auth-failure
  counter to slow brute force. Exempt paths: /healthz, /readyz, /metrics, /,
  and /playground (when enabled).
- Bootstrap CLI (Typer) per SPEC §11: create-tenant (with --allow-all-models),
  create-key, list-keys, revoke-key, set-budget, set-models (--models or
  --allow-all / --no-allow-all), show-usage, list-models.
- Async repositories for tenants, api_keys, key_limits, budget_usage,
  revocations, audit_log — including the join+inheritance flatten that
  produces a Principal with effective rpm/tpm/concurrent/allowed_models/
  allow_all_models for the auth cache.
2026-05-26 20:52:33 +02:00
Stephan Berbig
d79f17b3bb scaffold: project skeleton, schema, healthz/readyz, CI
Initial project structure for neuronetz-gateway per scope-docs/SPEC.md:

- Python 3.12 / FastAPI / SQLAlchemy 2.0 (async) / Redis / Postgres stack
  managed by uv. Multi-stage non-root Dockerfile, prod + dev compose files
  (ollama service is NEVER published in either), Caddyfile + systemd unit,
  justfile, GitHub Actions CI (ruff, mypy --strict, pytest, bandit, pip-audit).
- Pydantic-Settings config covering every env var from SPEC §7, including the
  MODEL_DISCOVERY_* keys for the dynamic-discovery feature (§4.6).
- Alembic 0001_initial creates the full gateway schema (8 tables, 3 enums,
  notify_key_revoked() trigger), incl. allow_all_models on tenant_limits and
  key_limits for the per-tenant auto-grant toggle.
- Working /healthz, /readyz (fail-closed when deps unreachable), and a
  Prometheus /metrics stub. Sanitizing error handlers that attach X-Request-ID
  to every response and never leak upstream internals.
- SPEC + AGENT_PROMPT included under scope-docs/ (source of truth).
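A sketch of a fail-closed readiness check, assuming each dependency probe is an async callable: probes run concurrently under a timeout, and any exception or timeout marks the gateway not ready. `check_ready`, `Probe`, and the probe names are hypothetical; a /readyz handler would return 503 whenever the first element is False.

```python
import asyncio
from collections.abc import Awaitable, Callable, Mapping

Probe = Callable[[], Awaitable[object]]


async def check_ready(probes: Mapping[str, Probe], timeout_s: float = 2.0) -> tuple[bool, dict[str, bool]]:
    async def run(probe: Probe) -> bool:
        try:
            await asyncio.wait_for(probe(), timeout=timeout_s)
            return True
        except Exception:  # any failure, including a timeout, counts as "not ready"
            return False

    results = await asyncio.gather(*(run(p) for p in probes.values()))
    status = dict(zip(probes.keys(), results))
    # No probes configured is also treated as not ready (fail-closed).
    return (bool(status) and all(status.values())), status
```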
2026-05-26 20:50:35 +02:00