neuronetz-gateway/src/neuronetz_gateway/budget/ledger.py
Stephan Berbig 6a92bc8ce9 proxy: streaming, discovery, OpenAI-compat, rate-limit, budget, audit
The hot path. A single Pipeline class owns enforcement so the eight
non-negotiables can be reviewed in one place.

- Native /api/chat, /api/generate (NDJSON streaming + non-stream), /api/tags,
  /api/show (system-prompt + template stripped), /api/embed(dings), /api/version
  (returns gateway version, not Ollama's). Endpoint catch-all returns the same
  generic 403 for hard-blocked and unknown /api/* paths so attackers cannot
  enumerate which mutating endpoints exist.
- OpenAI-compat /v1/chat/completions, /v1/completions, /v1/embeddings,
  /v1/models with SSE (`data: {...}` + final `data: [DONE]`); preserves
  streaming end-to-end.
- Model discovery (SPEC §4.6): background poller against Ollama /api/tags;
  Redis + in-process cache (TTL = MODEL_DISCOVERY_CACHE_TTL_S, refresh =
  MODEL_DISCOVERY_REFRESH_S); fail-closed when the discovered set is empty.
- Effective-set resolution in proxy/allowlist.py:
    allow_all = key.allow_all_models ?? tenant.allow_all_models
    effective = discovered if allow_all
                else (key.allowed_models ?? tenant.allowed_models) ∩ discovered
  A non-effective model returns the same generic 403 whether it's installed-
  but-unpermitted or doesn't exist at all (no enumeration leak).
- Sliding-window rate limit (Redis Lua, single round-trip) for per-key +
  per-tenant RPM and per-key TPM. Redis-INCR/DECR concurrency semaphore with
  TTL guard. Token-budget counters per (key, period) with a Postgres ledger
  for reconciliation across resets. Headers per SPEC §6.5 on every response;
  429 carries Retry-After; Redis outage → 503 (fail closed, never 200).
- Token counting from the FINAL stream object (NDJSON `done` or the SSE chunk
  carrying `usage`); the audit row is written AFTER stream close so TTFB is
  never degraded by bookkeeping.
- Audit writer: asyncio.Queue + bounded ring buffer; deny-mode flip on overflow.
  Optional prompt log per key (TTL'd).
- Revocation listener: asyncpg LISTEN on key_revoked → evict the Redis cache
  entry within ~1s of the console writing to gateway.revocations.
- Prometheus counters/histograms labeled by tenant only (per SPEC §13.3).
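The effective-set rule above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of proxy/allowlist.py: the function names `effective_models` and `is_permitted` and their parameters are hypothetical, and `??` is read as "use the key-level value unless it is None, otherwise fall back to the tenant-level value".

```python
from typing import FrozenSet, Iterable, Optional


def effective_models(
    discovered: Iterable[str],
    tenant_allow_all: bool,
    tenant_allowed: Optional[Iterable[str]],
    key_allow_all: Optional[bool] = None,
    key_allowed: Optional[Iterable[str]] = None,
) -> FrozenSet[str]:
    """Resolve the set of models a key may use (hypothetical helper)."""
    discovered_set = frozenset(discovered)
    # Key-level settings override tenant-level ones only when explicitly set.
    allow_all = key_allow_all if key_allow_all is not None else tenant_allow_all
    if allow_all:
        return discovered_set
    allowed = key_allowed if key_allowed is not None else tenant_allowed
    # Intersect with the discovered set so a permitted but uninstalled model
    # is never treated as usable.
    return frozenset(allowed or ()) & discovered_set


def is_permitted(model: str, effective: FrozenSet[str]) -> bool:
    """One membership test, so unknown and unpermitted models look the same."""
    return model in effective


discovered = {"llama3", "mistral"}
tenant_set = effective_models(discovered, False, ["llama3", "phi3"])
key_set = effective_models(discovered, False, ["llama3"], key_allow_all=True)
empty_set = effective_models([], True, None)
```

Because the caller sees only a boolean, an installed-but-unpermitted model and a nonexistent one take the same denial path, and an empty discovered set yields an empty effective set, which denies everything.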
2026-05-26 20:52:33 +02:00

"""Postgres budget ledger reconciliation.

Persists token usage to ``gateway.budget_usage`` (the durable source of truth)
via an idempotent upsert keyed by (key_id, period, period_start). The Redis
counter (``counter.py``) is the fast path; this ledger is what survives a Redis
flush and what ``show-usage`` reports against.
"""

from __future__ import annotations

import uuid

from sqlalchemy.dialects.postgresql import insert as pg_insert
from sqlalchemy.ext.asyncio import AsyncSession

from neuronetz_gateway.budget.counter import period_start
from neuronetz_gateway.db.models import BudgetPeriod, BudgetUsage


class BudgetLedger:
    """Source-of-truth budget accounting in Postgres."""

    def __init__(self, session: AsyncSession) -> None:
        self._session = session

    async def record_usage(
        self, key_id: str, period: BudgetPeriod, tokens_in: int, tokens_out: int
    ) -> None:
        """Upsert usage into ``gateway.budget_usage`` for the active period.

        Uses an ``ON CONFLICT`` upsert so concurrent writers accumulate rather
        than clobber. ``requests`` increments by one per recorded request.
        """
        start = period_start(period)
        stmt = pg_insert(BudgetUsage).values(
            key_id=uuid.UUID(key_id) if isinstance(key_id, str) else key_id,
            period=period,
            period_start=start,
            tokens_in=tokens_in,
            tokens_out=tokens_out,
            requests=1,
        )
        stmt = stmt.on_conflict_do_update(
            index_elements=[
                BudgetUsage.key_id,
                BudgetUsage.period,
                BudgetUsage.period_start,
            ],
            set_={
                "tokens_in": BudgetUsage.tokens_in + stmt.excluded.tokens_in,
                "tokens_out": BudgetUsage.tokens_out + stmt.excluded.tokens_out,
                "requests": BudgetUsage.requests + stmt.excluded.requests,
            },
        )
        await self._session.execute(stmt)


__all__ = ["BudgetLedger"]
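
The accumulate-on-conflict behaviour of `record_usage` can be tried without a Postgres instance. The sketch below is an illustration only: it uses the standard-library `sqlite3` module (SQLite 3.24 or newer supports the same `ON CONFLICT ... DO UPDATE` shape) as a stand-in, and the table layout, the string-typed `period` and `period_start` columns, and the module-level `record_usage` helper are assumptions rather than the gateway's real schema or API.

```python
import sqlite3

# Hypothetical stand-in for gateway.budget_usage; the real table lives in Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE budget_usage (
        key_id       TEXT    NOT NULL,
        period       TEXT    NOT NULL,
        period_start TEXT    NOT NULL,
        tokens_in    INTEGER NOT NULL,
        tokens_out   INTEGER NOT NULL,
        requests     INTEGER NOT NULL,
        PRIMARY KEY (key_id, period, period_start)
    )
    """
)

# Same shape as the SQLAlchemy statement: insert one request's usage, and on a
# key collision add the incoming ("excluded") values to the stored row.
UPSERT = """
INSERT INTO budget_usage
    (key_id, period, period_start, tokens_in, tokens_out, requests)
VALUES (?, ?, ?, ?, ?, 1)
ON CONFLICT (key_id, period, period_start) DO UPDATE SET
    tokens_in  = tokens_in  + excluded.tokens_in,
    tokens_out = tokens_out + excluded.tokens_out,
    requests   = requests   + excluded.requests
"""


def record_usage(key_id, period, period_start, tokens_in, tokens_out):
    """Record one request's token usage (illustrative helper)."""
    conn.execute(UPSERT, (key_id, period, period_start, tokens_in, tokens_out))
    conn.commit()


# Two requests in the same period accumulate; a new period starts a new row.
record_usage("key-1", "daily", "2026-05-26", 100, 40)
record_usage("key-1", "daily", "2026-05-26", 25, 10)
record_usage("key-1", "daily", "2026-05-27", 7, 3)

rows = conn.execute(
    "SELECT period_start, tokens_in, tokens_out, requests "
    "FROM budget_usage ORDER BY period_start"
).fetchall()
```

Because the addition happens inside the database statement, two writers hitting the same (key_id, period, period_start) row are summed rather than one overwriting the other, which is the property the docstring describes.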