scaffold: project skeleton, schema, healthz/readyz, CI
Initial project structure for neuronetz-gateway per scope-docs/SPEC.md: - Python 3.12 / FastAPI / SQLAlchemy 2.0 (async) / Redis / Postgres stack managed by uv. Multi-stage non-root Dockerfile, prod + dev compose files (ollama service is NEVER published in either), Caddyfile + systemd unit, justfile, GitHub Actions CI (ruff, mypy --strict, pytest, bandit, pip-audit). - Pydantic-Settings config covering every env var from SPEC §7, including the MODEL_DISCOVERY_* keys for the dynamic-discovery feature (§4.6). - Alembic 0001_initial creates the full gateway schema (8 tables, 3 enums, notify_key_revoked() trigger), incl. allow_all_models on tenant_limits and key_limits for the per-tenant auto-grant toggle. - Working /healthz, /readyz (fail-closed when deps unreachable), and a Prometheus /metrics stub. Sanitizing error handlers that attach X-Request-ID to every response and never leak upstream internals. - SPEC + AGENT_PROMPT included under scope-docs/ (source of truth).
This commit is contained in:
121
scope-docs/AGENT_PROMPT.md
Normal file
121
scope-docs/AGENT_PROMPT.md
Normal file
@@ -0,0 +1,121 @@
|
||||
# Build Order: neuronetz-gateway v0.1.0
|
||||
|
||||
## Context
|
||||
|
||||
The Ollama instance at `https://api.neuronetz.ai` is currently exposed without authentication. This is a security incident in waiting. Your job is to build the gateway that closes that gap and forms the commercial API surface of the Neuronetz AI platform.
|
||||
|
||||
The full specification is in **`SPEC.md`** in this repository. Read it before writing any code. It is the source of truth; if anything below conflicts with it, SPEC.md wins.
|
||||
|
||||
## Mission
|
||||
|
||||
Implement `neuronetz-gateway` per SPEC.md to a state that satisfies **§12 Acceptance Criteria**. Nothing less ships.
|
||||
|
||||
## Non-Negotiables
|
||||
|
||||
These are hard constraints. Violating any of them is a build failure regardless of feature completeness.
|
||||
|
||||
1. **Fail closed, always.** If a security or budgeting check cannot be performed (Redis down, DB unreachable, ambiguous state), deny the request. Never default to allow.
|
||||
2. **Ollama never reachable from outside the Docker internal network.** No `ports:` mapping for the ollama service in any compose file shipped with the project. Document this prominently.
|
||||
3. **No secrets in code, no secrets in logs, no secrets in errors.** Argon2id for key storage. Constant-time comparison only. Keys printed exactly once at creation.
|
||||
4. **No reflected upstream errors.** Ollama errors are sanitized at the gateway boundary. Map to generic 4xx/5xx with a request ID.
|
||||
5. **Mutating Ollama endpoints (`/api/pull`, `/api/push`, `/api/create`, `/api/copy`, `/api/delete`, `/api/blobs/*`) are hard-blocked.** Not configurable. Not behind a feature flag. Blocked.
|
||||
6. **Streaming integrity.** Token counting and audit writes happen **after** stream close, never on the hot path. Time-to-first-byte must not be degraded by gateway bookkeeping.
|
||||
7. **`mypy --strict` and `ruff check` clean before any PR is opened.** No `# type: ignore` without an inline justification comment.
|
||||
8. **Test coverage targets (§9) are a gate, not a goal.** 100% on `auth/`, `ratelimit/`, `budget/`. CI fails below threshold.
|
||||
9. **Apache 2.0 license file present from commit one.** No GPL dependencies.
|
||||
10. **The bootstrap CLI must work before the first manual `curl`.** No "I'll create a key by hand in the DB just to test it" — if the CLI can't create a key, fix the CLI first.
|
||||
|
||||
## Phasing
|
||||
|
||||
Five phases. Each phase has an explicit exit criterion. **Do not start phase N+1 until phase N's exit criterion is verifiably met.** PM/Control: enforce this.
|
||||
|
||||
### Phase 1 — Scaffold
|
||||
|
||||
- Repo layout per SPEC §8
|
||||
- `pyproject.toml`, `uv.lock`, Dockerfile, docker-compose.yml, docker-compose.dev.yml, .env.example, README, LICENSE
|
||||
- Alembic configured; migration `0001_initial.py` creates schema `gateway` and all tables per SPEC §5
|
||||
- `make` or `just` targets: `dev`, `test`, `lint`, `typecheck`, `migrate`, `compose-up`, `compose-down`
|
||||
- CI workflow runs: ruff, mypy, pytest, bandit, pip-audit
|
||||
- **Exit criterion:** `docker compose -f docker-compose.dev.yml up` brings up postgres + redis + a stub gateway that responds 200 on `/healthz` and 503 on `/readyz` (because no Ollama yet). Migrations apply cleanly. CI is green on an empty test suite.
|
||||
|
||||
### Phase 2 — Core proxy + auth
|
||||
|
||||
- Bootstrap CLI (`create-tenant`, `create-key`, `list-keys`, `revoke-key`) working end-to-end
|
||||
- Argon2id hashing module with unit tests covering: hash, verify, constant-time behavior, rehash-on-parameter-change
|
||||
- Auth middleware: Bearer extraction, prefix lookup, hash verify, Redis cache with TTL
|
||||
- Ollama proxy for `/api/chat` and `/api/generate` — both streamed (NDJSON) and non-streamed
|
||||
- Endpoint allowlist enforced
|
||||
- **Model discovery (SPEC §4.6):** background poll of Ollama `/api/tags`, cached in Redis + in-process, fail-closed when unavailable
|
||||
- Model allowlist enforced per-tenant via the **effective set** (allow_all → all discovered; else `allowed_models ∩ discovered`); key-level `allow_all_models` overrides tenant
|
||||
- Error handler: sanitized responses, request ID in every error
|
||||
- Audit log writer (buffered, async)
|
||||
- Mock Ollama in `tests/integration/mock_ollama.py` (no real model required for CI)
|
||||
- **Exit criterion:** A key created via CLI can call `/api/chat` and `/api/generate` through Caddy → gateway → mock Ollama, streaming works, audit rows land in Postgres with correct token counts, `/api/pull` returns 403, no-auth returns 401, wrong-key returns 401. Model discovery populates from the (mock) Ollama `/api/tags`; `/api/tags` returns the tenant's effective set; an `allow_all_models` tenant sees all discovered models, a default-deny tenant sees only `allowed ∩ discovered`, and a non-effective model returns 403; discovery-unavailable fails closed. Integration tests cover all of the above.
|
||||
|
||||
### Phase 3 — Rate limit + budget + OpenAI-compat
|
||||
|
||||
- Sliding window rate limit (Redis Lua script) — per-key RPM, per-tenant RPM, per-key TPM
|
||||
- Concurrency semaphore (Redis-backed) with TTL guard
|
||||
- Token budget counters in Redis with Postgres ledger reconciliation on period rollover
|
||||
- OpenAI-compatibility layer: `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`, `/v1/models` with full SSE streaming and `data: [DONE]` terminator
|
||||
- Schema translation tests with golden fixtures (request in OpenAI → expected Ollama request; response from Ollama → expected OpenAI response)
|
||||
- Rate-limit and budget response headers per SPEC §6.5
|
||||
- **Exit criterion:** Locust test (100 concurrent users, 5 min) shows correct 429 behavior at the limit, correct token accounting, p99 gateway overhead < 25 ms. OpenAI Python SDK pointed at `/v1` successfully completes streaming chat. Killing Redis mid-test produces 503 (fail closed), not 200.
|
||||
|
||||
### Phase 4 — Audit, prompt log, revocation
|
||||
|
||||
- Prompt log (opt-in per key, TTL) with daily sweeper task
|
||||
- Audit log retention sweeper (TTL per tenant config)
|
||||
- Buffered audit writer with ring-buffer overflow → deny-mode behavior
|
||||
- Revocation flow: console (simulated via direct INSERT in tests) writes `gateway.revocations` → NOTIFY → gateway evicts Redis cache → next request with revoked key returns 401 within 1 second
|
||||
- Prometheus `/metrics` (loopback only) with: `gateway_requests_total{tenant,model,status}`, `gateway_tokens_total{tenant,model,direction}`, `gateway_request_duration_seconds{tenant,model}` (histogram)
|
||||
- `/readyz` checks DB + Redis + Ollama all reachable
|
||||
- Circuit breaker on Ollama failures
|
||||
- **Exit criterion:** Revocation E2E test green. Prompt log retention TTL works (use freeze-time to simulate). Metrics scrape returns valid Prometheus exposition. `/readyz` flips to 503 when any dependency is down.
|
||||
|
||||
### Phase 5 — Harden, document, release
|
||||
|
||||
- `docs/ARCHITECTURE.md`, `docs/DEPLOYMENT.md`, `docs/API.md`, `docs/THREAT_MODEL.md`, `docs/OPERATIONS.md` complete
|
||||
- Caddyfile example with Let's Encrypt for `api.neuronetz.ai` and security headers (HSTS, X-Content-Type-Options, no Server header, no X-Powered-By)
|
||||
- Systemd unit file for non-Compose deployments
|
||||
- Multi-stage Dockerfile with non-root user, distroless or `python:3.12-slim` final stage, no build tools in final image
|
||||
- `pip-audit` and `bandit` clean in CI
|
||||
- Image scan (Trivy or Grype) clean of HIGH/CRITICAL
|
||||
- Tag `v0.1.0`, build and push image, GitHub release with changelog
|
||||
- **Exit criterion:** Every box in SPEC §12 checked, signed off by Control. Image runnable from a fresh host with only docker + a `.env`. README quickstart works for someone who has never seen the repo.
|
||||
|
||||
## Agent Role Assignments
|
||||
|
||||
For the multi-agent orchestrator (Fritz/UI-UX/DevOps/QA/Control/Timo/PM):
|
||||
|
||||
| Agent | Owns |
|
||||
|---|---|
|
||||
| **Backend / Fritz** | All Python code under `src/neuronetz_gateway/`, Alembic migrations, CLI. Primary author. |
|
||||
| **DevOps** | Dockerfile, docker-compose.yml(s), Caddyfile, systemd unit, CI workflows, image scanning, release tagging. |
|
||||
| **QA** | All tests under `tests/`. Owns coverage gate. Writes the locust scenarios. Verifies acceptance criteria at each phase exit. |
|
||||
| **UI-UX** | Not active this project (no UI surface here). Console project will pick this up. |
|
||||
| **Control / Timo** | Enforces phase gates. Refuses to advance a phase whose exit criterion isn't met. Runs the acceptance checklist at end of Phase 5. |
|
||||
| **PM** | Tracks the phase progression, opens YouTrack tickets per phase, runs daily standups against this prompt, surfaces blockers. |
|
||||
|
||||
## Working Agreements
|
||||
|
||||
- **Branch per phase.** `phase-1-scaffold`, `phase-2-proxy-auth`, etc. Merge to `main` only after phase exit criterion is verified.
|
||||
- **PRs are reviewed against SPEC.md.** "Does this match the spec? If not, is SPEC.md wrong or is the PR wrong?" — that's the review question.
|
||||
- **SPEC changes are explicit.** If a phase reveals a spec mistake, amend SPEC.md in a separate PR before changing the implementation. Never drift silently.
|
||||
- **Commit messages reference the section.** e.g. `auth: implement argon2id verify per SPEC §5, §9`.
|
||||
- **No TODOs in main.** If something is deferred, it becomes a tracked issue, not a code comment.
|
||||
- **Open questions (SPEC §13) are resolved in writing.** Decision goes in SPEC.md, not in a Slack message that gets lost.
|
||||
|
||||
## What "Done" Looks Like
|
||||
|
||||
A fresh clone, a fresh host, a domain pointing at it, and a `.env` file. `docker compose up`. Five minutes later, `curl -H "Authorization: Bearer nz_..." https://api.neuronetz.ai/v1/chat/completions -d '...'` streams a response. The Ollama port is not open. The audit log has a row. The budget counter decremented. The metrics endpoint shows the request. The locust suite passes. The threat model document explains every defense.
|
||||
|
||||
When all of that is true and SPEC §12 is fully ticked, ship v0.1.0.
|
||||
|
||||
## When You Get Stuck
|
||||
|
||||
- **Ambiguity in the spec → ask, don't guess.** Open a question in the PM channel; if resolved, amend SPEC.md.
|
||||
- **Conflict between speed and correctness → correctness wins.** This is security infrastructure. We do not ship "good enough."
|
||||
- **Conflict between scope creep and v0.1.0 → defer.** New ideas go in a follow-up issue. v0.1.0 ships per spec.
|
||||
|
||||
Start with Phase 1. Read SPEC.md first.
|
||||
Reference in New Issue
Block a user