Files
Stephan Berbig b47a09db91 demo + playground + docs
One-command demo so the gateway can be exercised end-to-end without a GPU or a
real model download:

- demo/mock-ollama/ — tiny FastAPI service emulating Ollama (/api/tags,
  /api/chat + /api/generate NDJSON streaming with realistic prompt_eval_count
  and eval_count on the final frame, /api/embed, /api/show, /api/version).
  Non-root multi-stage Dockerfile, never published (internal network only).
- docker-compose.demo.yml — postgres + redis + mock-ollama + gateway, with
  PLAYGROUND_ENABLED=true and ./playground mounted read-only at /app/playground.
  Mirrors the prod posture (mock-ollama not exposed).
- demo.sh — brings the stack up, waits on /healthz, creates a demo tenant with
  allow_all_models and a fresh API key via the bootstrap CLI inside the
  container, then prints the key, the playground URL, and five ready-to-paste
  curl commands (SSE chat, NDJSON chat, /v1/models, a 401, a 403 /api/pull).
  ./demo.sh --down tears everything back down with volumes.
- playground/index.html — single-file dark-themed UI served same-origin by
  the gateway at /playground (CORS-free). Per-endpoint About card with method/
  auth/streaming badges, a real description, sample request body, sample
  response, and a footer note. Live SSE/NDJSON rendering of the response.
  A live, copyable curl box that mirrors exactly what Run sends. Run + Refresh
  are visibly gated until an API key is in the field; the Base URL is
  force-pinned to location.origin three times to defeat browser autofill.
- docs/ — API.md (full endpoint reference with curl, streaming formats, error
  model, SPEC §6.5 response headers), ARCHITECTURE.md (incl. §4.6 discovery
  + the request lifecycle), DEPLOYMENT.md (Ollama-never-exposed rule,
  pointing at a real Ollama backend, env reference), THREAT_MODEL.md
  (SPEC §3 table + the allow_all_models opt-in notes), OPERATIONS.md
  (key/budget/model/usage runbook + fail-closed table), PLAYGROUND.md.
  mkdocs.yml (Material theme) wires them together.
2026-05-26 20:52:33 +02:00

62 lines
1.9 KiB
Docker

# syntax=docker/dockerfile:1.7
#
# mock-ollama — a tiny FastAPI app emulating the Ollama HTTP API for the demo.
#
# builder stage : installs deps into a self-contained virtualenv.
# runtime stage : copies the venv + app, drops to a NON-ROOT user, no build
# tools, runs uvicorn on :11434.
#
# This image exists ONLY for the demo stack (docker-compose.demo.yml). It lets
# the demo run with no GPU and no model downloads. It is never published to the
# host — like real Ollama, it is reachable only on the internal Docker network.
# ----------------------------------------------------------------------------
# Stage 1 — builder
# ----------------------------------------------------------------------------
FROM python:3.12-slim AS builder
ENV PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_NO_CACHE_DIR=1 \
VIRTUAL_ENV=/opt/venv \
PATH=/opt/venv/bin:$PATH
RUN python -m venv /opt/venv
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
# ----------------------------------------------------------------------------
# Stage 2 — runtime
# ----------------------------------------------------------------------------
FROM python:3.12-slim AS runtime
# curl is used by the compose healthcheck.
RUN apt-get update \
&& apt-get install -y --no-install-recommends curl \
&& rm -rf /var/lib/apt/lists/*
# Non-root user.
RUN groupadd --system --gid 10001 mock \
&& useradd --system --uid 10001 --gid mock --home-dir /app --shell /usr/sbin/nologin mock
ENV VIRTUAL_ENV=/opt/venv \
PATH=/opt/venv/bin:$PATH \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
MOCK_OLLAMA_PORT=11434
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
COPY app.py ./
USER mock
EXPOSE 11434
HEALTHCHECK --interval=10s --timeout=3s --start-period=5s --retries=5 \
CMD curl -fsS "http://127.0.0.1:${MOCK_OLLAMA_PORT}/api/version" || exit 1
CMD ["python", "-m", "app"]