One-command demo so the gateway can be exercised end-to-end without a GPU or a real model download: - demo/mock-ollama/ — tiny FastAPI service emulating Ollama (/api/tags, /api/chat + /api/generate NDJSON streaming with realistic prompt_eval_count and eval_count on the final frame, /api/embed, /api/show, /api/version). Non-root multi-stage Dockerfile, never published (internal network only). - docker-compose.demo.yml — postgres + redis + mock-ollama + gateway, with PLAYGROUND_ENABLED=true and ./playground mounted read-only at /app/playground. Mirrors the prod posture (mock-ollama not exposed). - demo.sh — brings the stack up, waits on /healthz, creates a demo tenant with allow_all_models and a fresh API key via the bootstrap CLI inside the container, then prints the key, the playground URL, and five ready-to-paste curl commands (SSE chat, NDJSON chat, /v1/models, a 401, a 403 /api/pull). ./demo.sh --down tears everything back down with volumes. - playground/index.html — single-file dark-themed UI served same-origin by the gateway at /playground (CORS-free). Per-endpoint About card with method/ auth/streaming badges, a real description, sample request body, sample response, and a footer note. Live SSE/NDJSON rendering of the response. A live, copyable curl box that mirrors exactly what Run sends. Run + Refresh are visibly gated until an API key is in the field; the Base URL is force-pinned to location.origin three times to defeat browser autofill. - docs/ — API.md (full endpoint reference with curl, streaming formats, error model, SPEC §6.5 response headers), ARCHITECTURE.md (incl. §4.6 discovery + the request lifecycle), DEPLOYMENT.md (Ollama-never-exposed rule, pointing at a real Ollama backend, env reference), THREAT_MODEL.md (SPEC §3 table + the allow_all_models opt-in notes), OPERATIONS.md (key/budget/model/usage runbook + fail-closed table), PLAYGROUND.md. mkdocs.yml (Material theme) wires them together.
62 lines
1.9 KiB
Docker
62 lines
1.9 KiB
Docker
# syntax=docker/dockerfile:1.7
|
|
#
|
|
# mock-ollama — a tiny FastAPI app emulating the Ollama HTTP API for the demo.
|
|
#
|
|
# builder stage : installs deps into a self-contained virtualenv.
|
|
# runtime stage : copies the venv + app, drops to a NON-ROOT user, no build
|
|
# tools, runs uvicorn on :11434.
|
|
#
|
|
# This image exists ONLY for the demo stack (docker-compose.demo.yml). It lets
|
|
# the demo run with no GPU and no model downloads. It is never published to the
|
|
# host — like real Ollama, it is reachable only on the internal Docker network.
|
|
|
|
# ----------------------------------------------------------------------------
|
|
# Stage 1 — builder
|
|
# ----------------------------------------------------------------------------
|
|
FROM python:3.12-slim AS builder
|
|
|
|
ENV PIP_DISABLE_PIP_VERSION_CHECK=1 \
|
|
PIP_NO_CACHE_DIR=1 \
|
|
VIRTUAL_ENV=/opt/venv \
|
|
PATH=/opt/venv/bin:$PATH
|
|
|
|
RUN python -m venv /opt/venv
|
|
|
|
WORKDIR /app
|
|
COPY requirements.txt ./
|
|
RUN pip install -r requirements.txt
|
|
|
|
# ----------------------------------------------------------------------------
|
|
# Stage 2 — runtime
|
|
# ----------------------------------------------------------------------------
|
|
FROM python:3.12-slim AS runtime
|
|
|
|
# curl is used by the compose healthcheck.
|
|
RUN apt-get update \
|
|
&& apt-get install -y --no-install-recommends curl \
|
|
&& rm -rf /var/lib/apt/lists/*
|
|
|
|
# Non-root user.
|
|
RUN groupadd --system --gid 10001 mock \
|
|
&& useradd --system --uid 10001 --gid mock --home-dir /app --shell /usr/sbin/nologin mock
|
|
|
|
ENV VIRTUAL_ENV=/opt/venv \
|
|
PATH=/opt/venv/bin:$PATH \
|
|
PYTHONUNBUFFERED=1 \
|
|
PYTHONDONTWRITEBYTECODE=1 \
|
|
MOCK_OLLAMA_PORT=11434
|
|
|
|
WORKDIR /app
|
|
|
|
COPY --from=builder /opt/venv /opt/venv
|
|
COPY app.py ./
|
|
|
|
USER mock
|
|
|
|
EXPOSE 11434
|
|
|
|
HEALTHCHECK --interval=10s --timeout=3s --start-period=5s --retries=5 \
|
|
CMD curl -fsS "http://127.0.0.1:${MOCK_OLLAMA_PORT}/api/version" || exit 1
|
|
|
|
CMD ["python", "-m", "app"]
|