stage-6: model inference server

scripts/serve_model.py: a FastAPI server in the CUDA container that loads base Qwen3.5-4B plus a psyc adapter once and serves POST /infer. This lets the cockpit (which has no torch in its venv) put a real fine-tuned model behind a Worker Mesh bot over HTTP.

Dockerfile.train gains a fastapi + uvicorn layer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 21:05:16 +02:00
parent 72d80dfd60
commit 2a9c0bf34a
2 changed files with 91 additions and 0 deletions
--- a/Dockerfile.train
+++ b/Dockerfile.train
@@ -27,6 +27,9 @@ ENV PYTHONUNBUFFERED=1 \
 RUN pip install --upgrade pip && \
    pip install unsloth unsloth_zoo trl datasets

+# fastapi + uvicorn power scripts/serve_model.py (the inference server).
+RUN pip install fastapi uvicorn
+
 WORKDIR /workspace

 # Scripts are mounted at run time (-v $(pwd)/scripts:/scripts), never baked in.