scripts/serve_model.py: a FastAPI server that runs in the CUDA container, loads base Qwen3.5-4B plus a psyc adapter once, and serves POST /infer. This lets the cockpit (no torch in its venv) put a real fine-tuned model behind a Worker Mesh bot over HTTP. Dockerfile.train gains a fastapi + uvicorn layer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
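Because the cockpit has no torch in its venv, it only needs an HTTP client to reach the server. The sketch below shows one way such a caller could look using only the standard library. The base URL, the port, the `prompt` request field, and the helper names `build_infer_request` and `infer` are all assumptions for illustration; the actual request and response schema of scripts/serve_model.py is not shown in this commit.

```python
import json
import urllib.request

# Hypothetical default: host and port are assumptions, not taken from
# scripts/serve_model.py.
DEFAULT_BASE_URL = "http://localhost:8000"


def build_infer_request(prompt: str, base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build a POST /infer request with a JSON body.

    The body schema ({"prompt": ...}) is an assumed shape for illustration.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/infer",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def infer(prompt: str, base_url: str = DEFAULT_BASE_URL, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON reply; no torch or third-party deps."""
    request = build_infer_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes the client easy to check without a running GPU container.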
37 lines
1.4 KiB
Docker
# psyc training container: unsloth + Qwen3.5 QLoRA fine-tuning.
#
# Build:
#   docker build -t psyc-trainer -f Dockerfile.train .
#
# Run (24 GB GPU; mounts host data/ + scripts/ so script edits need no rebuild):
#   docker run --gpus all --rm --entrypoint python \
#     -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
#     psyc-trainer /scripts/train_qlora.py \
#       --dataset /data/datasets/ioc_extraction-v2.jsonl \
#       --dataset /data/datasets/severity_classification-v2.jsonl \
#       --dataset /data/datasets/routing_decision-v2.jsonl \
#       --dataset /data/datasets/tlp_assignment-v2.jsonl \
#       --output /data/adapters/psyc-v2
#
# Base image already ships Python 3.11 + torch 2.6 + CUDA 12.4 + cuDNN9, so
# there is no apt step and no torch download. Qwen3.5 needs transformers v5;
# unsloth pulls it automatically. The training/eval scripts are MOUNTED at run
# time (not baked in) so editing scripts/*.py never needs an image rebuild.

FROM pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel

ENV PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    HF_HOME=/data/.hf-cache

RUN pip install --upgrade pip && \
    pip install unsloth unsloth_zoo trl datasets

# fastapi + uvicorn power scripts/serve_model.py (the inference server).
RUN pip install fastapi uvicorn

WORKDIR /workspace

# Scripts are mounted at run time (-v $(pwd)/scripts:/scripts), never baked in.
ENTRYPOINT ["python"]