Commit Graph

5 Commits

Author SHA1 Message Date
m17hr1l
61b7b8ef20 stage-28: deploy.sh — idempotent remote deploy + health probe
scripts/deploy.sh pushes the current branch to origin, ssh's into the
prod box (neuronetz@cloud.neuronetz.ai:/home/neuronetz/docker-public/
neuro-psyc by default — overridable via env vars), clones-or-pulls,
ensures the external 'backend' docker network exists, runs docker
compose up -d --build (+ --profile gpu if PSYC_PROD_GPU=1), and then
verifies the cockpit is healthy both on prod-internal :8767 and at the
public URL — so the script ends knowing whether the page is up.

Refuses to touch prod's .env (warns + copies .env.example if missing,
so you can edit it manually). Never transfers data/ or adapters
(gitignored; prod fetches its own corpus). Color output, idempotent,
safe to re-run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:51:47 +02:00
m17hr1l
2a9c0bf34a stage-6: model inference server
scripts/serve_model.py — FastAPI in the CUDA container, loads base Qwen3.5-4B
+ a psyc adapter once and serves POST /infer. Lets the cockpit (no torch in
its venv) put a real fine-tuned model behind a Worker Mesh bot over HTTP.
Dockerfile.train gains a fastapi + uvicorn layer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 21:05:16 +02:00
m17hr1l
c6655853ac stage-3d: cockpit /train page — datasets + adapters + training metadata
New /train route lists built JSONL datasets (examples, size) and trained
adapters with their base model, hyperparameters, dataset provenance, and
loss history. train_qlora.py now records train_loss + per-step loss_history
into training_meta.json so future runs surface a loss curve in the cockpit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 15:16:46 +02:00
m17hr1l
b95e3e02bd stage-3c: working QLoRA training + eval — pytorch base, Qwen3.5 slug, SFTConfig
Training and eval now run clean on the unsloth 2026.5.2 / transformers v5 /
torch 2.10 stack. Fixes: pytorch/pytorch base image (sidesteps the nvidia/cuda
apt-signature failure and the torch download), correct base-model slug
unsloth/Qwen3.5-4B, TRL SFTConfig API. Adds scripts/eval_adapter.py — runs
dataset rows through base+adapter with structured (transformers-v5) message
content and Qwen3.5 thinking-mode stripping.

First v1 adapter: loss 2.10 -> 0.32 over 3 epochs. Eval surfaced an ill-posed
ioc_extraction dataset (output URL not present in input) — to be fixed in the
ExampleBuilder before the next training run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 14:16:22 +02:00
m17hr1l
f1ab11f89d stage-3c: unsloth QLoRA training scaffold for Qwen3.5
Dockerfile.train builds a CUDA 12.4 + unsloth container that consumes the
Trainline JSONL datasets and emits a LoRA adapter at data/adapters/<run>/final.
Defaults target a 24 GB GPU (Qwen3.5-4B-Instruct-bnb-4bit, r=16, bf16, 3 epochs,
effective batch 8). README documents the build + run workflow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 14:17:14 +02:00