Reinstating the auto known_hosts entry on first deploy. Clear scope:
host trust (TOFU known_hosts entry) is automated — same as
'ssh -o StrictHostKeyChecking=accept-new' would do; identity keypairs
(~/.ssh/id_*) are never generated/copied/modified by deploy.sh.
PSYC_SKIP_HOST_TRUST=1 disables the auto-trust step if you'd rather
verify fingerprints manually.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
User asked the script not to touch their SSH config. Reverted the
auto-ssh-keyscan; the script now only READS ~/.ssh/known_hosts (via
ssh-keygen -F) and, when the entry is missing, exits with explicit
manual instructions for verifying the host key and registering an
identity key in Gitea. Identical behavior on the happy path; clearer
diagnostics on the unhappy path; zero modification of ~/.ssh anywhere.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A fresh prod box has never SSH'd to gitea.neuronetz.ai before, so the
first 'git clone' failed with 'Host key verification failed'. The
script now parses the git remote URL to extract host+port, and on the
prod box does an ssh-keyscan into ~/.ssh/known_hosts before the clone
when the entry is missing. TOFU — if you want to verify the fingerprint
out-of-band, pre-populate known_hosts manually and the script will see
the entry and skip the scan.
Also: if the clone still fails after the host key is trusted (likely a
missing SSH key on Gitea side), the script now prints a clear hint
pointing at where to register it. Supports both ssh://user@host:port/
and user@host: URL forms.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scripts/deploy.sh pushes the current branch to origin, ssh's into the
prod box (neuronetz@cloud.neuronetz.ai:/home/neuronetz/docker-public/
neuro-psyc by default — overridable via env vars), clones-or-pulls,
ensures the external 'backend' docker network exists, runs docker
compose up -d --build (+ --profile gpu if PSYC_PROD_GPU=1), and then
verifies the cockpit is healthy both on prod-internal :8767 and at the
public URL — so the script ends knowing whether the page is up.
Refuses to touch prod's .env (warns + copies .env.example if missing,
so you can edit it manually). Never transfers data/ or adapters
(gitignored; prod fetches its own corpus). Color output, idempotent,
safe to re-run.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scripts/serve_model.py — FastAPI in the CUDA container, loads base Qwen3.5-4B
+ a psyc adapter once and serves POST /infer. Lets the cockpit (no torch in
its venv) put a real fine-tuned model behind a Worker Mesh bot over HTTP.
Dockerfile.train gains a fastapi + uvicorn layer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New /train route lists built JSONL datasets (examples, size) and trained
adapters with their base model, hyperparameters, dataset provenance, and
loss history. train_qlora.py now records train_loss + per-step loss_history
into training_meta.json so future runs surface a loss curve in the cockpit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Training and eval now run clean on the unsloth 2026.5.2 / transformers v5 /
torch 2.10 stack. Fixes: pytorch/pytorch base image (sidesteps the nvidia/cuda
apt-signature failure and the torch download), correct base-model slug
unsloth/Qwen3.5-4B, TRL SFTConfig API. Adds scripts/eval_adapter.py — runs
dataset rows through base+adapter with structured (transformers-v5) message
content and Qwen3.5 thinking-mode stripping.
First v1 adapter: loss 2.10 -> 0.32 over 3 epochs. Eval surfaced an ill-posed
ioc_extraction dataset (output URL not present in input) — to be fixed in the
ExampleBuilder before the next training run.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Dockerfile.train builds a CUDA 12.4 + unsloth container that consumes the
Trainline JSONL datasets and emits a LoRA adapter at data/adapters/<run>/final.
Defaults target a 24 GB GPU (Qwen3.5-4B-Instruct-bnb-4bit, r=16, bf16, 3 epochs,
effective batch 8). README documents the build + run workflow.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>