m17hr1l/psyc

Author	SHA1	Message	Date
m17hr1l	77e4cb6ab9	deploy-all: redirect deploy.sh stdin from /dev/null so loop doesn't drop hosts	2026-06-07 02:06:45 +02:00
m17hr1l	2c7f71eff8	deploy: scripts/deploy-all.sh + hosts.example for multi-node federation rollouts	2026-06-07 01:03:31 +02:00
m17hr1l	9c3447723a	stage-28 fix: deploy.sh - auto-trust Gitea host (TOFU), never touch identity keys. Reinstating the automatic known_hosts entry on first deploy. Clear scope: host trust (a TOFU known_hosts entry) is automated, the same as 'ssh -o StrictHostKeyChecking=accept-new' would do; identity keypairs (~/.ssh/id_*) are never generated, copied, or modified by deploy.sh. PSYC_SKIP_HOST_TRUST=1 disables the auto-trust step if you'd rather verify fingerprints manually. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 16:36:18 +02:00
m17hr1l	9edd56e28b	stage-28 fix: deploy.sh - read-only SSH preflight, no key/known_hosts edits. The user asked that the script not touch their SSH config. Reverted the automatic ssh-keyscan; the script now only READS ~/.ssh/known_hosts (via ssh-keygen -F) and, when the entry is missing, exits with explicit manual instructions for verifying the host key and registering an identity key in Gitea. Identical behavior on the happy path; clearer diagnostics on the unhappy path; zero modification of ~/.ssh anywhere. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:39:06 +02:00
m17hr1l	2c2ead6149	stage-28 fix: deploy.sh pre-trusts the Gitea SSH host key (first-clone). A fresh prod box has never SSH'd to gitea.neuronetz.ai before, so the first 'git clone' failed with 'Host key verification failed'. The script now parses the git remote URL to extract host and port and, on the prod box, runs ssh-keyscan into ~/.ssh/known_hosts before the clone when the entry is missing. This is TOFU: if you want to verify the fingerprint out-of-band, pre-populate known_hosts manually and the script will see the entry and skip the scan. Also: if the clone still fails after the host key is trusted (likely a missing SSH key on the Gitea side), the script now prints a clear hint pointing at where to register it. Supports both ssh://user@host:port/ and user@host: URL forms. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:32:44 +02:00
m17hr1l	61b7b8ef20	stage-28: deploy.sh - idempotent remote deploy + health probe. scripts/deploy.sh pushes the current branch to origin, ssh's into the prod box (neuronetz@cloud.neuronetz.ai:/home/neuronetz/docker-public/neuro-psyc by default, overridable via env vars), clones or pulls, ensures the external 'backend' docker network exists, runs docker compose up -d --build (plus --profile gpu if PSYC_PROD_GPU=1), and then verifies the cockpit is healthy both on prod-internal :8767 and at the public URL, so the script ends knowing whether the page is up. Refuses to touch prod's .env (warns and copies .env.example if missing, so you can edit it manually). Never transfers data/ or adapters (gitignored; prod fetches its own corpus). Color output, idempotent, safe to re-run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 14:51:47 +02:00
m17hr1l	2a9c0bf34a	stage-6: model inference server. scripts/serve_model.py is a FastAPI server in the CUDA container that loads base Qwen3.5-4B + a psyc adapter once and serves POST /infer. This lets the cockpit (no torch in its venv) put a real fine-tuned model behind a Worker Mesh bot over HTTP. Dockerfile.train gains a fastapi + uvicorn layer. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 21:05:16 +02:00
m17hr1l	c6655853ac	stage-3d: cockpit /train page - datasets + adapters + training metadata. The new /train route lists built JSONL datasets (examples, size) and trained adapters with their base model, hyperparameters, dataset provenance, and loss history. train_qlora.py now records train_loss + per-step loss_history into training_meta.json so future runs surface a loss curve in the cockpit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 15:16:46 +02:00
m17hr1l	b95e3e02bd	stage-3c: working QLoRA training + eval - pytorch base, Qwen3.5 slug, SFTConfig. Training and eval now run cleanly on the unsloth 2026.5.2 / transformers v5 / torch 2.10 stack. Fixes: pytorch/pytorch base image (sidesteps the nvidia/cuda apt-signature failure and the torch download), correct base-model slug unsloth/Qwen3.5-4B, TRL SFTConfig API. Adds scripts/eval_adapter.py, which runs dataset rows through base+adapter with structured (transformers-v5) message content and Qwen3.5 thinking-mode stripping. First v1 adapter: loss 2.10 -> 0.32 over 3 epochs. Eval surfaced an ill-posed ioc_extraction dataset (output URL not present in input), to be fixed in the ExampleBuilder before the next training run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 14:16:22 +02:00
m17hr1l	f1ab11f89d	stage-3c: unsloth QLoRA training scaffold for Qwen3.5. Dockerfile.train builds a CUDA 12.4 + unsloth container that consumes the Trainline JSONL datasets and emits a LoRA adapter at data/adapters/<run>/final. Defaults target a 24 GB GPU (Qwen3.5-4B-Instruct-bnb-4bit, r=16, bf16, 3 epochs, effective batch 8). README documents the build + run workflow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 14:17:14 +02:00
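
The stage-28 deploy.sh commits describe two steps that are easy to get subtly wrong. The first is pulling host and port out of both supported remote URL forms; the second is checking known_hosts without modifying it. The Python sketch below illustrates that logic under stated assumptions. deploy.sh itself is a shell script, so this is not its code, and the function names and example hosts are hypothetical. The sketch relies on two OpenSSH conventions: known_hosts records a non-default port as `[host]:port`, and `ssh-keygen -F` exits with status 0 only when it finds a matching entry.

```python
"""Hypothetical sketch of deploy.sh's read-only SSH preflight, in Python."""
import re
import subprocess
from urllib.parse import urlsplit

DEFAULT_SSH_PORT = 22

# scp-like remote: [user@]host:path. Git only treats a URL this way when no
# slash appears before the first colon, which the [^:/]+ host part enforces.
_SCP_LIKE = re.compile(r"^(?:[^@/:]+@)?(?P<host>[^:/]+):(?P<path>.+)$")


def remote_host_port(url: str) -> tuple[str, int]:
    """Return (host, port) for ssh://user@host[:port]/path or user@host:path."""
    if "://" in url:
        parts = urlsplit(url)
        # Only ssh:// remotes need a known_hosts entry; reject https:// etc.
        if parts.scheme != "ssh" or not parts.hostname:
            raise ValueError(f"not an ssh:// remote: {url!r}")
        return parts.hostname, parts.port or DEFAULT_SSH_PORT
    match = _SCP_LIKE.match(url)
    if match is None:
        raise ValueError(f"unsupported remote URL: {url!r}")
    # The scp-like form cannot carry a port, so it is always the default.
    return match.group("host"), DEFAULT_SSH_PORT


def known_hosts_key(host: str, port: int) -> str:
    """known_hosts stores non-default ports as [host]:port."""
    return host if port == DEFAULT_SSH_PORT else f"[{host}]:{port}"


def host_is_trusted(host: str, port: int) -> bool:
    """Read-only lookup in ~/.ssh/known_hosts via `ssh-keygen -F`.

    Requires OpenSSH's ssh-keygen on PATH; exit status 0 means an entry exists.
    Nothing under ~/.ssh is written.
    """
    result = subprocess.run(
        ["ssh-keygen", "-F", known_hosts_key(host, port)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

Only `host_is_trusted` touches the system, and it only reads. The TOFU step that appends an entry (the `ssh-keyscan` path later gated by PSYC_SKIP_HOST_TRUST) is deliberately left out of this sketch.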

10 Commits
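
The stage-3d commit says train_qlora.py records `train_loss` and a per-step `loss_history` into training_meta.json. The sketch below shows one way to derive both fields from a Hugging Face `Trainer`'s `state.log_history`. It assumes train_qlora.py trains through a TRL/transformers trainer; the SFTConfig mention in stage-3c suggests this, but the script itself is not shown. The two field names come from the commit. The `{"step", "loss"}` entry shape and the function names are hypothetical.

```python
"""Hypothetical sketch: derive train_loss and loss_history from a Trainer log."""
import json
from pathlib import Path
from typing import Any


def loss_summary(log_history: list[dict[str, Any]]) -> dict[str, Any]:
    """Split a transformers Trainer log_history into the two loss fields.

    Per-step logging entries carry "loss" and "step"; evaluation entries use
    "eval_loss" instead (ignored here); the summary entry appended when
    train() finishes carries "train_loss".
    """
    loss_history = [
        {"step": entry["step"], "loss": entry["loss"]}
        for entry in log_history
        if "loss" in entry and "step" in entry
    ]
    # Take the most recent summary entry, or None if training never finished.
    train_loss = next(
        (entry["train_loss"] for entry in reversed(log_history) if "train_loss" in entry),
        None,
    )
    return {"train_loss": train_loss, "loss_history": loss_history}


def write_training_meta(path: Path, base_meta: dict[str, Any],
                        log_history: list[dict[str, Any]]) -> None:
    """Merge the loss fields into existing metadata and write it as JSON."""
    meta = {**base_meta, **loss_summary(log_history)}
    path.write_text(json.dumps(meta, indent=2))
```

After `trainer.train()`, a call such as `write_training_meta(run_dir / "training_meta.json", meta, trainer.state.log_history)` would persist the result (`run_dir` and `meta` are placeholders). Keeping per-step points rather than only the final value is what lets a cockpit page plot a curve like the 2.10 -> 0.32 drop reported for the v1 adapter.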