m17hr1l/psyc

Author	SHA1	Message	Date
m17hr1l	2a9c0bf34a	stage-6: model inference server scripts/serve_model.py — FastAPI in the CUDA container, loads base Qwen3.5-4B + a psyc adapter once and serves POST /infer. Lets the cockpit (no torch in its venv) put a real fine-tuned model behind a Worker Mesh bot over HTTP. Dockerfile.train gains a fastapi + uvicorn layer. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 21:05:16 +02:00
m17hr1l	72d80dfd60	stage-6: psyc-v4 — well-posed severity input (source-agnostic status) _ex_severity_classification copied only URLhaus's `url_status` into the task input, so Feodo botnet cases lost the online/offline signal their label depends on — v3 severity eval stuck at 7/8 with one unlearnable example. The input now carries a normalized `status` (url_status or status), matching the field classify.py already uses for the label. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 21:02:32 +02:00
m17hr1l	838d90ffcb	stage-5: Worker Mesh — animated character-bot agents Replaces the passive journey timeline with an active worker mesh: seven robot agents (Scout, Classifier, Mapper, Sealer, Router, Courier, Ledger), each with a geometric SVG body, glowing antenna + reactor core in its own accent colour, expressive awake/asleep faces, and an idle float. A case token travels the conduit; as it reaches each bot the bot wakes (activation ring + work-flash), performs its action, and speaks its real answer in a speech bubble. Asleep bots are steps that did not occur for this case. Replay button re-runs it. Every answer is real persisted data — the bots animate, they do not fake. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 20:41:45 +02:00
m17hr1l	9bd5a30495	stage-5: Case Journey — animated 7-beat pipeline replay New /cases/{id}/journey view tells a case's story as it moved through psyc: Detected → Classified → Located → Sealed → Routed → Submitted → Recorded. Each beat is reconstructed from real persisted state (classification, sealed package, planned routes, ledger rows) — a replay of recorded events, not a script; beats that did not happen render as "pending". CSS-staggered reveal with pulsing timeline nodes, on-brand cyan/navy, replay button. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 19:53:25 +02:00
m17hr1l	afba077f6f	stage-4: ioc_extraction includes CVE-only cases The ExampleBuilder guard checked urls/domains/ips/hashes but not cves, so CISA KEV cases (CVE is their only observable) were silently dropped from the ioc_extraction dataset. Now they produce CVE-extraction examples. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 23:45:55 +02:00
m17hr1l	2138611fdb	stage-4: multi-source Scoutline — CISA KEV + Feodo Tracker Scoutline is now a source registry: urlhaus, cisa-kev, feodo. CISA KEV brings exploit/CVE cases, Feodo Tracker brings botnet C2 cases — real incident-type variety beyond URLhaus's malware monotone. Classifyline is source-aware (feed tag → incident type; ransomware-flagged KEV → critical). CLI gains fetch-cisa-kev, fetch-feodo, fetch-all. Both new feeds are keyless public download feeds (verified). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 23:42:13 +02:00
m17hr1l	b4c66c2e87	stage-3e: well-posed ioc_extraction dataset + clearer /train page ioc_extraction ExampleBuilder now embeds every IOC into the advisory text so the extraction task is answerable from the input (v1 asked the model to "extract" a URL that was never given). /train page distinguishes trained / training… / not-started, and renders a per-step loss bar chart. Dockerfile no longer bakes the training script — scripts/ is mounted at run time so edits take effect without a 21 GB rebuild (this is why psyc-v2's loss capture was silently skipped on its first run). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 18:09:37 +02:00
m17hr1l	c6655853ac	stage-3d: cockpit /train page — datasets + adapters + training metadata New /train route lists built JSONL datasets (examples, size) and trained adapters with their base model, hyperparameters, dataset provenance, and loss history. train_qlora.py now records train_loss + per-step loss_history into training_meta.json so future runs surface a loss curve in the cockpit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 15:16:46 +02:00
m17hr1l	b95e3e02bd	stage-3c: working QLoRA training + eval — pytorch base, Qwen3.5 slug, SFTConfig Training and eval now run clean on the unsloth 2026.5.2 / transformers v5 / torch 2.10 stack. Fixes: pytorch/pytorch base image (sidesteps the nvidia/cuda apt-signature failure and the torch download), correct base-model slug unsloth/Qwen3.5-4B, TRL SFTConfig API. Adds scripts/eval_adapter.py — runs dataset rows through base+adapter with structured (transformers-v5) message content and Qwen3.5 thinking-mode stripping. First v1 adapter: loss 2.10 -> 0.32 over 3 epochs. Eval surfaced an ill-posed ioc_extraction dataset (output URL not present in input) — to be fixed in the ExampleBuilder before the next training run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 14:16:22 +02:00
m17hr1l	f1ab11f89d	stage-3c: unsloth QLoRA training scaffold for Qwen3.5 Dockerfile.train builds a CUDA 12.4 + unsloth container that consumes the Trainline JSONL datasets and emits a LoRA adapter at data/adapters/<run>/final. Defaults target a 24 GB GPU (Qwen3.5-4B-Instruct-bnb-4bit, r=16, bf16, 3 epochs, effective batch 8). README documents the build + run workflow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 14:17:14 +02:00
m17hr1l	b8ea4ead02	stage-3b: Trainline — JSONL dataset pipeline for QLoRA training ExampleBuilder emits Alpaca-style training rows for four defensive tasks (ioc_extraction, severity_classification, routing_decision, tlp_assignment). QualityGate enforces the dossier's training-data policy: drops TLP:RED, restricted-source, empty, oversize, and credential-leak examples. DatasetWriter versions outputs as data/datasets/<task>-v<n>.jsonl. CLI: train-build, train-build-all, train-list-datasets. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 14:15:58 +02:00
m17hr1l	da4792c179	stage-3a: Mapline GeoResolver — host IP → country via ip-api.com Cases now carry a resolved hosting country, which feeds the country-scoped destination policy. CN-hosted URLhaus malware correctly stays gated off CERT-Bund (only DE) while still firing MISP-Community + URLhaus. psyc demo runs the map step between classify and seal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 14:13:31 +02:00
m17hr1l	3f18e5aa8e	stage-2: full pipeline — Classifyline → Sealine → Routeline → Courier → Ledger + mock CERT Adds the end-to-end demo chain. PyNaCl sealed boxes implement the dossier's Model A authority public-key encryption; SQLAlchemy ledger records every submission and every policy-blocked route. Cockpit gains /ledger and an enriched case detail (sealed-package card, routes panel, per-case audit). Mock CERT FastAPI app on :8770 stands in for the real authority endpoints. `psyc demo` runs the whole chain on a fresh URLhaus row. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 13:44:43 +02:00
m17hr1l	e04c6c96d8	init: scaffold psyc — defensive CTI routing & evidence-sealing platform Stage-1 vertical slice: Pydantic Case model, SQLAlchemy Core persistence, URLhaus Scoutline fetcher, FastAPI/Jinja cockpit (cases list + detail), flat Typer CLI, Result[T, E] type module, structlog config. Architecture in docs/dossier.md; 12-fold style guide in docs/style.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 12:43:47 +02:00
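
The two ExampleBuilder fixes in 72d80dfd60 (source-agnostic `status`) and afba077f6f (CVE-only cases no longer dropped) can be sketched roughly as below. This is a minimal illustration, not the repository's code: the function names `normalized_status` and `has_observables`, the dict-shaped rows, and the lower-casing of the status are assumptions. Only the field names (`url_status`, `status`, `urls`, `domains`, `ips`, `hashes`, `cves`) come from the commit messages.

```python
from typing import Any, Mapping, Optional

# Observable fields named in afba077f6f; treating a case as a plain mapping
# is an assumption made for this sketch.
OBSERVABLE_FIELDS = ("urls", "domains", "ips", "hashes", "cves")


def normalized_status(row: Mapping[str, Any]) -> Optional[str]:
    """Return a source-agnostic status: prefer url_status, fall back to status.

    Hypothetical helper. Lower-casing the value is an assumption.
    """
    value = row.get("url_status") or row.get("status")
    return str(value).lower() if value else None


def has_observables(case: Mapping[str, Any]) -> bool:
    """True if the case carries at least one non-empty observable list.

    Including "cves" is what keeps CVE-only (CISA KEV) cases in the dataset.
    """
    return any(case.get(field) for field in OBSERVABLE_FIELDS)
```

With a guard of this shape, a URLhaus row and a Feodo row both yield a status for the severity task input, and a case whose only observable is a CVE passes the ioc_extraction check.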

14 Commits
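
The QualityGate policy described in b8ea4ead02 (drop TLP:RED, restricted-source, empty, oversize, and credential-leak examples) could look something like the following. This is a sketch under stated assumptions: the function name `rejection_reason`, the example keys (`tlp`, `source_restricted`, `input`, `output`), the 4096-character limit, and the credential regex are all hypothetical. The commit log names the drop categories but not how they are detected.

```python
import re
from typing import Any, Mapping, Optional

# Assumed size limit; the real threshold is not stated in the commit log.
MAX_CHARS = 4096

# Assumed heuristic for credential leaks: secret-like names followed by a value.
CREDENTIAL_RE = re.compile(
    r"(?i)\b(password|passwd|api[_-]?key|secret|token)\s*[:=]\s*\S+"
)


def rejection_reason(example: Mapping[str, Any]) -> Optional[str]:
    """Return why an example should be dropped, or None if it may be kept."""
    task_input = str(example.get("input", ""))
    task_output = str(example.get("output", ""))
    text = f"{task_input}\n{task_output}"

    if str(example.get("tlp", "")).upper() == "TLP:RED":
        return "tlp_red"
    if example.get("source_restricted"):
        return "restricted_source"
    if not task_input.strip() or not task_output.strip():
        return "empty"
    if len(text) > MAX_CHARS:
        return "oversize"
    if CREDENTIAL_RE.search(text):
        return "credential_leak"
    return None


def filter_examples(examples):
    """Keep only the examples that pass every check."""
    return [ex for ex in examples if rejection_reason(ex) is None]
```

Returning a reason string rather than a bare boolean is a design choice of this sketch: it would let a dataset build report how many rows each rule removed.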