stage-7: demo polish — mesh-aware demo command, current README, run-sheet

psyc demo now closes with cockpit links pointing at the Worker Mesh and reports whether the live model server is up. README rewritten to current state: Worker Mesh, inference server, model-in-operation, the three services, accurate code layout. Adds docs/demo.md, a one-page run-sheet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 21:48:57 +02:00
parent 67f26f271e
commit f1449af45b
3 changed files with 199 additions and 114 deletions
--- a/README.md
+++ b/README.md
@@ -7,8 +7,11 @@
 > Validate the signal, protect the evidence, route only what each destination is
 > authorized to receive, and prove every external action through an immutable ledger.

-Defensive cyber-threat-intelligence routing & evidence-sealing platform.
-Built as a 48h hackathon project on 2026-05-13. Active development.
+Defensive cyber-threat-intelligence routing & evidence-sealing platform — a
+small-worker mesh that ingests public threat feeds, classifies and seals cases,
+routes them to the right destinations under TLP policy, and proves every action
+through an append-only ledger. Started as a 48h hackathon (2026-05); grown into
+a working platform with a fine-tuned model in operation.

 ---

@@ -16,25 +19,25 @@ Built as a 48h hackathon project on 2026-05-13. Active development.

 ```text
 Sensors
-→ Scoutline      fetch, parse, dedup, signal
-→ Proofline      validate indicators, score confidence
-→ Mapline        resolve victim, actor, jurisdiction, CERT route
-→ Classifyline   severity, TLP, incident type, internal class
-→ Sealine        authority-sealed evidence encryption
-→ Routeline      pick destinations, build payloads, submit
-→ Ledgerline     immutable audit, receipts, outcomes
-→ Publishline    sanitized public intelligence after mitigation
-→ Trainline      lawful intel → LoRA-ready training data
-→ Cockpit        operator UI (FastAPI + Jinja)
+→ Scoutline      fetch + parse public feeds, emit normalized cases   [built]
+→ Proofline      validate indicators, score confidence               [planned]
+→ Mapline        resolve hosting country / jurisdiction              [built]
+→ Classifyline   severity, TLP, incident type, internal class        [built]
+→ Sealine        authority-sealed evidence encryption                [built]
+→ Routeline      pick destinations under policy, build payloads      [built]
+→ Courier        submit to destinations, collect receipts            [built]
+→ Ledgerline     immutable audit of every submission + blocked route [built]
+→ Publishline    sanitized public intelligence after mitigation      [planned]
+→ Trainline      lawful intel → LoRA datasets + QLoRA training       [built]
+→ Cockpit        operator UI (FastAPI + Jinja)                       [built]
 ```

-Each `-line` is a stage in a small-worker mesh; each worker performs one
-narrow job and passes a normalized `Case` object to the next stage. Heavy
-models are reserved for judgment-heavy tasks. Humans approve everything
-sensitive before it leaves the platform.
+Each `-line` is a stage in a small-worker mesh; each worker does one narrow job
+and passes a normalized `Case` object onward. Rules drive the deterministic
+work; a fine-tuned model handles judgment (see Training). Humans approve
+anything sensitive before it leaves the platform.

-Full architecture: [`docs/dossier.md`](docs/dossier.md) — consolidated read of
-the original individual records (still in [`docs/archive/`](docs/archive/)).
+Full design: [`docs/dossier.md`](docs/dossier.md) · style: [`docs/style.md`](docs/style.md) · demo run-sheet: [`docs/demo.md`](docs/demo.md)

 ---

@@ -44,127 +47,136 @@ the original individual records (still in [`docs/archive/`](docs/archive/)).
 python3 -m virtualenv .venv
 .venv/bin/pip install -e .

-.venv/bin/psyc init                       # create the sqlite db
-.venv/bin/psyc fetch-all                  # ingest URLhaus + CISA KEV + Feodo Tracker
-.venv/bin/psyc serve --port 8767          # cockpit at http://127.0.0.1:8767
-.venv/bin/psyc status                     # count of ingested cases
+.venv/bin/psyc init               # create the sqlite db
+.venv/bin/psyc fetch-all          # ingest URLhaus + CISA KEV + Feodo Tracker
+.venv/bin/psyc demo               # run one case through the whole pipeline
 ```

+The platform runs as up to three services (each in its own terminal):
+
+```bash
+.venv/bin/psyc serve --port 8767      # operator cockpit  → http://127.0.0.1:8767
+.venv/bin/psyc mock-cert --port 8770  # stand-in CERT / abuse-API receiver
+
+# optional, needs an NVIDIA GPU — puts the live model behind the Classifier bot:
+docker run --gpus all --rm -p 8771:8771 --entrypoint python \
+    -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
+    psyc-trainer /scripts/serve_model.py --adapter /data/adapters/psyc-v4/final
+```
+
+---
+
+## Cockpit
+
+`http://127.0.0.1:8767` — five views:
+
+| View | Path | Shows |
+|---|---|---|
+| Case Queue | `/cases` | every ingested case, severity + TLP badges |
+| Case detail | `/cases/{id}` | classification, observables, sealed package, routes, per-case ledger |
+| Worker Mesh | `/cases/{id}/journey` | animated 7-bot replay of the case's path; the Classifier bot shows the live model's verdict |
+| Ledger | `/ledger` | immutable audit feed |
+| Trainline | `/train` | datasets + trained adapters with loss charts |
+
 ---

 ## Code layout

 ```
 src/psyc/
-  models.py          # normalized Case object (Pydantic)
-  db.py              # SQLAlchemy Core; cases + ledger tables
-  result.py          # Ok / Err / Result[T, E]
-  log.py             # structlog configuration
-  cli.py             # flat Typer commands
-  lines/             # one file per worker line
-    scout.py         # Fetcher + Signalizer (URLhaus today)
-  cockpit/           # FastAPI + Jinja operator UI
-    app.py
-    templates/
-    static/
+  models.py        normalized Case object + enums (Pydantic)
+  db.py            SQLAlchemy Core — cases + ledger tables
+  result.py        Ok / Err / Result[T, E]
+  log.py           structlog configuration
+  cli.py           flat Typer CLI
+  mock_cert.py     stand-in CERT / abuse-API receiver
+  lines/           one file per worker line
+    scout.py       multi-source fetch + signalize (URLhaus, CISA KEV, Feodo)
+    classify.py    severity / TLP / incident type / internal class
+    map.py         GeoResolver — host IP → country
+    seal.py        PyNaCl sealed-box evidence encryption
+    route.py       destination matrix + policy gates
+    courier.py     HTTP submission + payload building
+    ledger.py      append-only audit
+    train.py       JSONL dataset builders + quality gate
+  cockpit/         FastAPI + Jinja operator UI
+    app.py         routes
+    journey.py     Worker Mesh / case-journey assembly
+    inference.py   client for the live model server
+    templates/  static/
+
+scripts/
+  train_qlora.py   unsloth QLoRA fine-tune
+  eval_adapter.py  adapter evaluation
+  serve_model.py   inference server (FastAPI, runs in the CUDA container)

 docs/
-  dossier.md         # full architecture (consolidated)
-  style.md           # 12-fold Python style guide
-  archive/           # original architecture docs + logo variants
+  dossier.md  style.md  demo.md  archive/
 ```

 ---

+## Training & the live model (Trainline + QLoRA)
+
+`psyc train-build-all` emits Alpaca-style JSONL datasets under
+`data/datasets/<task>-v<n>.jsonl` for four defensive tasks — `ioc_extraction`,
+`severity_classification`, `routing_decision`, `tlp_assignment`. QualityGate
+drops TLP:RED, restricted-source, empty, and credential-leak rows.
+
+Fine-tune Qwen3.5-4B with QLoRA in the CUDA container:
+
+```bash
+docker build -t psyc-trainer -f Dockerfile.train .
+
+docker run --gpus all --rm --entrypoint python \
+    -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
+    psyc-trainer /scripts/train_qlora.py \
+    --dataset /data/datasets/ioc_extraction-v4.jsonl \
+    --dataset /data/datasets/severity_classification-v4.jsonl \
+    --dataset /data/datasets/routing_decision-v4.jsonl \
+    --dataset /data/datasets/tlp_assignment-v4.jsonl \
+    --output /data/adapters/psyc-v4
+```
+
+Defaults target a 24 GB GPU (3090/4090): `unsloth/Qwen3.5-4B` at 4-bit, LoRA
+r=16, bf16, 3 epochs. Output: `data/adapters/<run>/final/` + `training_meta.json`.
+Evaluate with `scripts/eval_adapter.py`; the `/train` cockpit page shows every
+dataset and adapter with its loss curve.
+
+`scripts/serve_model.py` loads an adapter and serves `/infer` over HTTP. When
+it's running, the cockpit's **Classifier bot** shows the live model's severity
+verdict beside the rule's — and degrades to rules-only if the server is down.
+
+---
+
 ## Style

-All code follows [`docs/style.md`](docs/style.md): `Optional[X]` / `List[X]`
-from `typing`, `Field(default_factory=...)` for Pydantic mutables, `Result[T, E]`
-types for expected failures (`raise` reserved for true exceptions), `class X(str, Enum)`
-for closed string sets, structlog with `area.action` event names, SQLAlchemy Core
-(no ORM), flat Typer commands with hyphenated names. Ruff config in `pyproject.toml`
-enforces the bits a linter can check; `UP006`/`UP007`/`UP035` are disabled so the
-typing-import rules stand.
+All code follows [`docs/style.md`](docs/style.md) — a 12-fold guide: `Optional[X]`
+/ `List[X]` from `typing`, `Field(default_factory=...)`, `Result[T, E]` for
+expected failures, `class X(str, Enum)`, structlog `area.action` events,
+SQLAlchemy Core (no ORM), flat hyphenated Typer commands.

 ---

 ## Scope

 **Lawful, white-hat defensive operations only.** psyc routes intelligence to
-victims, CERT/CSIRTs, sector ISACs, provider/registrar abuse desks, and
-trusted CTI communities. It will **not**:
-
- amplify stolen data
- expose victims prematurely
- interact with criminal actors
- distribute exploitation content
- submit evidence that exceeds a destination's max TLP
-
-The boundaries are defined in `docs/dossier.md` §5 *Destination Minimization*,
-§10 *TLP Enforcement*, and §16 *Public Reporting Rules*. The Ledger records
-every external submission and destructive action; sensitive evidence is
-encrypted to authorized recipients via Sealine before any routing decision.
+victims, CERT/CSIRTs, sector ISACs, provider/registrar abuse desks, and trusted
+CTI communities. It will **not** amplify stolen data, expose victims
+prematurely, interact with criminal actors, distribute exploitation content, or
+submit evidence beyond a destination's max TLP. Boundaries: `docs/dossier.md`
+§5, §10, §16.

 ---

-## Training (Trainline + QLoRA)
-
-`psyc train-build-all` emits Alpaca-style JSONL datasets under
-`data/datasets/<task>-v<n>.jsonl` for four defensive tasks: `ioc_extraction`,
-`severity_classification`, `routing_decision`, `tlp_assignment`. QualityGate
-drops `TLP:RED`, restricted sources, empty/oversize, and credential-leak rows
-per the dossier's training-data policy.
-
-To fine-tune Qwen3.5-4B with QLoRA in an NVIDIA Docker container:
-
-```bash
-# 1. build datasets (one-off; re-run after ingestion changes)
-.venv/bin/psyc train-build-all
-
-# 2. build the training image (pytorch 2.6/CUDA 12.4 base + unsloth + Qwen3.5)
-docker build -t psyc-trainer -f Dockerfile.train .
-
-# 3. fine-tune — scripts/ + data/ are mounted, so script edits need no rebuild
-docker run --gpus all --rm --entrypoint python \
-    -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
-    psyc-trainer /scripts/train_qlora.py \
-    --dataset /data/datasets/ioc_extraction-v2.jsonl \
-    --dataset /data/datasets/severity_classification-v2.jsonl \
-    --dataset /data/datasets/routing_decision-v2.jsonl \
-    --dataset /data/datasets/tlp_assignment-v2.jsonl \
-    --output /data/adapters/psyc-v2
-```
-
-Defaults target a 24 GB consumer GPU (3090/4090): `unsloth/Qwen3.5-4B` at 4-bit,
-LoRA `r=16`/`alpha=16`, bf16, 3 epochs, effective batch size 8. For A100-40/80
-bump `--base-model unsloth/Qwen3.5-9B` and raise `--batch-size` +
-`--max-seq-length`.
-
-Output: `data/adapters/psyc-v1/final/` (adapter weights) + `training_meta.json`
-(base model, hyperparameters, dataset list).
-
-Evaluate the adapter against held-out dataset rows:
-
-```bash
-docker run --gpus all --rm \
-    --entrypoint python \
-    -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
-    psyc-trainer /scripts/eval_adapter.py \
-    --adapter /data/adapters/psyc-v2/final \
-    --dataset /data/datasets/ioc_extraction-v2.jsonl --n 5
-```
-
-The cockpit `/train` page lists every built dataset and trained adapter with
-its base model, hyperparameters, dataset provenance, and a per-step loss chart.
-
 ## Status

-Day 2 of a 48h build. Shipped: Scoutline (URLhaus) → Classifyline → Mapline
-(GeoResolver via ip-api.com) → Sealine (PyNaCl sealed boxes) → Routeline →
-Courier → mock CERT → Ledgerline. Cockpit has cases / case detail / ledger
-pages and a design-token CSS layer. Trainline emits LoRA-ready JSONL;
-`Dockerfile.train` builds an unsloth + Qwen3.5 QLoRA training container.
+Working platform. Built: Scoutline (URLhaus + CISA KEV + Feodo Tracker) →
+Classifyline → Mapline → Sealine → Routeline → Courier → Ledgerline → Trainline,
+the FastAPI cockpit (five views incl. the animated Worker Mesh), and a
+fine-tuned Qwen3.5-4B (psyc-v4) served live behind the Classifier bot.
+Not yet built: Proofline (confidence scoring), Publishline (public advisories).

 ## License

-Unset for the hackathon. Choose before any external release.
+Unset. Choose before any external release.
--- a/docs/demo.md
+++ b/docs/demo.md
@@ -0,0 +1,65 @@
+# psyc — demo run-sheet
+
+A ~5-minute walk-through of the platform; ~10 min including setup.
+
+## 0. Setup (once)
+
+```bash
+python3 -m virtualenv .venv
+.venv/bin/pip install -e .
+.venv/bin/psyc init
+```
+
+## 1. Start the services
+
+Separate terminals — the third is optional and needs an NVIDIA GPU:
+
+```bash
+# terminal 1 — operator cockpit
+.venv/bin/psyc serve --port 8767
+
+# terminal 2 — stand-in CERT / abuse-API receiver
+.venv/bin/psyc mock-cert --port 8770
+
+# terminal 3 — live model behind the Classifier bot (optional)
+docker run --gpus all --rm -p 8771:8771 --entrypoint python \
+    -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
+    psyc-trainer /scripts/serve_model.py --adapter /data/adapters/psyc-v4/final
+```
+
+## 2. Run the pipeline
+
+```bash
+.venv/bin/psyc fetch-all     # ingest URLhaus + CISA KEV + Feodo Tracker
+.venv/bin/psyc demo          # one case end-to-end; prints the cockpit links
+```
+
+## 3. The walk-through
+
+1. **Case Queue** — http://127.0.0.1:8767/cases
+   30+ cases across three feeds, with severity + TLP badges. *"Three sources,
+   one normalized case object."*
+
+2. **Worker Mesh** — open the journey link `psyc demo` printed. This is the
+   centerpiece: seven robot agents, a case token flowing through, each bot
+   waking to perform its action and speak its real answer. Hit **▶ replay**.
+   - **Classifier bot** carries a live verdict from the fine-tuned psyc-v4
+     model — green when the model agrees with the rule, amber when it differs.
+   - **Sealer** — evidence encrypted to authority public keys (PyNaCl sealed box).
+   - **Router** — destinations cleared vs. policy-blocked (TLP ceiling, country).
+
+3. **Ledger** — http://127.0.0.1:8767/ledger
+   Every submission and every blocked route, immutably recorded.
+
+4. **Trainline** — http://127.0.0.1:8767/train
+   The four task datasets and the trained adapters with their loss curves.
+
+## Talking points
+
+- **Defensive only** — psyc never amplifies stolen data or contacts criminal
+  actors; routing is gated by TLP, jurisdiction, and incident type.
+- **Rules + model** — deterministic work is rule-based; the fine-tuned model
+  handles judgment. One bot is genuinely a live model, not animation over rules.
+- **Honest about limits** — psyc-v4 evals 7/8 on severity; the one miss is a
+  documented data-scarcity case (one online-botnet example), not a bug, and was
+  not gamed away.
--- a/src/psyc/cli.py
+++ b/src/psyc/cli.py
@@ -8,6 +8,7 @@ import typer
 import uvicorn

 from psyc import db, log
+from psyc.cockpit import inference
 from psyc.lines import classify, courier, route, scout, seal, train
 from psyc.lines import map as map_line
 from psyc.models import Outcome
@@ -315,8 +316,15 @@ def demo() -> None:
    for b in blocked:
        typer.echo(f"      ⊘ {b.destination_name}: {b.reason}")
    typer.echo("")
-    typer.echo(f"inspect: http://127.0.0.1:8767/cases/{case.case_id}")
-    typer.echo(f"ledger:  http://127.0.0.1:8767/ledger")
+    typer.echo("── see it in the cockpit ──")
+    typer.echo(f"  Worker Mesh:  http://127.0.0.1:8767/cases/{case.case_id}/journey")
+    typer.echo(f"  Case detail:  http://127.0.0.1:8767/cases/{case.case_id}")
+    typer.echo("  Ledger:       http://127.0.0.1:8767/ledger")
+    adapter = inference.server_adapter()
+    if adapter:
+        typer.echo(f"  Live model:   up ({adapter}) — the Classifier bot shows its verdict")
+    else:
+        typer.echo("  Live model:   inference server offline — Classifier bot falls back to rules")


@app.command("serve")
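
Note on `inference.server_adapter()`: the hunk above calls it, but its body is not part of this diff. The following is a minimal sketch of how such a probe could behave, not the actual `psyc.cockpit.inference` code. It assumes the model server listens on port 8771 (the port published in the `docker run` line) and answers a hypothetical `GET /health` with JSON containing an `adapter` field; the endpoint name, payload shape, and timeout are assumptions made for illustration.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import Optional

# Assumption: the inference server is reachable on localhost at the published port.
DEFAULT_BASE_URL = "http://127.0.0.1:8771"

# A localhost probe should not be routed through an environment-configured proxy,
# so build an opener with proxy detection disabled.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def server_adapter(base_url: str = DEFAULT_BASE_URL, timeout: float = 1.0) -> Optional[str]:
    """Return the adapter name the model server reports, or None if it is unavailable.

    Hypothetical contract: GET {base_url}/health -> {"adapter": "<name>"}.
    """
    try:
        with _OPENER.open(f"{base_url}/health", timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body:
        # the caller treats every one of these as "server offline".
        return None
    if not isinstance(payload, dict):
        return None
    adapter = payload.get("adapter")
    # Only a non-empty string counts as "up"; anything else falls back to rules.
    return adapter if isinstance(adapter, str) and adapter else None
```

Returning `None` for every failure mode keeps the `if adapter:` branch in `demo()` simple: the command prints the "offline" line rather than raising when the GPU container is not running. The short timeout is there so `psyc demo` does not stall on machines without the optional third service.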