stage-7: demo polish — mesh-aware demo command, current README, run-sheet

psyc demo now closes with cockpit links pointing at the Worker Mesh and reports whether the live model server is up. README rewritten to current state: Worker Mesh, inference server, model-in-operation, the three services, accurate code layout. Adds docs/demo.md, a one-page run-sheet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 21:48:57 +02:00
parent 67f26f271e
commit f1449af45b
3 changed files with 199 additions and 114 deletions
--- a/README.md
+++ b/README.md
@@ -7,8 +7,11 @@
 > Validate the signal, protect the evidence, route only what each destination is
 > authorized to receive, and prove every external action through an immutable ledger.
-Defensive cyber-threat-intelligence routing & evidence-sealing platform.
+Defensive cyber-threat-intelligence routing & evidence-sealing platform — a
-Built as a 48h hackathon project on 2026-05-13. Active development.
+small-worker mesh that ingests public threat feeds, classifies and seals cases,
 routes them to the right destinations under TLP policy, and proves every action
 through an append-only ledger. Started as a 48h hackathon (2026-05); grown into
 a working platform with a fine-tuned model in operation.
 ---
@@ -16,25 +19,25 @@ Built as a 48h hackathon project on 2026-05-13. Active development.
 ```text
 Sensors
-→ Scoutline      fetch, parse, dedup, signal
+→ Scoutline      fetch + parse public feeds, emit normalized cases   [built]
-→ Proofline      validate indicators, score confidence
+→ Proofline      validate indicators, score confidence               [planned]
-→ Mapline        resolve victim, actor, jurisdiction, CERT route
+→ Mapline        resolve hosting country / jurisdiction              [built]
-→ Classifyline   severity, TLP, incident type, internal class
+→ Classifyline   severity, TLP, incident type, internal class        [built]
-→ Sealine        authority-sealed evidence encryption
+→ Sealine        authority-sealed evidence encryption                [built]
-→ Routeline      pick destinations, build payloads, submit
+→ Routeline      pick destinations under policy, build payloads      [built]
-→ Ledgerline     immutable audit, receipts, outcomes
+→ Courier        submit to destinations, collect receipts            [built]
-→ Publishline    sanitized public intelligence after mitigation
+→ Ledgerline     immutable audit of every submission + blocked route [built]
-→ Trainline      lawful intel → LoRA-ready training data
+→ Publishline    sanitized public intelligence after mitigation      [planned]
-→ Cockpit        operator UI (FastAPI + Jinja)
+→ Trainline      lawful intel → LoRA datasets + QLoRA training       [built]
 → Cockpit        operator UI (FastAPI + Jinja)                       [built]
 ```
-Each `-line` is a stage in a small-worker mesh; each worker performs one
+Each `-line` is a stage in a small-worker mesh; each worker does one narrow job
-narrow job and passes a normalized `Case` object to the next stage. Heavy
+and passes a normalized `Case` object onward. Rules drive the deterministic
-models are reserved for judgment-heavy tasks. Humans approve everything
+work; a fine-tuned model handles judgment (see Training). Humans approve
-sensitive before it leaves the platform.
+anything sensitive before it leaves the platform.
-Full architecture: [`docs/dossier.md`](docs/dossier.md) — consolidated read of
+Full design: [`docs/dossier.md`](docs/dossier.md) · style: [`docs/style.md`](docs/style.md) · demo run-sheet: [`docs/demo.md`](docs/demo.md)
 the original individual records (still in [`docs/archive/`](docs/archive/)).
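
The worker-mesh idea described above (each worker does one narrow job and hands a normalized `Case` onward) could be sketched roughly as follows. This is an illustration only: the real `Case` is a Pydantic model in `models.py`, and the fields, the `classify`/`seal` bodies, and `run_mesh` here are hypothetical stand-ins, not the project's code.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Case:
    """Stand-in for the normalized Case object (fields are assumptions)."""

    case_id: str
    indicator: str
    severity: Optional[str] = None
    tlp: Optional[str] = None
    trail: List[str] = field(default_factory=list)


# A worker takes a Case and returns an updated copy; it never mutates its input.
Worker = Callable[[Case], Case]


def classify(case: Case) -> Case:
    # Hypothetical rule: URLs ending in .exe are treated as high severity.
    severity = "high" if case.indicator.endswith(".exe") else "low"
    return replace(case, severity=severity, tlp="TLP:GREEN", trail=case.trail + ["classify"])


def seal(case: Case) -> Case:
    # Placeholder: real sealing would encrypt evidence; this only records the step.
    return replace(case, trail=case.trail + ["seal"])


def run_mesh(case: Case, workers: List[Worker]) -> Case:
    """Pass the case through each worker in order and return the final copy."""
    for worker in workers:
        case = worker(case)
    return case


result = run_mesh(Case("c-1", "http://example.test/payload.exe"), [classify, seal])
print(result.severity, result.trail)
```

Keeping each worker a plain `Case -> Case` function is one way to make the stages independently testable and easy to reorder.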
 ---
@@ -46,125 +49,134 @@ python3 -m virtualenv .venv
 .venv/bin/psyc init               # create the sqlite db
 .venv/bin/psyc fetch-all          # ingest URLhaus + CISA KEV + Feodo Tracker
-.venv/bin/psyc serve --port 8767          # cockpit at http://127.0.0.1:8767
+.venv/bin/psyc demo               # run one case through the whole pipeline
 .venv/bin/psyc status             # count of ingested cases
 ```
 The platform runs as up to three services (each in its own terminal):
 ```bash
 .venv/bin/psyc serve --port 8767      # operator cockpit  → http://127.0.0.1:8767
 .venv/bin/psyc mock-cert --port 8770  # stand-in CERT / abuse-API receiver
 # optional, needs an NVIDIA GPU — puts the live model behind the Classifier bot:
 docker run --gpus all --rm -p 8771:8771 --entrypoint python \
    -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
    psyc-trainer /scripts/serve_model.py --adapter /data/adapters/psyc-v4/final
 ```
 ---
 ## Cockpit
 `http://127.0.0.1:8767` — five views:
 | View | Path | Shows |
 |---|---|---|
 | Case Queue | `/cases` | every ingested case, severity + TLP badges |
 | Case detail | `/cases/{id}` | classification, observables, sealed package, routes, per-case ledger |
 | Worker Mesh | `/cases/{id}/journey` | animated 7-bot replay of the case's path; the Classifier bot shows the live model's verdict |
 | Ledger | `/ledger` | immutable audit feed |
 | Trainline | `/train` | datasets + trained adapters with loss charts |
 ---
 ## Code layout
 ```
 src/psyc/
-  models.py          # normalized Case object (Pydantic)
+  models.py        normalized Case object + enums (Pydantic)
-  db.py              # SQLAlchemy Core; cases + ledger tables
+  db.py            SQLAlchemy Core — cases + ledger tables
-  result.py          # Ok / Err / Result[T, E]
+  result.py        Ok / Err / Result[T, E]
-  log.py             # structlog configuration
+  log.py           structlog configuration
-  cli.py             # flat Typer commands
+  cli.py           flat Typer CLI
-  lines/             # one file per worker line
+  mock_cert.py     stand-in CERT / abuse-API receiver
-    scout.py         # Fetcher + Signalizer (URLhaus today)
+  lines/           one file per worker line
-  cockpit/           # FastAPI + Jinja operator UI
+    scout.py       multi-source fetch + signalize (URLhaus, CISA KEV, Feodo)
-    app.py
+    classify.py    severity / TLP / incident type / internal class
-    templates/
+    map.py         GeoResolver — host IP → country
-    static/
+    seal.py        PyNaCl sealed-box evidence encryption
    route.py       destination matrix + policy gates
    courier.py     HTTP submission + payload building
    ledger.py      append-only audit
    train.py       JSONL dataset builders + quality gate
  cockpit/         FastAPI + Jinja operator UI
    app.py         routes
    journey.py     Worker Mesh / case-journey assembly
    inference.py   client for the live model server
    templates/  static/
 scripts/
  train_qlora.py   unsloth QLoRA fine-tune
  eval_adapter.py  adapter evaluation
  serve_model.py   inference server (FastAPI, runs in the CUDA container)
 docs/
-  dossier.md         # full architecture (consolidated)
+  dossier.md  style.md  demo.md  archive/
  style.md           # 12-fold Python style guide
  archive/           # original architecture docs + logo variants
 ```
 ---
 ## Training & the live model (Trainline + QLoRA)
 `psyc train-build-all` emits Alpaca-style JSONL datasets under
 `data/datasets/<task>-v<n>.jsonl` for four defensive tasks — `ioc_extraction`,
 `severity_classification`, `routing_decision`, `tlp_assignment`. QualityGate
 drops TLP:RED, restricted-source, empty, and credential-leak rows.
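
A rough sketch of what a quality gate in front of an Alpaca-style JSONL writer might look like. The `instruction` / `input` / `output` keys are the standard Alpaca fields; the `tlp` and `source_restricted` metadata keys and the credential regex are assumptions made for illustration, since the real `QualityGate` rules are not shown in this diff.

```python
import json
import re
from typing import Dict, Iterable, List

# Hypothetical credential pattern; the project's actual detection is not shown.
CREDENTIAL_RE = re.compile(r"(password|passwd|api[_-]?key|secret)\s*[:=]", re.IGNORECASE)


def passes_gate(row: Dict[str, str]) -> bool:
    """Return True only for rows that are safe to train on."""
    if row.get("tlp") == "TLP:RED":
        return False  # never train on TLP:RED material
    if row.get("source_restricted") == "yes":
        return False  # assumed flag for restricted sources
    if not row.get("instruction", "").strip() or not row.get("output", "").strip():
        return False  # empty rows carry no signal
    text = " ".join(row.get(key, "") for key in ("instruction", "input", "output"))
    return CREDENTIAL_RE.search(text) is None  # drop apparent credential leaks


def to_jsonl(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize surviving rows as Alpaca-style JSONL, one record per line."""
    lines: List[str] = []
    for row in rows:
        if passes_gate(row):
            record = {key: row.get(key, "") for key in ("instruction", "input", "output")}
            lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


rows = [
    {"instruction": "Assign a TLP label.", "input": "phishing URL", "output": "TLP:GREEN", "tlp": "TLP:GREEN"},
    {"instruction": "Assign a TLP label.", "input": "victim list", "output": "TLP:RED", "tlp": "TLP:RED"},
    {"instruction": "Extract IOCs.", "input": "report", "output": "", "tlp": "TLP:CLEAR"},
    {"instruction": "Extract IOCs.", "input": "password: hunter2", "output": "none", "tlp": "TLP:CLEAR"},
]
print(to_jsonl(rows))
```

Only the first row survives here; metadata such as `tlp` is used for gating and deliberately left out of the written record.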
 Fine-tune Qwen3.5-4B with QLoRA in the CUDA container:
 ```bash
 docker build -t psyc-trainer -f Dockerfile.train .
 docker run --gpus all --rm --entrypoint python \
    -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
    psyc-trainer /scripts/train_qlora.py \
    --dataset /data/datasets/ioc_extraction-v4.jsonl \
    --dataset /data/datasets/severity_classification-v4.jsonl \
    --dataset /data/datasets/routing_decision-v4.jsonl \
    --dataset /data/datasets/tlp_assignment-v4.jsonl \
    --output /data/adapters/psyc-v4
 ```
 Defaults target a 24 GB GPU (3090/4090): `unsloth/Qwen3.5-4B` at 4-bit, LoRA
 r=16, bf16, 3 epochs. Output: `data/adapters/<run>/final/` + `training_meta.json`.
 Evaluate with `scripts/eval_adapter.py`; the `/train` cockpit page shows every
 dataset and adapter with its loss curve.
 `scripts/serve_model.py` loads an adapter and serves `/infer` over HTTP. When
 it's running, the cockpit's **Classifier bot** shows the live model's severity
 verdict beside the rule's — and degrades to rules-only if the server is down.
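
One possible shape for the "model verdict beside the rule's, rules-only when the server is down" behavior, as a sketch. The green/amber badge follows the wording in `docs/demo.md`; the `Verdict` type, the `combine` function, and the choice to keep the rule's severity as the operative value are assumptions, not the cockpit's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    severity: str                  # operative severity (assumed: always the rule's)
    model_severity: Optional[str]  # what the live model said, if it was reachable
    badge: str                     # "green" = agree, "amber" = differ, "none" = no model


def combine(rule_severity: str, model_severity: Optional[str]) -> Verdict:
    """Pair the rule's verdict with the model's, degrading to rules-only."""
    if model_severity is None:
        # Inference server offline: show the rule's verdict alone.
        return Verdict(rule_severity, None, "none")
    badge = "green" if model_severity == rule_severity else "amber"
    return Verdict(rule_severity, model_severity, badge)


print(combine("high", "high").badge)
print(combine("high", "medium").badge)
print(combine("high", None).badge)
```

Treating "no model answer" as an ordinary `None` input, rather than an exception, keeps the offline path as simple as the online one.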
 ---
 ## Style
-All code follows [`docs/style.md`](docs/style.md): `Optional[X]` / `List[X]`
+All code follows [`docs/style.md`](docs/style.md) — a 12-fold guide: `Optional[X]`
-from `typing`, `Field(default_factory=...)` for Pydantic mutables, `Result[T, E]`
+/ `List[X]` from `typing`, `Field(default_factory=...)`, `Result[T, E]` for
-types for expected failures (`raise` reserved for true exceptions), `class X(str, Enum)`
+expected failures, `class X(str, Enum)`, structlog `area.action` events,
-for closed string sets, structlog with `area.action` event names, SQLAlchemy Core
+SQLAlchemy Core (no ORM), flat hyphenated Typer commands.
 (no ORM), flat Typer commands with hyphenated names. Ruff config in `pyproject.toml`
 enforces the bits a linter can check; `UP006`/`UP007`/`UP035` are disabled so the
 typing-import rules stand.
 ---
 ## Scope
 **Lawful, white-hat defensive operations only.** psyc routes intelligence to
-victims, CERT/CSIRTs, sector ISACs, provider/registrar abuse desks, and
+victims, CERT/CSIRTs, sector ISACs, provider/registrar abuse desks, and trusted
-trusted CTI communities. It will **not**:
+CTI communities. It will **not** amplify stolen data, expose victims
-
+prematurely, interact with criminal actors, distribute exploitation content, or
- amplify stolen data
+submit evidence beyond a destination's max TLP. Boundaries: `docs/dossier.md`
- expose victims prematurely
+§5, §10, §16.
 - interact with criminal actors
 - distribute exploitation content
 - submit evidence that exceeds a destination's max TLP
 The boundaries are defined in `docs/dossier.md` §5 *Destination Minimization*,
 §10 *TLP Enforcement*, and §16 *Public Reporting Rules*. The Ledger records
 every external submission and destructive action; sensitive evidence is
 encrypted to authorized recipients via Sealine before any routing decision.
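
The "never submit evidence that exceeds a destination's max TLP" rule could be expressed as a small ceiling check like the one below. The label set follows TLP 2.0; the `Destination` type, the `gate` function, and the destination names are hypothetical, and only the `destination_name` / `reason` field names are borrowed from how `cli.py` prints blocked routes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Tlp(str, Enum):
    CLEAR = "TLP:CLEAR"
    GREEN = "TLP:GREEN"
    AMBER = "TLP:AMBER"
    AMBER_STRICT = "TLP:AMBER+STRICT"
    RED = "TLP:RED"


# Rank by declaration order: a higher rank means more restricted.
_RANK = {tlp: rank for rank, tlp in enumerate(Tlp)}


@dataclass(frozen=True)
class Destination:
    name: str
    max_tlp: Tlp  # most restricted label this destination may receive


@dataclass(frozen=True)
class Blocked:
    destination_name: str
    reason: str


def gate(case_tlp: Tlp, destinations: List[Destination]) -> Tuple[List[Destination], List[Blocked]]:
    """Split destinations into cleared and policy-blocked by TLP ceiling."""
    cleared: List[Destination] = []
    blocked: List[Blocked] = []
    for dest in destinations:
        if _RANK[case_tlp] <= _RANK[dest.max_tlp]:
            cleared.append(dest)
        else:
            reason = f"{case_tlp.value} exceeds max {dest.max_tlp.value}"
            blocked.append(Blocked(dest.name, reason))
    return cleared, blocked


cleared, blocked = gate(
    Tlp.AMBER,
    [Destination("national-cert", Tlp.AMBER), Destination("public-feed", Tlp.CLEAR)],
)
print([d.name for d in cleared], [(b.destination_name, b.reason) for b in blocked])
```

Returning the blocked routes with a reason, instead of silently dropping them, is what lets a ledger record every blocked route as well as every submission.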
 ---
 ## Training (Trainline + QLoRA)
 `psyc train-build-all` emits Alpaca-style JSONL datasets under
 `data/datasets/<task>-v<n>.jsonl` for four defensive tasks: `ioc_extraction`,
 `severity_classification`, `routing_decision`, `tlp_assignment`. QualityGate
 drops `TLP:RED`, restricted sources, empty/oversize, and credential-leak rows
 per the dossier's training-data policy.
 To fine-tune Qwen3.5-4B with QLoRA in an NVIDIA Docker container:
 ```bash
 # 1. build datasets (one-off; re-run after ingestion changes)
 .venv/bin/psyc train-build-all
 # 2. build the training image (pytorch 2.6/CUDA 12.4 base + unsloth + Qwen3.5)
 docker build -t psyc-trainer -f Dockerfile.train .
 # 3. fine-tune — scripts/ + data/ are mounted, so script edits need no rebuild
 docker run --gpus all --rm --entrypoint python \
    -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
    psyc-trainer /scripts/train_qlora.py \
    --dataset /data/datasets/ioc_extraction-v2.jsonl \
    --dataset /data/datasets/severity_classification-v2.jsonl \
    --dataset /data/datasets/routing_decision-v2.jsonl \
    --dataset /data/datasets/tlp_assignment-v2.jsonl \
    --output /data/adapters/psyc-v2
 ```
 Defaults target a 24 GB consumer GPU (3090/4090): `unsloth/Qwen3.5-4B` at 4-bit,
 LoRA `r=16`/`alpha=16`, bf16, 3 epochs, effective batch size 8. For A100-40/80
 bump `--base-model unsloth/Qwen3.5-9B` and raise `--batch-size` +
 `--max-seq-length`.
 Output: `data/adapters/psyc-v2/final/` (adapter weights) + `training_meta.json`
 (base model, hyperparameters, dataset list).
 Evaluate the adapter against held-out dataset rows:
 ```bash
 docker run --gpus all --rm \
    --entrypoint python \
    -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
    psyc-trainer /scripts/eval_adapter.py \
    --adapter /data/adapters/psyc-v2/final \
    --dataset /data/datasets/ioc_extraction-v2.jsonl --n 5
 ```
 The cockpit `/train` page lists every built dataset and trained adapter with
 its base model, hyperparameters, dataset provenance, and a per-step loss chart.
 ## Status
-Day 2 of a 48h build. Shipped: Scoutline (URLhaus) → Classifyline → Mapline
+Working platform. Built: Scoutline (URLhaus + CISA KEV + Feodo Tracker) →
-(GeoResolver via ip-api.com) → Sealine (PyNaCl sealed boxes) → Routeline →
+Classifyline → Mapline → Sealine → Routeline → Courier → Ledgerline → Trainline,
-Courier → mock CERT → Ledgerline. Cockpit has cases / case detail / ledger
+the FastAPI cockpit (five views incl. the animated Worker Mesh), and a
-pages and a design-token CSS layer. Trainline emits LoRA-ready JSONL;
+fine-tuned Qwen3.5-4B (psyc-v4) served live behind the Classifier bot.
-`Dockerfile.train` builds an unsloth + Qwen3.5 QLoRA training container.
+Not yet built: Proofline (confidence scoring), Publishline (public advisories).
 ## License
-Unset for the hackathon. Choose before any external release.
+Unset. Choose before any external release.
--- a/docs/demo.md
+++ b/docs/demo.md
@@ -0,0 +1,65 @@
 # psyc — demo run-sheet
 A ~5-minute walk-through of the platform; ~10 min including setup.
 ## 0. Setup (once)
 ```bash
 python3 -m virtualenv .venv
 .venv/bin/pip install -e .
 .venv/bin/psyc init
 ```
 ## 1. Start the services
 Separate terminals — the third is optional and needs an NVIDIA GPU:
 ```bash
 # terminal 1 — operator cockpit
 .venv/bin/psyc serve --port 8767
 # terminal 2 — stand-in CERT / abuse-API receiver
 .venv/bin/psyc mock-cert --port 8770
 # terminal 3 — live model behind the Classifier bot (optional)
 docker run --gpus all --rm -p 8771:8771 --entrypoint python \
    -v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
    psyc-trainer /scripts/serve_model.py --adapter /data/adapters/psyc-v4/final
 ```
 ## 2. Run the pipeline
 ```bash
 .venv/bin/psyc fetch-all     # ingest URLhaus + CISA KEV + Feodo Tracker
 .venv/bin/psyc demo          # one case end-to-end; prints the cockpit links
 ```
 ## 3. The walk-through
 1. **Case Queue** — http://127.0.0.1:8767/cases
   30+ cases across three feeds, with severity + TLP badges. *"Three sources,
   one normalized case object."*
 2. **Worker Mesh** — open the journey link `psyc demo` printed. This is the
   centerpiece: seven robot agents, a case token flowing through, each bot
   waking to perform its action and speak its real answer. Hit **▶ replay**.
   - **Classifier bot** carries a live verdict from the fine-tuned psyc-v4
     model — green when the model agrees with the rule, amber when it differs.
   - **Sealer** — evidence encrypted to authority public keys (PyNaCl sealed box).
   - **Router** — destinations cleared vs. policy-blocked (TLP ceiling, country).
 3. **Ledger** — http://127.0.0.1:8767/ledger
   Every submission and every blocked route, immutably recorded.
 4. **Trainline** — http://127.0.0.1:8767/train
   The four task datasets and the trained adapters with their loss curves.
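
For the Ledger step in the walk-through, "immutably recorded" could be enforced at the database level. The sketch below shows one way to do that with SQLite triggers, using the standard-library `sqlite3` module instead of the project's SQLAlchemy Core. The table layout, trigger names, and helper functions are assumptions; the diff does not show how `ledger.py` actually guarantees append-only behavior.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ledger (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    case_id TEXT NOT NULL,
    event TEXT NOT NULL,
    detail TEXT NOT NULL
);
CREATE TRIGGER ledger_no_update BEFORE UPDATE ON ledger
BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
CREATE TRIGGER ledger_no_delete BEFORE DELETE ON ledger
BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
"""


def open_ledger() -> sqlite3.Connection:
    """Create an in-memory ledger whose rows cannot be updated or deleted."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, case_id: str, event: str, detail: str) -> None:
    """Append one entry; inserting is the only write the schema allows."""
    with conn:
        conn.execute(
            "INSERT INTO ledger (case_id, event, detail) VALUES (?, ?, ?)",
            (case_id, event, detail),
        )


def try_tamper(conn: sqlite3.Connection) -> bool:
    """Attempt to wipe the ledger; return True only if the delete succeeded."""
    try:
        with conn:
            conn.execute("DELETE FROM ledger")
    except sqlite3.DatabaseError:
        return False
    return True


conn = open_ledger()
record(conn, "c-1", "submitted", "national-cert")
record(conn, "c-1", "blocked", "public-feed: TLP ceiling")
print(try_tamper(conn), conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0])
```

Triggers only guard against writes made through SQL; anyone with file access can still replace the database, so this is a guard rail and not a cryptographic proof.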
 ## Talking points
 - **Defensive only** — psyc never amplifies stolen data or contacts criminal
  actors; routing is gated by TLP, jurisdiction, and incident type.
 - **Rules + model** — deterministic work is rule-based; the fine-tuned model
  handles judgment. One bot is genuinely a live model, not animation over rules.
 - **Honest about limits** — psyc-v4 evals 7/8 on severity; the one miss is a
  documented data-scarcity case (one online-botnet example), not a bug, and was
  not gamed away.
--- a/src/psyc/cli.py
+++ b/src/psyc/cli.py
@@ -8,6 +8,7 @@ import typer
 import uvicorn
 from psyc import db, log
 from psyc.cockpit import inference
 from psyc.lines import classify, courier, route, scout, seal, train
 from psyc.lines import map as map_line
 from psyc.models import Outcome
@@ -315,8 +316,15 @@ def demo() -> None:
    for b in blocked:
        typer.echo(f"      ⊘ {b.destination_name}: {b.reason}")
    typer.echo("")
-    typer.echo(f"inspect: http://127.0.0.1:8767/cases/{case.case_id}")
+    typer.echo("── see it in the cockpit ──")
-    typer.echo(f"ledger:  http://127.0.0.1:8767/ledger")
+    typer.echo(f"  Worker Mesh:  http://127.0.0.1:8767/cases/{case.case_id}/journey")
    typer.echo(f"  Case detail:  http://127.0.0.1:8767/cases/{case.case_id}")
    typer.echo("  Ledger:       http://127.0.0.1:8767/ledger")
    adapter = inference.server_adapter()
    if adapter:
        typer.echo(f"  Live model:   up ({adapter}) — the Classifier bot shows its verdict")
    else:
        typer.echo("  Live model:   inference server offline — Classifier bot falls back to rules")
@app.command("serve")
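
The `demo` hunk above calls `inference.server_adapter()`, whose body is not part of this diff. A minimal sketch of how such a probe might behave is given below, assuming a hypothetical `/health` endpoint that returns JSON with an `adapter` key; the endpoint path, the key name, the injectable `fetch` parameter, and `status_line` are illustrative assumptions. Only the port (8771) and the "adapter name or nothing" contract come from the source.

```python
import json
import urllib.request
from typing import Any, Callable, Optional


def fetch_json(url: str, timeout: float = 0.5) -> Any:
    """GET a URL and decode the body as JSON (short timeout: this is a probe)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def server_adapter(
    base_url: str = "http://127.0.0.1:8771",
    fetch: Callable[[str], Any] = fetch_json,
) -> Optional[str]:
    """Return the loaded adapter's name, or None if the server is unreachable."""
    try:
        payload = fetch(f"{base_url}/health")  # hypothetical endpoint
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and urllib's URLError;
        # ValueError covers malformed JSON.
        return None
    adapter = payload.get("adapter") if isinstance(payload, dict) else None
    return adapter if isinstance(adapter, str) and adapter else None


def status_line(adapter: Optional[str]) -> str:
    """Render the one-line status the demo command would print."""
    if adapter:
        return f"Live model: up ({adapter})"
    return "Live model: offline, Classifier bot falls back to rules"


print(status_line(server_adapter(fetch=lambda url: {"adapter": "psyc-v4"})))
```

Passing `fetch` as a parameter keeps the probe testable without a running server, and returning `None` on any failure lets the caller use a plain `if adapter:` check as the CLI does.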