stage-7: demo polish — mesh-aware demo command, current README, run-sheet

psyc demo now closes with cockpit links pointing at the Worker Mesh and
reports whether the live model server is up. README rewritten to current
state — Worker Mesh, inference server, model-in-operation, the three
services, accurate code layout. Adds docs/demo.md, a one-page run-sheet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
m17hr1l
2026-05-18 21:48:57 +02:00
parent 67f26f271e
commit f1449af45b
3 changed files with 199 additions and 114 deletions

README.md

@@ -7,8 +7,11 @@
 > Validate the signal, protect the evidence, route only what each destination is
 > authorized to receive, and prove every external action through an immutable ledger.
-Defensive cyber-threat-intelligence routing & evidence-sealing platform.
-Built as a 48h hackathon project on 2026-05-13. Active development.
+Defensive cyber-threat-intelligence routing & evidence-sealing platform — a
+small-worker mesh that ingests public threat feeds, classifies and seals cases,
+routes them to the right destinations under TLP policy, and proves every action
+through an append-only ledger. Started as a 48h hackathon (2026-05); grown into
+a working platform with a fine-tuned model in operation.
 ---
@@ -16,25 +19,25 @@ Built as a 48h hackathon project on 2026-05-13. Active development.
 ```text
 Sensors
-→ Scoutline fetch, parse, dedup, signal
-→ Proofline validate indicators, score confidence
-→ Mapline resolve victim, actor, jurisdiction, CERT route
-→ Classifyline severity, TLP, incident type, internal class
-→ Sealine authority-sealed evidence encryption
-→ Routeline pick destinations, build payloads, submit
-Ledgerline immutable audit, receipts, outcomes
-Publishline sanitized public intelligence after mitigation
-Trainline lawful intel → LoRA-ready training data
-Cockpit operator UI (FastAPI + Jinja)
+→ Scoutline fetch + parse public feeds, emit normalized cases [built]
+→ Proofline validate indicators, score confidence [planned]
+→ Mapline resolve hosting country / jurisdiction [built]
+→ Classifyline severity, TLP, incident type, internal class [built]
+→ Sealine authority-sealed evidence encryption [built]
+→ Routeline pick destinations under policy, build payloads [built]
+Courier submit to destinations, collect receipts [built]
+Ledgerline immutable audit of every submission + blocked route [built]
+Publishline sanitized public intelligence after mitigation [planned]
+Trainline lawful intel → LoRA datasets + QLoRA training [built]
+→ Cockpit operator UI (FastAPI + Jinja) [built]
 ```
-Each `-line` is a stage in a small-worker mesh; each worker performs one
-narrow job and passes a normalized `Case` object to the next stage. Heavy
-models are reserved for judgment-heavy tasks. Humans approve everything
-sensitive before it leaves the platform.
+Each `-line` is a stage in a small-worker mesh; each worker does one narrow job
+and passes a normalized `Case` object onward. Rules drive the deterministic
+work; a fine-tuned model handles judgment (see Training). Humans approve
+anything sensitive before it leaves the platform.
-Full architecture: [`docs/dossier.md`](docs/dossier.md) — consolidated read of
-the original individual records (still in [`docs/archive/`](docs/archive/)).
+Full design: [`docs/dossier.md`](docs/dossier.md) · style: [`docs/style.md`](docs/style.md) · demo run-sheet: [`docs/demo.md`](docs/demo.md)
 ---
@@ -46,125 +49,134 @@ python3 -m virtualenv .venv
 .venv/bin/psyc init # create the sqlite db
 .venv/bin/psyc fetch-all # ingest URLhaus + CISA KEV + Feodo Tracker
-.venv/bin/psyc serve --port 8767 # cockpit at http://127.0.0.1:8767
+.venv/bin/psyc demo # run one case through the whole pipeline
+.venv/bin/psyc status # count of ingested cases
 ```
+The platform runs as up to three services (each in its own terminal):
+```bash
+.venv/bin/psyc serve --port 8767 # operator cockpit → http://127.0.0.1:8767
+.venv/bin/psyc mock-cert --port 8770 # stand-in CERT / abuse-API receiver
+# optional, needs an NVIDIA GPU — puts the live model behind the Classifier bot:
+docker run --gpus all --rm -p 8771:8771 --entrypoint python \
+-v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
+psyc-trainer /scripts/serve_model.py --adapter /data/adapters/psyc-v4/final
+```
+---
+## Cockpit
+`http://127.0.0.1:8767` — five views:
+| View | Path | Shows |
+|---|---|---|
+| Case Queue | `/cases` | every ingested case, severity + TLP badges |
+| Case detail | `/cases/{id}` | classification, observables, sealed package, routes, per-case ledger |
+| Worker Mesh | `/cases/{id}/journey` | animated 7-bot replay of the case's path; the Classifier bot shows the live model's verdict |
+| Ledger | `/ledger` | immutable audit feed |
+| Trainline | `/train` | datasets + trained adapters with loss charts |
 ---
 ## Code layout
 ```
 src/psyc/
-models.py # normalized Case object (Pydantic)
-db.py # SQLAlchemy Core; cases + ledger tables
-result.py # Ok / Err / Result[T, E]
-log.py # structlog configuration
-cli.py # flat Typer commands
-lines/ # one file per worker line
-scout.py # Fetcher + Signalizer (URLhaus today)
-cockpit/ # FastAPI + Jinja operator UI
-app.py
-templates/
-static/
+models.py normalized Case object + enums (Pydantic)
+db.py SQLAlchemy Core cases + ledger tables
+result.py Ok / Err / Result[T, E]
+log.py structlog configuration
+cli.py flat Typer CLI
+mock_cert.py stand-in CERT / abuse-API receiver
+lines/ one file per worker line
+scout.py multi-source fetch + signalize (URLhaus, CISA KEV, Feodo)
+classify.py severity / TLP / incident type / internal class
+map.py GeoResolver — host IP → country
+seal.py PyNaCl sealed-box evidence encryption
+route.py destination matrix + policy gates
+courier.py HTTP submission + payload building
+ledger.py append-only audit
+train.py JSONL dataset builders + quality gate
+cockpit/ FastAPI + Jinja operator UI
+app.py routes
+journey.py Worker Mesh / case-journey assembly
+inference.py client for the live model server
+templates/ static/
+scripts/
+train_qlora.py unsloth QLoRA fine-tune
+eval_adapter.py adapter evaluation
+serve_model.py inference server (FastAPI, runs in the CUDA container)
 docs/
-dossier.md # full architecture (consolidated)
-style.md # 12-fold Python style guide
-archive/ # original architecture docs + logo variants
+dossier.md style.md demo.md archive/
 ```
 ---
+## Training & the live model (Trainline + QLoRA)
+`psyc train-build-all` emits Alpaca-style JSONL datasets under
+`data/datasets/<task>-v<n>.jsonl` for four defensive tasks — `ioc_extraction`,
+`severity_classification`, `routing_decision`, `tlp_assignment`. QualityGate
+drops TLP:RED, restricted-source, empty, and credential-leak rows.
+Fine-tune Qwen3.5-4B with QLoRA in the CUDA container:
+```bash
+docker build -t psyc-trainer -f Dockerfile.train .
+docker run --gpus all --rm --entrypoint python \
+-v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
+psyc-trainer /scripts/train_qlora.py \
+--dataset /data/datasets/ioc_extraction-v4.jsonl \
+--dataset /data/datasets/severity_classification-v4.jsonl \
+--dataset /data/datasets/routing_decision-v4.jsonl \
+--dataset /data/datasets/tlp_assignment-v4.jsonl \
+--output /data/adapters/psyc-v4
+```
+Defaults target a 24 GB GPU (3090/4090): `unsloth/Qwen3.5-4B` at 4-bit, LoRA
+r=16, bf16, 3 epochs. Output: `data/adapters/<run>/final/` + `training_meta.json`.
+Evaluate with `scripts/eval_adapter.py`; the `/train` cockpit page shows every
+dataset and adapter with its loss curve.
+`scripts/serve_model.py` loads an adapter and serves `/infer` over HTTP. When
+it's running, the cockpit's **Classifier bot** shows the live model's severity
+verdict beside the rule's — and degrades to rules-only if the server is down.
+---
 ## Style
-All code follows [`docs/style.md`](docs/style.md): `Optional[X]` / `List[X]`
-from `typing`, `Field(default_factory=...)` for Pydantic mutables, `Result[T, E]`
-types for expected failures (`raise` reserved for true exceptions), `class X(str, Enum)`
-for closed string sets, structlog with `area.action` event names, SQLAlchemy Core
-(no ORM), flat Typer commands with hyphenated names. Ruff config in `pyproject.toml`
-enforces the bits a linter can check; `UP006`/`UP007`/`UP035` are disabled so the
-typing-import rules stand.
+All code follows [`docs/style.md`](docs/style.md) — a 12-fold guide: `Optional[X]`
+/ `List[X]` from `typing`, `Field(default_factory=...)`, `Result[T, E]` for
+expected failures, `class X(str, Enum)`, structlog `area.action` events,
+SQLAlchemy Core (no ORM), flat hyphenated Typer commands.
 ---
 ## Scope
 **Lawful, white-hat defensive operations only.** psyc routes intelligence to
-victims, CERT/CSIRTs, sector ISACs, provider/registrar abuse desks, and
-trusted CTI communities. It will **not**:
-- amplify stolen data
-- expose victims prematurely
-- interact with criminal actors
-- distribute exploitation content
-- submit evidence that exceeds a destination's max TLP
-The boundaries are defined in `docs/dossier.md` §5 *Destination Minimization*,
-§10 *TLP Enforcement*, and §16 *Public Reporting Rules*. The Ledger records
-every external submission and destructive action; sensitive evidence is
-encrypted to authorized recipients via Sealine before any routing decision.
+victims, CERT/CSIRTs, sector ISACs, provider/registrar abuse desks, and trusted
+CTI communities. It will **not** amplify stolen data, expose victims
+prematurely, interact with criminal actors, distribute exploitation content, or
+submit evidence beyond a destination's max TLP. Boundaries: `docs/dossier.md`
+§5, §10, §16.
 ---
-## Training (Trainline + QLoRA)
-`psyc train-build-all` emits Alpaca-style JSONL datasets under
-`data/datasets/<task>-v<n>.jsonl` for four defensive tasks: `ioc_extraction`,
-`severity_classification`, `routing_decision`, `tlp_assignment`. QualityGate
-drops `TLP:RED`, restricted sources, empty/oversize, and credential-leak rows
-per the dossier's training-data policy.
-To fine-tune Qwen3.5-4B with QLoRA in an NVIDIA Docker container:
-```bash
-# 1. build datasets (one-off; re-run after ingestion changes)
-.venv/bin/psyc train-build-all
-# 2. build the training image (pytorch 2.6/CUDA 12.4 base + unsloth + Qwen3.5)
-docker build -t psyc-trainer -f Dockerfile.train .
-# 3. fine-tune — scripts/ + data/ are mounted, so script edits need no rebuild
-docker run --gpus all --rm --entrypoint python \
--v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
-psyc-trainer /scripts/train_qlora.py \
---dataset /data/datasets/ioc_extraction-v2.jsonl \
---dataset /data/datasets/severity_classification-v2.jsonl \
---dataset /data/datasets/routing_decision-v2.jsonl \
---dataset /data/datasets/tlp_assignment-v2.jsonl \
---output /data/adapters/psyc-v2
-```
-Defaults target a 24 GB consumer GPU (3090/4090): `unsloth/Qwen3.5-4B` at 4-bit,
-LoRA `r=16`/`alpha=16`, bf16, 3 epochs, effective batch size 8. For A100-40/80
-bump `--base-model unsloth/Qwen3.5-9B` and raise `--batch-size` +
-`--max-seq-length`.
-Output: `data/adapters/psyc-v1/final/` (adapter weights) + `training_meta.json`
-(base model, hyperparameters, dataset list).
-Evaluate the adapter against held-out dataset rows:
-```bash
-docker run --gpus all --rm \
---entrypoint python \
--v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
-psyc-trainer /scripts/eval_adapter.py \
---adapter /data/adapters/psyc-v2/final \
---dataset /data/datasets/ioc_extraction-v2.jsonl --n 5
-```
-The cockpit `/train` page lists every built dataset and trained adapter with
-its base model, hyperparameters, dataset provenance, and a per-step loss chart.
 ## Status
-Day 2 of a 48h build. Shipped: Scoutline (URLhaus) → Classifyline → Mapline
-(GeoResolver via ip-api.com) → Sealine (PyNaCl sealed boxes) → Routeline →
-Courier → mock CERT → Ledgerline. Cockpit has cases / case detail / ledger
-pages and a design-token CSS layer. Trainline emits LoRA-ready JSONL;
-`Dockerfile.train` builds an unsloth + Qwen3.5 QLoRA training container.
+Working platform. Built: Scoutline (URLhaus + CISA KEV + Feodo Tracker) →
+Classifyline → Mapline → Sealine → Routeline → Courier → Ledgerline → Trainline,
+the FastAPI cockpit (five views incl. the animated Worker Mesh), and a
+fine-tuned Qwen3.5-4B (psyc-v4) served live behind the Classifier bot.
+Not yet built: Proofline (confidence scoring), Publishline (public advisories).
 ## License
-Unset for the hackathon. Choose before any external release.
+Unset. Choose before any external release.
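
Reviewer note on the QualityGate mentioned in the new README's training section: the README states only which rows are dropped (TLP:RED, restricted-source, empty, credential-leak), not how. A minimal sketch of such a filter follows. The row keys `tlp` and `source`, the `RESTRICTED_SOURCES` set, the credential regex, and every function name here are assumptions for illustration and are not taken from `train.py`.

```python
import json
import re
from typing import Dict, List, Optional

# Standard Alpaca-style fields; everything else in a row is treated as metadata.
ALPACA_KEYS = ("instruction", "input", "output")

# Hypothetical: the README does not say how restricted sources are labelled.
RESTRICTED_SOURCES = {"partner-private"}

# Hypothetical heuristic for credential leaks; a real gate is likely stricter.
CREDENTIAL_PATTERN = re.compile(
    r"\b(password|passwd|api[_-]?key|secret)\s*[:=]\s*\S+", re.IGNORECASE
)


def drop_reason(row: Dict[str, str]) -> Optional[str]:
    """Return why a row must be dropped, or None if it may be kept."""
    if row.get("tlp", "").upper() == "TLP:RED":
        return "tlp_red"
    if row.get("source", "") in RESTRICTED_SOURCES:
        return "restricted_source"
    if not row.get("instruction", "").strip() or not row.get("output", "").strip():
        return "empty"
    text = " ".join(row.get(key, "") for key in ALPACA_KEYS)
    if CREDENTIAL_PATTERN.search(text):
        return "credential_leak"
    return None


def quality_gate(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the rows that pass every check."""
    return [row for row in rows if drop_reason(row) is None]


def to_jsonl(rows: List[Dict[str, str]]) -> str:
    """Serialize rows as JSONL, writing only the Alpaca fields (metadata is dropped)."""
    return "\n".join(
        json.dumps({key: row.get(key, "") for key in ALPACA_KEYS}) for row in rows
    )
```

Returning a reason string rather than a bare boolean makes it possible to log why each row was rejected, which fits the project's emphasis on auditability.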

docs/demo.md Normal file

@@ -0,0 +1,65 @@
# psyc — demo run-sheet
A ~5-minute walk-through of the platform; ~10 min including setup.
## 0. Setup (once)
```bash
python3 -m virtualenv .venv
.venv/bin/pip install -e .
.venv/bin/psyc init
```
## 1. Start the services
Separate terminals — the third is optional and needs an NVIDIA GPU:
```bash
# terminal 1 — operator cockpit
.venv/bin/psyc serve --port 8767
# terminal 2 — stand-in CERT / abuse-API receiver
.venv/bin/psyc mock-cert --port 8770
# terminal 3 — live model behind the Classifier bot (optional)
docker run --gpus all --rm -p 8771:8771 --entrypoint python \
-v $(pwd)/data:/data -v $(pwd)/scripts:/scripts \
psyc-trainer /scripts/serve_model.py --adapter /data/adapters/psyc-v4/final
```
## 2. Run the pipeline
```bash
.venv/bin/psyc fetch-all # ingest URLhaus + CISA KEV + Feodo Tracker
.venv/bin/psyc demo # one case end-to-end; prints the cockpit links
```
## 3. The walk-through
1. **Case Queue** — http://127.0.0.1:8767/cases
30+ cases across three feeds, with severity + TLP badges. *"Three sources,
one normalized case object."*
2. **Worker Mesh** — open the journey link `psyc demo` printed. This is the
centerpiece: seven robot agents, a case token flowing through, each bot
waking to perform its action and speak its real answer. Hit **▶ replay**.
- **Classifier bot** carries a live verdict from the fine-tuned psyc-v4
model — green when the model agrees with the rule, amber when it differs.
- **Sealer** — evidence encrypted to authority public keys (PyNaCl sealed box).
- **Router** — destinations cleared vs. policy-blocked (TLP ceiling, country).
3. **Ledger** — http://127.0.0.1:8767/ledger
Every submission and every blocked route, immutably recorded.
4. **Trainline** — http://127.0.0.1:8767/train
The four task datasets and the trained adapters with their loss curves.
## Talking points
- **Defensive only** — psyc never amplifies stolen data or contacts criminal
actors; routing is gated by TLP, jurisdiction, and incident type.
- **Rules + model** — deterministic work is rule-based; the fine-tuned model
handles judgment. One bot is genuinely a live model, not animation over rules.
- **Honest about limits** — psyc-v4 evals 7/8 on severity; the one miss is a
documented data-scarcity case (one online-botnet example), not a bug, and was
not gamed away.
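
Reviewer note on the Router step in the walk-through (destinations cleared vs. policy-blocked by TLP ceiling and country): the gate can be pictured with the sketch below. It is an illustration under stated assumptions, not the code in `route.py`. Only the attribute names `destination_name` and `reason` appear in this commit (in the `demo` command's output loop); `Tlp`, `Destination`, `BlockedRoute`, `gate`, and the example destinations are hypothetical. The ordering omits TLP 2.0's `TLP:AMBER+STRICT` label for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Tlp(str, Enum):
    CLEAR = "TLP:CLEAR"
    GREEN = "TLP:GREEN"
    AMBER = "TLP:AMBER"
    RED = "TLP:RED"


# Least to most restrictive; a case may only go where its rank fits the ceiling.
TLP_RANK = {Tlp.CLEAR: 0, Tlp.GREEN: 1, Tlp.AMBER: 2, Tlp.RED: 3}


@dataclass(frozen=True)
class Destination:
    name: str
    max_tlp: Tlp                 # highest TLP this destination may receive
    countries: Tuple[str, ...]   # empty tuple means "any country"


@dataclass(frozen=True)
class BlockedRoute:
    destination_name: str
    reason: str


def gate(
    case_tlp: Tlp, case_country: str, destinations: List[Destination]
) -> Tuple[List[Destination], List[BlockedRoute]]:
    """Split destinations into cleared routes and blocked routes with a reason."""
    cleared: List[Destination] = []
    blocked: List[BlockedRoute] = []
    for dest in destinations:
        if TLP_RANK[case_tlp] > TLP_RANK[dest.max_tlp]:
            reason = f"{case_tlp.value} exceeds ceiling {dest.max_tlp.value}"
            blocked.append(BlockedRoute(dest.name, reason))
        elif dest.countries and case_country not in dest.countries:
            reason = f"country {case_country} not served"
            blocked.append(BlockedRoute(dest.name, reason))
        else:
            cleared.append(dest)
    return cleared, blocked
```

Blocked routes are returned as data rather than raised as errors, so a caller can record each one in the ledger alongside the submissions that went through.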

View File

@@ -8,6 +8,7 @@ import typer
 import uvicorn
 from psyc import db, log
+from psyc.cockpit import inference
 from psyc.lines import classify, courier, route, scout, seal, train
 from psyc.lines import map as map_line
 from psyc.models import Outcome
@@ -315,8 +316,15 @@ def demo() -> None:
     for b in blocked:
         typer.echo(f"{b.destination_name}: {b.reason}")
     typer.echo("")
-    typer.echo(f"inspect: http://127.0.0.1:8767/cases/{case.case_id}")
-    typer.echo(f"ledger: http://127.0.0.1:8767/ledger")
+    typer.echo("── see it in the cockpit ──")
+    typer.echo(f" Worker Mesh: http://127.0.0.1:8767/cases/{case.case_id}/journey")
+    typer.echo(f" Case detail: http://127.0.0.1:8767/cases/{case.case_id}")
+    typer.echo(f" Ledger: http://127.0.0.1:8767/ledger")
+    adapter = inference.server_adapter()
+    if adapter:
+        typer.echo(f" Live model: up ({adapter}) — the Classifier bot shows its verdict")
+    else:
+        typer.echo(" Live model: inference server offline — Classifier bot falls back to rules")
 @app.command("serve")
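
Reviewer note: the diff calls `inference.server_adapter()` but does not show its body. From the call site, it returns something truthy (printed as the adapter name) when the model server is up and something falsy otherwise. One way such a probe could be written is sketched below. The `/health` path and its `{"adapter": ...}` JSON body are assumptions; the README only documents port 8771 and an `/infer` endpoint, so the real client in `cockpit/inference.py` may work differently.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import Optional

# Port 8771 comes from the README's docker command; the /health path is assumed.
HEALTH_URL = "http://127.0.0.1:8771/health"


def adapter_from_body(body: bytes) -> Optional[str]:
    """Extract the adapter name from a health response, or None if malformed."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(payload, dict):
        return None
    adapter = payload.get("adapter")
    return adapter if isinstance(adapter, str) and adapter else None


def server_adapter(url: str = HEALTH_URL, timeout: float = 0.5) -> Optional[str]:
    """Return the served adapter name, or None when the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return adapter_from_body(response.read())
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Refused connection, timeout, HTTP error status, or a broken response:
        # all are treated as "offline" so the caller can fall back to rules.
        return None
```

A short timeout keeps `psyc demo` responsive when the optional GPU container is not running, and collapsing every failure into `None` matches the rules-only fallback the commit message describes.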