
m17hr1l 67f26f271e stage-6: wire the Classifier bot to the live model

The Classifier bot in the Worker Mesh now shows the real fine-tuned model's
severity verdict beside the rule's. cockpit/inference.py calls serve_model.py
over HTTP; if the server is down it returns None and the bot silently falls
back to rules — the mesh never breaks. SEVERITY_INSTRUCTION + severity_features
are shared from lines/train.py so the live prompt matches what the model
trained on. The model is now genuinely in operation, not animation over rules.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-18 21:10:12 +02:00
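The fallback described in the commit message above can be sketched with the standard library alone. In that design the model server is called over HTTP, None comes back when the server is unreachable, and the bot keeps the rule's verdict. This is a minimal sketch, not cockpit/inference.py itself. The URL, the `/classify` route, the `{"prompt": ...}` request and `{"severity": ...}` response shapes, and the `rule_severity` stand-in are all assumptions.

```python
import http.client
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumed endpoint: the real host, port and route of serve_model.py are not documented here.
MODEL_URL = "http://127.0.0.1:8000/classify"


def model_severity(prompt: str, url: str = MODEL_URL, timeout: float = 2.0) -> Optional[str]:
    """Ask the model server for a severity verdict; return None if anything goes wrong."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")  # assumed request shape
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # HTTPException covers protocol errors; ValueError covers a malformed reply.
        return None
    severity = payload.get("severity") if isinstance(payload, dict) else None  # assumed key
    return severity if isinstance(severity, str) else None


def rule_severity(case: Dict[str, Any]) -> str:
    """Hypothetical stand-in for the existing rule: KEV-listed cases are high."""
    return "high" if case.get("kev") else "medium"


def classify(case: Dict[str, Any], url: str = MODEL_URL) -> Dict[str, Optional[str]]:
    rule = rule_severity(case)
    model = model_severity(json.dumps(case), url=url)
    # The model verdict sits beside the rule's; when the server is down, model is None
    # and the rule alone decides, so the mesh keeps running.
    return {"rule": rule, "model": model, "verdict": model if model is not None else rule}
```

With nothing answering on the target port, `classify({"kev": True}, url="http://127.0.0.1:9/classify")` should report a model verdict of None and fall back to the rule's "high".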

docs              init: scaffold psyc — defensive CTI routing & evidence-sealing platform (2026-05-14 12:43:47 +02:00)
scripts           stage-6: model inference server (2026-05-18 21:05:16 +02:00)
src/psyc          stage-6: wire the Classifier bot to the live model (2026-05-18 21:10:12 +02:00)
.dockerignore     stage-3c: unsloth QLoRA training scaffold for Qwen3.5 (2026-05-14 14:17:14 +02:00)
.gitignore        stage-2: full pipeline — Classifyline → Sealine → Routeline → Courier → Ledger + mock CERT (2026-05-14 13:44:43 +02:00)
Dockerfile.train  stage-6: model inference server (2026-05-18 21:05:16 +02:00)
pyproject.toml    init: scaffold psyc — defensive CTI routing & evidence-sealing platform (2026-05-14 12:43:47 +02:00)
README.md         stage-4: multi-source Scoutline — CISA KEV + Feodo Tracker (2026-05-17 23:42:13 +02:00)

psyc

Validate the signal, protect the evidence, route only what each destination is authorized to receive, and prove every external action through an immutable ledger.

Defensive cyber-threat-intelligence (CTI) routing and evidence-sealing platform. Started as a 48h hackathon project on 2026-05-13; under active development.

Architecture

Sensors
→ Scoutline      fetch, parse, dedup, signal
→ Proofline      validate indicators, score confidence
→ Mapline        resolve victim, actor, jurisdiction, CERT route
→ Classifyline   severity, TLP, incident type, internal class
→ Sealine        authority-sealed evidence encryption
→ Routeline      pick destinations, build payloads, submit
→ Ledgerline     immutable audit, receipts, outcomes
→ Publishline    sanitized public intelligence after mitigation
→ Trainline      lawful intel → LoRA-ready training data
→ Cockpit        operator UI (FastAPI + Jinja)

Each "-line" is a stage in a small-worker mesh: each worker performs one narrow job and passes a normalized Case object to the next stage. Heavy models are reserved for judgment-heavy tasks, and a human approves everything sensitive before it leaves the platform.
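As a rough illustration of the mesh idea, each line can be modelled as a function from Case to Case, run in order. The Case fields, the two stage bodies and their thresholds below are invented for the example; the real Case is a Pydantic model in models.py, and the real lines do far more.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Case:
    """Hypothetical subset of the normalized Case (the real one is Pydantic)."""
    indicator: str
    confidence: float = 0.0
    severity: Optional[str] = None
    tlp: Optional[str] = None
    history: List[str] = field(default_factory=list)  # which lines touched the case


Stage = Callable[[Case], Case]


def proofline(case: Case) -> Case:
    # Toy confidence rule: URLs score higher than bare indicators.
    case.confidence = 0.9 if case.indicator.startswith("http") else 0.4
    case.history.append("proofline")
    return case


def classifyline(case: Case) -> Case:
    # Toy severity/TLP rule driven only by confidence.
    case.severity = "high" if case.confidence >= 0.8 else "low"
    case.tlp = "TLP:AMBER" if case.severity == "high" else "TLP:GREEN"
    case.history.append("classifyline")
    return case


def run_mesh(case: Case, stages: List[Stage]) -> Case:
    # Each worker does one narrow job and hands the Case to the next.
    for stage in stages:
        case = stage(case)
    return case
```

Because every stage shares the same signature, adding or reordering lines only changes the list passed to `run_mesh`.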

Full architecture: docs/dossier.md, a consolidated version of the original architecture documents (still kept in docs/archive/).

Quick start

python3 -m venv .venv
.venv/bin/pip install -e .

.venv/bin/psyc init                       # create the sqlite db
.venv/bin/psyc fetch-all                  # ingest URLhaus + CISA KEV + Feodo Tracker
.venv/bin/psyc serve --port 8767          # cockpit at http://127.0.0.1:8767
.venv/bin/psyc status                     # count of ingested cases

Code layout

src/psyc/
  models.py          # normalized Case object (Pydantic)
  db.py              # SQLAlchemy Core; cases + ledger tables
  result.py          # Ok / Err / Result[T, E]
  log.py             # structlog configuration
  cli.py             # flat Typer commands
  lines/             # one file per worker line
    scout.py         # Fetcher + Signalizer (URLhaus today)
  cockpit/           # FastAPI + Jinja operator UI
    app.py
    templates/
    static/

docs/
  dossier.md         # full architecture (consolidated)
  style.md           # 12-fold Python style guide
  archive/           # original architecture docs + logo variants

Style

All code follows docs/style.md:

Optional[X] / List[X] from typing
Field(default_factory=...) for Pydantic mutables
Result[T, E] types for expected failures; raise is reserved for true exceptions
class X(str, Enum) for closed string sets
structlog with area.action event names
SQLAlchemy Core (no ORM)
flat Typer commands with hyphenated names

The Ruff config in pyproject.toml enforces the rules a linter can check. UP006/UP007/UP035 are disabled so Ruff does not flag Optional[X] / List[X] in favour of X | None / list[X], and the typing-import rules stand.
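A minimal sketch of two of these conventions together: an Ok/Err Result for an expected failure (an unknown severity string) and a class X(str, Enum) for a closed set. This is not the repo's result.py; the Severity members and parse_severity are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


# Generic alias: Result[int, str] means Union[Ok[int], Err[str]].
Result = Union[Ok[T], Err[E]]


class Severity(str, Enum):
    # Hypothetical closed set; str mixin lets members compare equal to their values.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def parse_severity(raw: str) -> Result[Severity, str]:
    # An unknown label is an expected failure, so it becomes Err instead of raising.
    try:
        return Ok(Severity(raw.strip().lower()))
    except ValueError:
        return Err(f"unknown severity: {raw!r}")
```

Callers branch with `isinstance(result, Ok)`, and the str mixin means `Severity.HIGH == "high"` holds, which keeps database and JSON round-trips simple.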

Scope

Lawful, white-hat defensive operations only. psyc routes intelligence to victims, CERT/CSIRTs, sector ISACs, provider/registrar abuse desks, and trusted CTI communities. It will not:

amplify stolen data
expose victims prematurely
interact with criminal actors
distribute exploitation content
submit evidence that exceeds a destination's max TLP

The boundaries are defined in docs/dossier.md §5 Destination Minimization, §10 TLP Enforcement, and §16 Public Reporting Rules. The Ledger records every external submission and destructive action; sensitive evidence is encrypted to authorized recipients via Sealine before any routing decision.
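The max-TLP rule reduces to an ordering check. The sketch below assumes the FIRST TLP 2.0 labels, ordered CLEAR < GREEN < AMBER < AMBER+STRICT < RED. The destination names and their caps are hypothetical, and this is not Routeline's actual code.

```python
from enum import IntEnum
from typing import Dict, List


class TLP(IntEnum):
    """TLP 2.0 labels, ordered from least to most restrictive."""
    CLEAR = 0
    GREEN = 1
    AMBER = 2
    AMBER_STRICT = 3
    RED = 4


def parse_tlp(label: str) -> TLP:
    # "TLP:AMBER+STRICT" -> TLP.AMBER_STRICT; raises KeyError on unknown labels.
    name = label.strip().upper()
    if name.startswith("TLP:"):
        name = name[len("TLP:"):]
    return TLP[name.replace("+", "_")]


def allowed_destinations(case_tlp: str, max_tlp: Dict[str, str]) -> List[str]:
    # A destination may receive the case only if the case's TLP does not exceed its cap.
    level = parse_tlp(case_tlp)
    return [dest for dest, cap in max_tlp.items() if level <= parse_tlp(cap)]
```

With hypothetical caps `{"national-cert": "TLP:AMBER+STRICT", "sector-isac": "TLP:AMBER", "cti-community": "TLP:GREEN", "public": "TLP:CLEAR"}`, a TLP:AMBER case may go only to national-cert and sector-isac.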

Training (Trainline + QLoRA)

psyc train-build-all emits Alpaca-style JSONL datasets under data/datasets/<task>-v<n>.jsonl for four defensive tasks: ioc_extraction, severity_classification, routing_decision, tlp_assignment. Per the dossier's training-data policy, QualityGate drops TLP:RED rows, rows from restricted sources, empty or oversize rows, and rows that leak credentials.
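An Alpaca-style row is an instruction/input/output object, written one per JSONL line. The gate below is a hypothetical approximation of QualityGate. The size limit, the restricted-source names and the credential regex are assumptions, not the dossier's policy values.

```python
import json
import re
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple

MAX_CHARS = 8000  # hypothetical oversize threshold
RESTRICTED_SOURCES: Set[str] = {"restricted-feed"}  # hypothetical source names
# Crude credential-leak heuristic (assumption): user:pass@ in a URL, or password= / password:.
CREDENTIAL_RE = re.compile(r"://[^/\s:@]+:[^/\s@]+@|password\s*[=:]", re.IGNORECASE)


def to_alpaca(instruction: str, input_text: str, output: str) -> Dict[str, str]:
    return {"instruction": instruction, "input": input_text, "output": output}


def reject_reason(row: Dict[str, str], tlp: str, source: str) -> Optional[str]:
    # Return why a row is dropped, or None if it passes every check.
    text = row["input"] + row["output"]
    if tlp.strip().upper() == "TLP:RED":
        return "tlp_red"
    if source in RESTRICTED_SOURCES:
        return "restricted_source"
    if not row["input"].strip() or not row["output"].strip():
        return "empty"
    if len(text) > MAX_CHARS:
        return "oversize"
    if CREDENTIAL_RE.search(text):
        return "credential_leak"
    return None


def gate_to_jsonl(records: Iterable[Tuple[Dict[str, str], str, str]]) -> Iterator[str]:
    # Each record is (row, tlp, source); only accepted rows become JSONL lines.
    for row, tlp, source in records:
        if reject_reason(row, tlp, source) is None:
            yield json.dumps(row, ensure_ascii=False)
```

Keeping the reason as a string (rather than a bare boolean) makes it easy to count drops per rule when a dataset shrinks unexpectedly.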

To fine-tune Qwen3.5-4B with QLoRA in a GPU-enabled Docker container (an NVIDIA GPU with the NVIDIA Container Toolkit installed, which --gpus all relies on):

# 1. build datasets (one-off; re-run after ingestion changes)
.venv/bin/psyc train-build-all

# 2. build the training image (pytorch 2.6/CUDA 12.4 base + unsloth + Qwen3.5)
docker build -t psyc-trainer -f Dockerfile.train .

# 3. fine-tune — scripts/ + data/ are mounted, so script edits need no rebuild
docker run --gpus all --rm --entrypoint python \
    -v "$(pwd)/data:/data" -v "$(pwd)/scripts:/scripts" \
    psyc-trainer /scripts/train_qlora.py \
    --dataset /data/datasets/ioc_extraction-v2.jsonl \
    --dataset /data/datasets/severity_classification-v2.jsonl \
    --dataset /data/datasets/routing_decision-v2.jsonl \
    --dataset /data/datasets/tlp_assignment-v2.jsonl \
    --output /data/adapters/psyc-v2

Defaults target a 24 GB consumer GPU (RTX 3090/4090): unsloth/Qwen3.5-4B in 4-bit, LoRA r=16/alpha=16, bf16, 3 epochs, effective batch size 8. On an A100 (40 or 80 GB), pass --base-model unsloth/Qwen3.5-9B and raise --batch-size and --max-seq-length.
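The effective batch size is the number of samples per optimizer step: per-device batch × gradient-accumulation steps × GPUs. Standard LoRA scales its low-rank update by alpha / r. The README states only the totals; the 2 × 4 split used below is an assumption, not the script's actual default.

```python
def effective_batch_size(per_device: int, grad_accum: int, n_gpus: int = 1) -> int:
    # Samples that contribute to a single optimizer step.
    return per_device * grad_accum * n_gpus


def lora_scaling(alpha: float, r: int) -> float:
    # Standard LoRA multiplies the update B @ A by alpha / r.
    return alpha / r


# Assumed split: 2 samples per step x 4 accumulation steps on one GPU.
print(effective_batch_size(2, 4))  # 8
print(lora_scaling(16, 16))        # 1.0
```

With r equal to alpha the update is applied unscaled; if --batch-size is raised on a larger GPU without lowering accumulation, the effective batch grows in proportion.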

Output: <output>/final/ (adapter weights; data/adapters/psyc-v2/final/ for the command above) plus training_meta.json (base model, hyperparameters, dataset list).

Evaluate the adapter against held-out dataset rows:

docker run --gpus all --rm --entrypoint python \
    -v "$(pwd)/data:/data" -v "$(pwd)/scripts:/scripts" \
    psyc-trainer /scripts/eval_adapter.py \
    --adapter /data/adapters/psyc-v2/final \
    --dataset /data/datasets/ioc_extraction-v2.jsonl --n 5

The cockpit /train page lists every built dataset and trained adapter with its base model, hyperparameters, dataset provenance, and a per-step loss chart.

Status

Day 2 of a 48h build. Shipped: Scoutline (URLhaus) → Classifyline → Mapline (GeoResolver via ip-api.com) → Sealine (PyNaCl sealed boxes) → Routeline → Courier → mock CERT → Ledgerline. Cockpit has cases / case detail / ledger pages and a design-token CSS layer. Trainline emits LoRA-ready JSONL; Dockerfile.train builds an unsloth + Qwen3.5 QLoRA training container.

License

Unset for the hackathon. Choose before any external release.

Languages

Python 62.6%, JavaScript 12.4%, HTML 12.1%, CSS 11.5%, Shell 1.3%