psyc
Validate the signal, protect the evidence, route only what each destination is authorized to receive, and prove every external action through an immutable ledger.
Defensive cyber-threat-intelligence routing & evidence-sealing platform. Built as a 48h hackathon project on 2026-05-13. Active development.
Architecture
Sensors
  → Scoutline     fetch, parse, dedup, signal
  → Proofline     validate indicators, score confidence
  → Mapline       resolve victim, actor, jurisdiction, CERT route
  → Classifyline  severity, TLP, incident type, internal class
  → Sealine       authority-sealed evidence encryption
  → Routeline     pick destinations, build payloads, submit
  → Ledgerline    immutable audit, receipts, outcomes
  → Publishline   sanitized public intelligence after mitigation
  → Trainline     lawful intel → LoRA-ready training data
  → Cockpit       operator UI (FastAPI + Jinja)
Each -line is a stage in a small-worker mesh; each worker performs one
narrow job and passes a normalized Case object to the next stage. Heavy
models are reserved for judgment-heavy tasks. Humans approve everything
sensitive before it leaves the platform.
Full architecture: docs/dossier.md, a consolidated read of
the original individual records (still in docs/archive/).
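The worker-mesh idea above can be sketched in a few lines. This is an illustration only: the Case fields, the two toy stages, and run_pipeline are hypothetical stand-ins, not the definitions in src/psyc/models.py or src/psyc/lines/.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Case:
    """Hypothetical stand-in for the normalized Case object."""

    indicator: str
    confidence: Optional[float] = None
    severity: Optional[str] = None
    trail: List[str] = field(default_factory=list)  # stages that touched the case


# A stage takes a Case and returns a new, enriched Case.
Stage = Callable[[Case], Case]


def proofline(case: Case) -> Case:
    # Toy rule (assumption): any non-empty indicator gets a fixed confidence.
    score = 0.9 if case.indicator else 0.0
    return replace(case, confidence=score, trail=case.trail + ["proofline"])


def classifyline(case: Case) -> Case:
    # Toy rule (assumption): confidence at or above 0.8 maps to "high".
    severity = "high" if (case.confidence or 0.0) >= 0.8 else "low"
    return replace(case, severity=severity, trail=case.trail + ["classifyline"])


def run_pipeline(case: Case, stages: List[Stage]) -> Case:
    # Each worker does one narrow job and hands the Case to the next stage.
    for stage in stages:
        case = stage(case)
    return case


result = run_pipeline(
    Case(indicator="http://example.test/payload"),
    [proofline, classifyline],
)
print(result.severity, result.trail)
```

Keeping Case immutable and returning a copy from each stage means every worker's contribution stays visible in the trail, which suits an audit-heavy pipeline.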
Quick start
python3 -m virtualenv .venv
.venv/bin/pip install -e .
.venv/bin/psyc init # create the sqlite db
.venv/bin/psyc fetch-urlhaus --limit 50 # ingest a URLhaus pass
.venv/bin/psyc serve --port 8767 # cockpit at http://127.0.0.1:8767
.venv/bin/psyc status # count of ingested cases
Code layout
src/psyc/
  models.py       # normalized Case object (Pydantic)
  db.py           # SQLAlchemy Core; cases + ledger tables
  result.py       # Ok / Err / Result[T, E]
  log.py          # structlog configuration
  cli.py          # flat Typer commands
  lines/          # one file per worker line
    scout.py      # Fetcher + Signalizer (URLhaus today)
  cockpit/        # FastAPI + Jinja operator UI
    app.py
    templates/
    static/
docs/
  dossier.md      # full architecture (consolidated)
  style.md        # 12-fold Python style guide
  archive/        # original architecture docs + logo variants
Style
All code follows docs/style.md: Optional[X] / List[X]
from typing, Field(default_factory=...) for Pydantic mutables, Result[T, E]
types for expected failures (raise reserved for true exceptions), class X(str, Enum)
for closed string sets, structlog with area.action event names, SQLAlchemy Core
(no ORM), flat Typer commands with hyphenated names. Ruff config in pyproject.toml
enforces the bits a linter can check; UP006/UP007/UP035 are disabled so the
typing-import rules stand.
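A minimal sketch of two of these conventions, Result[T, E] for expected failures and class X(str, Enum) for closed string sets, might look like the following. It uses stdlib dataclasses rather than Pydantic so it runs without dependencies. The Severity members and parse_severity are hypothetical, and the real src/psyc/result.py may be shaped differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Generic, Optional, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


# Expected failures are returned as values instead of being raised.
Result = Union[Ok[T], Err[E]]


class Severity(str, Enum):
    """Closed string set (members are illustrative assumptions)."""

    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def parse_severity(raw: Optional[str]) -> Result[Severity, str]:
    # Bad input is an expected failure, so it becomes an Err, not an exception.
    if raw is None:
        return Err("severity missing")
    try:
        return Ok(Severity(raw.strip().lower()))
    except ValueError:
        return Err(f"unknown severity: {raw!r}")


outcome = parse_severity("HIGH")
if isinstance(outcome, Ok):
    print("parsed", outcome.value.value)
else:
    print("rejected", outcome.error)
```

Because Severity subclasses str, its members compare equal to plain strings and serialize cleanly to JSON or a database column.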
Scope
Lawful, white-hat defensive operations only. psyc routes intelligence to victims, CERT/CSIRTs, sector ISACs, provider/registrar abuse desks, and trusted CTI communities. It will not:
- amplify stolen data
- expose victims prematurely
- interact with criminal actors
- distribute exploitation content
- submit evidence that exceeds a destination's max TLP
The boundaries are defined in docs/dossier.md §5 Destination Minimization,
§10 TLP Enforcement, and §16 Public Reporting Rules. The Ledger records
every external submission and destructive action; sensitive evidence is
encrypted to authorized recipients via Sealine before any routing decision.
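The max-TLP rule in the list above can be sketched as a simple ordering check. The labels follow TLP 2.0. The strict ordering and the may_submit helper are assumptions for illustration, and the dossier's actual enforcement rules may be richer.

```python
from enum import Enum
from typing import Dict


class Tlp(str, Enum):
    # Declared from least to most restrictive (TLP 2.0 labels).
    CLEAR = "TLP:CLEAR"
    GREEN = "TLP:GREEN"
    AMBER = "TLP:AMBER"
    AMBER_STRICT = "TLP:AMBER+STRICT"
    RED = "TLP:RED"


# Rank each label by declaration order so the levels can be compared.
_RANK: Dict[Tlp, int] = {level: rank for rank, level in enumerate(Tlp)}


def may_submit(evidence: Tlp, destination_max: Tlp) -> bool:
    """Return True only if the evidence does not exceed the destination's max TLP."""
    return _RANK[evidence] <= _RANK[destination_max]


print(may_submit(Tlp.GREEN, Tlp.AMBER))  # within the destination's limit
print(may_submit(Tlp.RED, Tlp.AMBER))    # exceeds the destination's limit
```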
Training (Trainline + QLoRA)
psyc train-build-all emits Alpaca-style JSONL datasets under
data/datasets/<task>-v<n>.jsonl for four defensive tasks: ioc_extraction,
severity_classification, routing_decision, tlp_assignment. QualityGate
drops TLP:RED, restricted sources, empty/oversize, and credential-leak rows
per the dossier's training-data policy.
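As a rough sketch of what such a gate could look like: Alpaca-style rows carry instruction, input, and output fields, and a filter drops rows before they reach the JSONL file. Everything else here is an assumption for illustration: the tlp and source metadata keys, the size cap, the restricted-source label, and the credential regex are hypothetical, not the real QualityGate.

```python
import json
import re
from typing import Dict, List, Optional

MAX_CHARS = 4000  # assumed size cap; the real limit is not stated here
RESTRICTED_SOURCES = {"partner-private"}  # hypothetical source label
CREDENTIAL_PATTERN = re.compile(r"(?i)\b(password|passwd|api[_-]?key)\s*[:=]")


def rejection_reason(row: Dict[str, str]) -> Optional[str]:
    """Return why a row is dropped, or None if it may be used for training."""
    text = row.get("input", "") + row.get("output", "")
    if row.get("tlp") == "TLP:RED":
        return "tlp_red"
    if row.get("source") in RESTRICTED_SOURCES:
        return "restricted_source"
    if not row.get("input", "").strip() or not row.get("output", "").strip():
        return "empty"
    if len(text) > MAX_CHARS:
        return "oversize"
    if CREDENTIAL_PATTERN.search(text):
        return "credential_leak"
    return None


def to_jsonl(rows: List[Dict[str, str]]) -> str:
    """Serialize surviving rows as Alpaca-style JSONL, without the metadata keys."""
    kept = [r for r in rows if rejection_reason(r) is None]
    return "\n".join(
        json.dumps({k: r[k] for k in ("instruction", "input", "output")})
        for r in kept
    )


sample_rows = [
    {
        "instruction": "Extract IOCs.",
        "input": "Payload at http://example.test/a.exe",
        "output": '["http://example.test/a.exe"]',
        "tlp": "TLP:GREEN",
        "source": "urlhaus",
    },
    {
        "instruction": "Extract IOCs.",
        "input": "Internal note about http://example.test/b.exe",
        "output": '["http://example.test/b.exe"]',
        "tlp": "TLP:RED",
        "source": "urlhaus",
    },
    {
        "instruction": "Extract IOCs.",
        "input": "admin login password=hunter2 seen in paste",
        "output": "[]",
        "tlp": "TLP:GREEN",
        "source": "urlhaus",
    },
]
print(to_jsonl(sample_rows))
```

Returning a reason string rather than a bare boolean makes it easy to count drops per category when auditing a dataset build.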
To fine-tune Qwen3.5-4B with QLoRA in an NVIDIA Docker container:
# 1. build datasets (one-off; re-run after ingestion changes)
.venv/bin/psyc train-build-all
# 2. build the training image (CUDA 12.4 + unsloth + Qwen3.5)
docker build -t psyc-trainer -f Dockerfile.train .
# 3. fine-tune (mount host data/ so adapters land there)
docker run --gpus all --rm \
  -v "$(pwd)/data:/data" \
  psyc-trainer \
  --dataset /data/datasets/ioc_extraction-v1.jsonl \
  --dataset /data/datasets/severity_classification-v1.jsonl \
  --dataset /data/datasets/routing_decision-v1.jsonl \
  --dataset /data/datasets/tlp_assignment-v1.jsonl \
  --output /data/adapters/psyc-v1
Defaults target a 24 GB consumer GPU (3090/4090): Qwen3.5-4B-Instruct at 4-bit,
LoRA r=16/alpha=16, bf16, 3 epochs, effective batch size 8. On an A100 (40 GB or
80 GB), set --base-model to unsloth/Qwen3.5-9B-Instruct-bnb-4bit and raise
--batch-size and --max-seq-length.
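For reference, effective batch size is the per-device batch multiplied by the gradient-accumulation steps and the number of GPUs. The sketch below shows the arithmetic. The 2 x 4 split is an assumed example that happens to reach 8, not a value read from Dockerfile.train or the training script.

```python
def effective_batch_size(
    per_device_batch: int, grad_accum_steps: int, num_gpus: int = 1
) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus


# Assumed split for a single 24 GB card: small micro-batches, accumulated.
print(effective_batch_size(per_device_batch=2, grad_accum_steps=4))
```

When raising --batch-size on a larger GPU, the accumulation steps can be lowered by the same factor if the effective batch is meant to stay constant.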
Output: data/adapters/psyc-v1/final/ (adapter weights) + training_meta.json
(base model, hyperparameters, dataset list).
Status
Day 2 of a 48h build. Shipped: Scoutline (URLhaus) → Classifyline → Mapline
(GeoResolver via ip-api.com) → Sealine (PyNaCl sealed boxes) → Routeline →
Courier → mock CERT → Ledgerline. Cockpit has cases / case detail / ledger
pages and a design-token CSS layer. Trainline emits LoRA-ready JSONL;
Dockerfile.train builds an unsloth + Qwen3.5 QLoRA training container.
License
Unset for the hackathon. Choose before any external release.
