
# psyc

> Validate the signal, protect the evidence, route only what each destination is
> authorized to receive, and prove every external action through an immutable ledger.

Defensive cyber-threat-intelligence routing & evidence-sealing platform. Built as a 48h hackathon project on 2026-05-13. Active development.

---

## Architecture

```text
Sensors
  → Scoutline      fetch, parse, dedup, signal
  → Proofline      validate indicators, score confidence
  → Mapline        resolve victim, actor, jurisdiction, CERT route
  → Classifyline   severity, TLP, incident type, internal class
  → Sealine        authority-sealed evidence encryption
  → Routeline      pick destinations, build payloads, submit
  → Ledgerline     immutable audit, receipts, outcomes
  → Publishline    sanitized public intelligence after mitigation
  → Trainline      lawful intel → LoRA-ready training data
  → Cockpit        operator UI (FastAPI + Jinja)
```

Each `-line` is a stage in a mesh of small workers: each worker performs one narrow job and passes a normalized `Case` object to the next stage. Heavy models are reserved for judgment-heavy tasks. Humans approve everything sensitive before it leaves the platform.

Full architecture: [`docs/dossier.md`](docs/dossier.md), a consolidated version of the original individual records (still kept in [`docs/archive/`](docs/archive/)).

---

## Quick start

```bash
python3 -m virtualenv .venv
.venv/bin/pip install -e .
.venv/bin/psyc init                       # create the SQLite database
.venv/bin/psyc fetch-urlhaus --limit 50   # ingest a URLhaus pass
.venv/bin/psyc serve --port 8767          # cockpit at http://127.0.0.1:8767
.venv/bin/psyc status                     # count of ingested cases
```

---

## Code layout

```
src/psyc/
  models.py       # normalized Case object (Pydantic)
  db.py           # SQLAlchemy Core; cases + ledger tables
  result.py       # Ok / Err / Result[T, E]
  log.py          # structlog configuration
  cli.py          # flat Typer commands
  lines/          # one file per worker line
    scout.py      # Fetcher + Signalizer (URLhaus today)
  cockpit/        # FastAPI + Jinja operator UI
    app.py
    templates/
    static/
docs/
  dossier.md      # full architecture (consolidated)
  style.md        # 12-fold Python style guide
  archive/        # original architecture docs + logo variants
```

---

## Style

All code follows [`docs/style.md`](docs/style.md):

- `Optional[X]` / `List[X]` from `typing`
- `Field(default_factory=...)` for Pydantic mutables
- `Result[T, E]` types for expected failures (`raise` is reserved for true exceptions)
- `class X(str, Enum)` for closed string sets
- structlog with `area.action` event names
- SQLAlchemy Core (no ORM)
- flat Typer commands with hyphenated names

The Ruff config in `pyproject.toml` enforces the rules a linter can check; `UP006`, `UP007`, and `UP035` are disabled so Ruff does not rewrite the `typing` imports into builtin-generic or `X | Y` forms.

---

## Scope

**Lawful, white-hat defensive operations only.** psyc routes intelligence to victims, CERTs/CSIRTs, sector ISACs, provider/registrar abuse desks, and trusted CTI communities. It will **not**:

- amplify stolen data
- expose victims prematurely
- interact with criminal actors
- distribute exploitation content
- submit evidence that exceeds a destination's max TLP

The boundaries are defined in `docs/dossier.md` §5 *Destination Minimization*, §10 *TLP Enforcement*, and §16 *Public Reporting Rules*. The Ledger records every external submission and destructive action; sensitive evidence is encrypted to authorized recipients via Sealine before any routing decision.
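The max-TLP rule in the Scope list reduces to an ordering check: evidence may go to a destination only if its label is no more restrictive than the destination's ceiling. The sketch below is a minimal illustration under stated assumptions, not psyc's Routeline code. It assumes the TLP 2.0 label set (`CLEAR`, `GREEN`, `AMBER`, `AMBER+STRICT`, `RED`) ordered from least to most restrictive; `Destination`, `check_tlp`, and `allowed_destinations` are hypothetical names, and the `Ok`/`Err` classes are local stand-ins in the spirit of the Style section's `Result[T, E]` convention, not the contents of `src/psyc/result.py`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Generic, List, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


class TLP(str, Enum):
    # Assumed TLP 2.0 labels, declared from least to most restrictive.
    CLEAR = "TLP:CLEAR"
    GREEN = "TLP:GREEN"
    AMBER = "TLP:AMBER"
    AMBER_STRICT = "TLP:AMBER+STRICT"
    RED = "TLP:RED"

    @property
    def rank(self) -> int:
        # Declaration order doubles as the restriction level (CLEAR=0 ... RED=4).
        return list(TLP).index(self)


@dataclass(frozen=True)
class Ok(Generic[T]):
    # Local stand-in for an expected-success value (not psyc's result.py).
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    # Local stand-in for an expected failure, returned instead of raised.
    error: E


Result = Union[Ok[T], Err[E]]


@dataclass(frozen=True)
class Destination:
    # Hypothetical routing target and the most restrictive label it may receive.
    name: str
    max_tlp: TLP


def check_tlp(evidence_tlp: TLP, dest: Destination) -> Result[Destination, str]:
    # Refuse evidence whose label is more restrictive than the destination allows.
    if evidence_tlp.rank > dest.max_tlp.rank:
        return Err(f"{evidence_tlp.value} exceeds {dest.name} max {dest.max_tlp.value}")
    return Ok(dest)


def allowed_destinations(evidence_tlp: TLP, dests: List[Destination]) -> List[Destination]:
    # Keep only the destinations that pass the max-TLP check.
    return [d for d in dests if isinstance(check_tlp(evidence_tlp, d), Ok)]


cert = Destination("national-cert", TLP.AMBER_STRICT)
community = Destination("cti-community", TLP.GREEN)

print([d.name for d in allowed_destinations(TLP.AMBER, [cert, community])])
# → ['national-cert']
```

Returning `Err` rather than raising keeps a blocked route an ordinary, loggable outcome, which matches the style guide's split between expected failures and true exceptions.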
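One common way to make an audit log tamper-evident is a hash chain: each entry stores the SHA-256 of the previous entry, so editing an earlier row breaks every later link. The following is a hypothetical in-memory sketch of that technique only; psyc's Ledgerline persists to the SQLAlchemy Core `ledger` table in `db.py`, and nothing here describes its schema. `HashChainLedger`, its methods, and the example event names (chosen to follow the `area.action` convention) are illustrative assumptions.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, compact separators) keeps the hash deterministic.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class HashChainLedger:
    # Hypothetical in-memory ledger; each entry commits to the previous entry's hash.
    entries: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, event: str, payload: Dict[str, Any]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"seq": len(self.entries), "event": event, "payload": payload, "prev": prev}
        digest = _digest(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute every hash and link; an edited or reordered entry, or one
        # removed before the tail, makes verification fail.
        prev = GENESIS
        for seq, entry in enumerate(self.entries):
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["seq"] != seq or entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


ledger = HashChainLedger()
ledger.append("route.submit", {"case_id": "case-1", "destination": "national-cert"})
ledger.append("route.receipt", {"case_id": "case-1", "status": "accepted"})
print(ledger.verify())  # True

ledger.entries[0]["payload"]["destination"] = "someone-else"  # simulate tampering
print(ledger.verify())  # False
```

A chain like this detects modification rather than preventing it, and it cannot notice the newest entries being truncated unless the latest hash is also recorded somewhere outside the log.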
---

## Training (Trainline + QLoRA)

`psyc train-build-all` emits Alpaca-style JSONL datasets under `data/datasets/-v.jsonl` for four defensive tasks: `ioc_extraction`, `severity_classification`, `routing_decision`, and `tlp_assignment`. QualityGate drops rows that are `TLP:RED`, come from restricted sources, are empty or oversized, or contain leaked credentials, per the dossier's training-data policy.

To fine-tune Qwen3.5-4B with QLoRA in an NVIDIA Docker container:

```bash
# 1. build datasets (one-off; re-run after ingestion changes)
.venv/bin/psyc train-build-all

# 2. build the training image (pytorch 2.6/CUDA 12.4 base + unsloth + Qwen3.5)
docker build -t psyc-trainer -f Dockerfile.train .

# 3. fine-tune; scripts/ and data/ are mounted, so script edits need no rebuild
docker run --gpus all --rm --entrypoint python \
  -v "$(pwd)/data:/data" -v "$(pwd)/scripts:/scripts" \
  psyc-trainer /scripts/train_qlora.py \
  --dataset /data/datasets/ioc_extraction-v2.jsonl \
  --dataset /data/datasets/severity_classification-v2.jsonl \
  --dataset /data/datasets/routing_decision-v2.jsonl \
  --dataset /data/datasets/tlp_assignment-v2.jsonl \
  --output /data/adapters/psyc-v2
```

Defaults target a 24 GB consumer GPU (RTX 3090/4090): `unsloth/Qwen3.5-4B` at 4-bit, LoRA `r=16`/`alpha=16`, bf16, 3 epochs, effective batch size 8. On an A100 (40 or 80 GB), pass `--base-model unsloth/Qwen3.5-9B` and raise `--batch-size` and `--max-seq-length`.

Output: `data/adapters/psyc-v2/final/` (adapter weights) plus `training_meta.json` (base model, hyperparameters, dataset list).

Evaluate the adapter against held-out dataset rows:

```bash
docker run --gpus all --rm --entrypoint python \
  -v "$(pwd)/data:/data" -v "$(pwd)/scripts:/scripts" \
  psyc-trainer /scripts/eval_adapter.py \
  --adapter /data/adapters/psyc-v2/final \
  --dataset /data/datasets/ioc_extraction-v2.jsonl --n 5
```

The cockpit `/train` page lists every built dataset and trained adapter with its base model, hyperparameters, dataset provenance, and a per-step loss chart.

## Status

Day 2 of a 48h build.
Shipped: Scoutline (URLhaus) → Classifyline → Mapline (GeoResolver via ip-api.com) → Sealine (PyNaCl sealed boxes) → Routeline → Courier → mock CERT → Ledgerline. The cockpit has cases, case-detail, and ledger pages plus a design-token CSS layer. Trainline emits LoRA-ready JSONL, and `Dockerfile.train` builds an unsloth + Qwen3.5 QLoRA training container.

## License

Unset for the hackathon. Choose one before any external release.