
<p align="center">
  <img src="src/psyc/cockpit/static/psyc-logo.png" alt="psyc" width="320">
</p>

# psyc

> Validate the signal, protect the evidence, route only what each destination is
> authorized to receive, and prove every external action through an immutable ledger.

Defensive cyber-threat-intelligence routing & evidence-sealing platform.
Built as a 48-hour hackathon project on 2026-05-13; under active development.

---

## Architecture

```text
Sensors
→ Scoutline      fetch, parse, dedup, signal
→ Proofline      validate indicators, score confidence
→ Mapline        resolve victim, actor, jurisdiction, CERT route
→ Classifyline   severity, TLP, incident type, internal class
→ Sealine        authority-sealed evidence encryption
→ Routeline      pick destinations, build payloads, submit
→ Ledgerline     immutable audit, receipts, outcomes
→ Publishline    sanitized public intelligence after mitigation
→ Trainline      lawful intel → LoRA-ready training data
→ Cockpit        operator UI (FastAPI + Jinja)
```

Each `-line` is a stage in a small-worker mesh; each worker performs one
narrow job and passes a normalized `Case` object to the next stage. Heavy
models are reserved for judgment-heavy tasks. Humans approve everything
sensitive before it leaves the platform.
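
The worker mesh described above can be pictured as a chain of small functions over a shared `Case` object. The following is only an illustrative sketch, not the project's code: the `Case` fields, the two toy stage functions, and `run_pipeline` are hypothetical, and the real `Case` model is the Pydantic model in `src/psyc/models.py`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Case:
    """Hypothetical stand-in for the normalized Case object."""

    url: str
    confidence: Optional[float] = None
    severity: Optional[str] = None
    history: List[str] = field(default_factory=list)


# A stage takes a Case and returns the updated Case for the next stage.
Stage = Callable[[Case], Case]


def proofline(case: Case) -> Case:
    # Toy confidence score: http(s) URLs score higher than anything else.
    case.confidence = 0.9 if case.url.startswith(("http://", "https://")) else 0.1
    case.history.append("proofline")
    return case


def classifyline(case: Case) -> Case:
    # Toy severity rule based on the confidence set by the previous stage.
    case.severity = "high" if (case.confidence or 0.0) >= 0.5 else "low"
    case.history.append("classifyline")
    return case


def run_pipeline(case: Case, stages: List[Stage]) -> Case:
    # Each worker does one narrow job, then hands the Case on.
    for stage in stages:
        case = stage(case)
    return case


result = run_pipeline(Case(url="https://example.test/payload"), [proofline, classifyline])
print(result.severity, result.history)
```

Keeping every stage to the same `Case -> Case` shape is what lets stages be added, reordered, or tested in isolation.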

Full architecture: [`docs/dossier.md`](docs/dossier.md), a consolidated version of
the original individual records (which remain in [`docs/archive/`](docs/archive/)).

---

## Quick start

```bash
python3 -m venv .venv                     # stdlib venv; no extra install needed
.venv/bin/pip install -e .

.venv/bin/psyc init                       # create the sqlite db
.venv/bin/psyc fetch-urlhaus --limit 50   # ingest a URLhaus pass
.venv/bin/psyc serve --port 8767          # cockpit at http://127.0.0.1:8767
.venv/bin/psyc status                     # count of ingested cases
```

---

## Code layout

```text
src/psyc/
  models.py          # normalized Case object (Pydantic)
  db.py              # SQLAlchemy Core; cases + ledger tables
  result.py          # Ok / Err / Result[T, E]
  log.py             # structlog configuration
  cli.py             # flat Typer commands
  lines/             # one file per worker line
    scout.py         # Fetcher + Signalizer (URLhaus today)
  cockpit/           # FastAPI + Jinja operator UI
    app.py
    templates/
    static/

docs/
  dossier.md         # full architecture (consolidated)
  style.md           # 12-fold Python style guide
  archive/           # original architecture docs + logo variants
```

---

## Style

All code follows [`docs/style.md`](docs/style.md): `Optional[X]` / `List[X]`
from `typing`, `Field(default_factory=...)` for Pydantic mutables, `Result[T, E]`
types for expected failures (`raise` reserved for true exceptions), `class X(str, Enum)`
for closed string sets, structlog with `area.action` event names, SQLAlchemy Core
(no ORM), flat Typer commands with hyphenated names. Ruff config in `pyproject.toml`
enforces the bits a linter can check; `UP006`/`UP007`/`UP035` are disabled so the
typing-import rules stand.
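
Two of these conventions, `Result[T, E]` for expected failures and `class X(str, Enum)` for closed string sets, can be sketched with the standard library alone. This is an assumption-laden illustration: the actual definitions in `src/psyc/result.py` may look different, and `Tlp` and `parse_tlp` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


# Result[T, E] is either Ok[T] or Err[E] (hypothetical shape).
Result = Union[Ok[T], Err[E]]


class Tlp(str, Enum):
    """Closed string set: members compare equal to their string values."""

    CLEAR = "TLP:CLEAR"
    GREEN = "TLP:GREEN"
    AMBER = "TLP:AMBER"
    RED = "TLP:RED"


def parse_tlp(raw: str) -> Result[Tlp, str]:
    # Malformed input is an expected failure, so it comes back as Err
    # instead of propagating an exception to the caller.
    try:
        return Ok(Tlp(raw.strip().upper()))
    except ValueError:
        return Err(f"unknown TLP label: {raw!r}")


print(parse_tlp(" tlp:amber "))
print(parse_tlp("purple"))
```

Callers branch on `isinstance(result, Ok)` rather than wrapping the call in `try`/`except`, which keeps expected failures visible in the function signature.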

---

## Scope

**Lawful, white-hat defensive operations only.** psyc routes intelligence to
victims, CERT/CSIRTs, sector ISACs, provider/registrar abuse desks, and
trusted CTI communities. It will **not**:

- amplify stolen data
- expose victims prematurely
- interact with criminal actors
- distribute exploitation content
- submit evidence that exceeds a destination's max TLP

The boundaries are defined in `docs/dossier.md` §5 *Destination Minimization*,
§10 *TLP Enforcement*, and §16 *Public Reporting Rules*. The Ledger records
every external submission and destructive action; sensitive evidence is
encrypted to authorized recipients via Sealine before any routing decision.
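
The last rule in the list, never submitting evidence above a destination's max TLP, reduces to an ordering check. The sketch below assumes the TLP 2.0 labels ordered from least to most restrictive. `Destination`, `may_receive`, and `allowed_destinations` are hypothetical names, and the authoritative rules are the ones in the dossier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tlp(str, Enum):
    CLEAR = "TLP:CLEAR"
    GREEN = "TLP:GREEN"
    AMBER = "TLP:AMBER"
    AMBER_STRICT = "TLP:AMBER+STRICT"
    RED = "TLP:RED"


# Least to most restrictive (assumed ordering of the TLP 2.0 labels).
_ORDER: List[Tlp] = [Tlp.CLEAR, Tlp.GREEN, Tlp.AMBER, Tlp.AMBER_STRICT, Tlp.RED]


@dataclass(frozen=True)
class Destination:
    name: str
    max_tlp: Tlp


def may_receive(dest: Destination, evidence_tlp: Tlp) -> bool:
    # Evidence may go out only if it is no more restrictive than the
    # destination's ceiling.
    return _ORDER.index(evidence_tlp) <= _ORDER.index(dest.max_tlp)


def allowed_destinations(dests: List[Destination], evidence_tlp: Tlp) -> List[Destination]:
    return [d for d in dests if may_receive(d, evidence_tlp)]


cert = Destination("national-cert", Tlp.AMBER)
public = Destination("public-feed", Tlp.CLEAR)
print([d.name for d in allowed_destinations([cert, public], Tlp.GREEN)])
```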

---

## Training (Trainline + QLoRA)

`psyc train-build-all` emits Alpaca-style JSONL datasets under
`data/datasets/<task>-v<n>.jsonl` for four defensive tasks: `ioc_extraction`,
`severity_classification`, `routing_decision`, `tlp_assignment`. QualityGate
drops `TLP:RED`, restricted sources, empty/oversize, and credential-leak rows
per the dossier's training-data policy.
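
A filter of this kind might look roughly like the sketch below. It is not the project's QualityGate: the `tlp` and `source` metadata keys, the `RESTRICTED_SOURCES` set, the `MAX_CHARS` limit, and the credential regex are all assumptions made for illustration. Only the `instruction` / `input` / `output` keys follow the usual Alpaca row layout.

```python
import json
import re
from typing import Dict, Iterable, Iterator, Optional

RESTRICTED_SOURCES = {"restricted-feed"}  # hypothetical source name
MAX_CHARS = 4000                          # hypothetical size limit
CREDENTIAL_RE = re.compile(
    r"(password|passwd|api[_-]?key|secret)\s*[=:]\s*\S+", re.IGNORECASE
)
ALPACA_KEYS = ("instruction", "input", "output")


def reject_reason(row: Dict[str, str]) -> Optional[str]:
    """Return why a row is dropped, or None if it passes the gate."""
    text = " ".join(row.get(k, "") for k in ALPACA_KEYS)
    if row.get("tlp") == "TLP:RED":
        return "tlp_red"
    if row.get("source") in RESTRICTED_SOURCES:
        return "restricted_source"
    if not row.get("instruction", "").strip() or not row.get("output", "").strip():
        return "empty"
    if len(text) > MAX_CHARS:
        return "oversize"
    if CREDENTIAL_RE.search(text):
        return "credential_leak"
    return None


def to_jsonl(rows: Iterable[Dict[str, str]]) -> Iterator[str]:
    for row in rows:
        if reject_reason(row) is None:
            # Emit only the Alpaca keys; gate metadata is not training data.
            yield json.dumps({k: row.get(k, "") for k in ALPACA_KEYS})


good = {"instruction": "Extract IOCs.", "input": "hxxp://bad.test/a", "output": "bad.test", "tlp": "TLP:GREEN"}
red = dict(good, tlp="TLP:RED")
leak = dict(good, input="login with password=hunter2")
lines = list(to_jsonl([good, red, leak]))
print(len(lines))
```

Returning a reason string instead of a bare boolean makes it easy to count drops per rule when a dataset is rebuilt.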

To fine-tune Qwen3.5-4B with QLoRA in an NVIDIA Docker container:

```bash
# 1. build datasets (one-off; re-run after ingestion changes)
.venv/bin/psyc train-build-all

# 2. build the training image (pytorch 2.6/CUDA 12.4 base + unsloth + Qwen3.5)
docker build -t psyc-trainer -f Dockerfile.train .

# 3. fine-tune (mount host data/ so adapters land there)
docker run --gpus all --rm \
    -v "$(pwd)/data:/data" \
    psyc-trainer \
    --dataset /data/datasets/ioc_extraction-v1.jsonl \
    --dataset /data/datasets/severity_classification-v1.jsonl \
    --dataset /data/datasets/routing_decision-v1.jsonl \
    --dataset /data/datasets/tlp_assignment-v1.jsonl \
    --output /data/adapters/psyc-v1
```

Defaults target a 24 GB consumer GPU (3090/4090): `unsloth/Qwen3.5-4B` at 4-bit,
LoRA `r=16`/`alpha=16`, bf16, 3 epochs, effective batch size 8. On an A100
(40 GB or 80 GB), switch to `--base-model unsloth/Qwen3.5-9B` and raise both
`--batch-size` and `--max-seq-length`.

Output: `data/adapters/psyc-v1/final/` (adapter weights) + `training_meta.json`
(base model, hyperparameters, dataset list).

Evaluate the adapter against held-out dataset rows:

```bash
docker run --gpus all --rm \
    --entrypoint python \
    -v "$(pwd)/data:/data" -v "$(pwd)/scripts:/scripts" \
    psyc-trainer /scripts/eval_adapter.py \
    --adapter /data/adapters/psyc-v1/final \
    --dataset /data/datasets/ioc_extraction-v1.jsonl --n 5
```

---

## Status

Day 2 of a 48h build. Shipped: Scoutline (URLhaus) → Classifyline → Mapline
(GeoResolver via ip-api.com) → Sealine (PyNaCl sealed boxes) → Routeline →
Courier → mock CERT → Ledgerline. Cockpit has cases / case detail / ledger
pages and a design-token CSS layer. Trainline emits LoRA-ready JSONL;
`Dockerfile.train` builds an unsloth + Qwen3.5 QLoRA training container.

---

## License

Unset for the hackathon. Choose before any external release.