m17hr1l e54242178f stage-8: deployable platform — Dockerfile + compose for company-network deploy

Lean python:3.12-slim platform image (cockpit + CLI + workers, 214 MB — no GPU,
no model). docker-compose.yml runs cockpit + mock-cert on a persistent
psyc-data volume. DATA_DIR is now overridable via PSYC_DATA_DIR so the
container's data path is explicit. docs/deploy.md covers Proxmox hosting,
first-run ingestion, and the honest caveats — no built-in auth (deploy behind
the perimeter), the GPU model server is separate, egress-proxy config.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-18 21:53:03 +02:00

Name	Last commit	Date
docs	stage-8: deployable platform — Dockerfile + compose for company-network deploy	2026-05-18 21:53:03 +02:00
scripts	stage-6: model inference server	2026-05-18 21:05:16 +02:00
src/psyc	stage-8: deployable platform — Dockerfile + compose for company-network deploy	2026-05-18 21:53:03 +02:00
.dockerignore	stage-3c: unsloth QLoRA training scaffold for Qwen3.5	2026-05-14 14:17:14 +02:00
.gitignore	stage-2: full pipeline — Classifyline → Sealine → Routeline → Courier → Ledger + mock CERT	2026-05-14 13:44:43 +02:00
docker-compose.yml	stage-8: deployable platform — Dockerfile + compose for company-network deploy	2026-05-18 21:53:03 +02:00
Dockerfile	stage-8: deployable platform — Dockerfile + compose for company-network deploy	2026-05-18 21:53:03 +02:00
Dockerfile.train	stage-6: model inference server	2026-05-18 21:05:16 +02:00
pyproject.toml	init: scaffold psyc — defensive CTI routing & evidence-sealing platform	2026-05-14 12:43:47 +02:00
README.md	stage-7: demo polish — mesh-aware demo command, current README, run-sheet	2026-05-18 21:48:57 +02:00

psyc

Validate the signal, protect the evidence, route only what each destination is authorized to receive, and prove every external action through an immutable ledger.

psyc is a defensive cyber-threat-intelligence routing and evidence-sealing platform: a small-worker mesh that ingests public threat feeds, classifies and seals cases, routes them to the right destinations under TLP policy, and proves every action through an append-only ledger. It started as a 48-hour hackathon project (2026-05) and has grown into a working platform with a fine-tuned model in operation.

Architecture

Sensors
→ Scoutline      fetch + parse public feeds, emit normalized cases   [built]
→ Proofline      validate indicators, score confidence               [planned]
→ Mapline        resolve hosting country / jurisdiction              [built]
→ Classifyline   severity, TLP, incident type, internal class        [built]
→ Sealine        authority-sealed evidence encryption                [built]
→ Routeline      pick destinations under policy, build payloads      [built]
→ Courier        submit to destinations, collect receipts            [built]
→ Ledgerline     immutable audit of every submission + blocked route [built]
→ Publishline    sanitized public intelligence after mitigation      [planned]
→ Trainline      lawful intel → LoRA datasets + QLoRA training       [built]
→ Cockpit        operator UI (FastAPI + Jinja)                       [built]

Each -line is a stage in a small-worker mesh; each worker does one narrow job and passes a normalized Case object onward. Rules drive the deterministic work; a fine-tuned model handles judgment (see Training). Humans approve anything sensitive before it leaves the platform.
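The hand-off between workers can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Case` fields, the two toy rules, and the `run_mesh` helper are hypothetical and do not reflect the real `models.py` or `lines/` code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional
from urllib.parse import urlparse


@dataclass
class Case:
    """Hypothetical stand-in for the normalized Case object."""
    case_id: str
    indicator: str
    severity: Optional[str] = None
    country: Optional[str] = None
    trail: List[str] = field(default_factory=list)


Worker = Callable[[Case], Case]


def mapline(case: Case) -> Case:
    # Toy rule: guess the country from the hostname suffix.
    host = urlparse(case.indicator).hostname or ""
    case.country = "NL" if host.endswith(".nl") else "unknown"
    case.trail.append("mapline")
    return case


def classifyline(case: Case) -> Case:
    # Toy rule: treat executable downloads as high severity.
    case.severity = "high" if case.indicator.endswith(".exe") else "low"
    case.trail.append("classifyline")
    return case


def run_mesh(case: Case, workers: List[Worker]) -> Case:
    # Each worker does one narrow job and passes the case onward.
    for worker in workers:
        case = worker(case)
    return case


case = run_mesh(Case("c-1", "http://example.nl/payload.exe"), [mapline, classifyline])
print(case.severity, case.country, case.trail)
```

Keeping every worker to the same `Case -> Case` shape is what lets stages be added, skipped, or replayed independently.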

Full design: docs/dossier.md · style: docs/style.md · demo run-sheet: docs/demo.md

Quick start

python3 -m virtualenv .venv
.venv/bin/pip install -e .

.venv/bin/psyc init               # create the sqlite db
.venv/bin/psyc fetch-all          # ingest URLhaus + CISA KEV + Feodo Tracker
.venv/bin/psyc demo               # run one case through the whole pipeline

The platform runs as up to three services (each in its own terminal):

.venv/bin/psyc serve --port 8767      # operator cockpit  → http://127.0.0.1:8767
.venv/bin/psyc mock-cert --port 8770  # stand-in CERT / abuse-API receiver

# optional, needs an NVIDIA GPU — puts the live model behind the Classifier bot:
docker run --gpus all --rm -p 8771:8771 --entrypoint python \
    -v "$(pwd)/data:/data" -v "$(pwd)/scripts:/scripts" \
    psyc-trainer /scripts/serve_model.py --adapter /data/adapters/psyc-v4/final

Cockpit

http://127.0.0.1:8767 — five views:

View	Path	Shows
Case Queue	`/cases`	every ingested case, severity + TLP badges
Case detail	`/cases/{id}`	classification, observables, sealed package, routes, per-case ledger
Worker Mesh	`/cases/{id}/journey`	animated 7-bot replay of the case's path; the Classifier bot shows the live model's verdict
Ledger	`/ledger`	immutable audit feed
Trainline	`/train`	datasets + trained adapters with loss charts

Code layout

src/psyc/
  models.py        normalized Case object + enums (Pydantic)
  db.py            SQLAlchemy Core — cases + ledger tables
  result.py        Ok / Err / Result[T, E]
  log.py           structlog configuration
  cli.py           flat Typer CLI
  mock_cert.py     stand-in CERT / abuse-API receiver
  lines/           one file per worker line
    scout.py       multi-source fetch + signalize (URLhaus, CISA KEV, Feodo)
    classify.py    severity / TLP / incident type / internal class
    map.py         GeoResolver — host IP → country
    seal.py        PyNaCl sealed-box evidence encryption
    route.py       destination matrix + policy gates
    courier.py     HTTP submission + payload building
    ledger.py      append-only audit
    train.py       JSONL dataset builders + quality gate
  cockpit/         FastAPI + Jinja operator UI
    app.py         routes
    journey.py     Worker Mesh / case-journey assembly
    inference.py   client for the live model server
    templates/  static/

scripts/
  train_qlora.py   unsloth QLoRA fine-tune
  eval_adapter.py  adapter evaluation
  serve_model.py   inference server (FastAPI, runs in the CUDA container)

docs/
  dossier.md  style.md  demo.md  archive/
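The layout describes `ledger.py` only as an append-only audit. One common way to make such a log tamper-evident is hash chaining, sketched below. This is an assumption about a possible technique, not a description of what `ledger.py` does; the function names and the in-memory list are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(prev_hash: str, payload: str) -> str:
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(ledger: List[Dict[str, str]], event: Dict[str, str]) -> Dict[str, str]:
    # Each entry commits to the previous entry's hash, forming a chain.
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    payload = json.dumps(event, sort_keys=True)
    entry = {"prev": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}
    ledger.append(entry)
    return entry


def verify(ledger: List[Dict[str, str]]) -> bool:
    # Recompute every hash; any edited or reordered entry breaks the chain.
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


ledger: List[Dict[str, str]] = []
append_entry(ledger, {"action": "submitted", "case": "c-1", "dest": "mock-cert"})
append_entry(ledger, {"action": "blocked", "case": "c-1", "dest": "public-feed"})
print(verify(ledger))
```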

Training & the live model (Trainline + QLoRA)

psyc train-build-all emits Alpaca-style JSONL datasets under data/datasets/<task>-v<n>.jsonl for four defensive tasks — ioc_extraction, severity_classification, routing_decision, tlp_assignment. QualityGate drops TLP:RED, restricted-source, empty, and credential-leak rows.
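A rough sketch of the gate-then-emit step follows. The Alpaca record shape (`instruction`, `input`, `output`) is the usual convention for that format; the row fields `tlp` and `restricted_source`, the credential regex, and both function names are assumptions for illustration and are not taken from `train.py`.

```python
import json
import re
from typing import Dict, Iterable

# Illustrative heuristic only; a real gate would need far stronger detection.
CREDENTIAL_HINT = re.compile(r"(password|passwd|api[_-]?key)\s*[:=]", re.IGNORECASE)


def keep_row(row: Dict[str, object]) -> bool:
    text = str(row.get("input", "")).strip()
    answer = str(row.get("output", "")).strip()
    if not text or not answer:
        return False  # empty rows
    if row.get("tlp") == "TLP:RED":
        return False  # TLP:RED never enters a dataset
    if row.get("restricted_source"):
        return False  # restricted-source rows
    if CREDENTIAL_HINT.search(text):
        return False  # likely credential leak
    return True


def to_alpaca_jsonl(rows: Iterable[Dict[str, object]]) -> str:
    lines = []
    for row in rows:
        if keep_row(row):
            record = {key: row[key] for key in ("instruction", "input", "output")}
            lines.append(json.dumps(record))
    return "\n".join(lines)


rows = [
    {"instruction": "Classify severity.", "input": "URLhaus: hxxp://example.test/a.exe",
     "output": "high", "tlp": "TLP:CLEAR"},
    {"instruction": "Classify severity.", "input": "internal report",
     "output": "high", "tlp": "TLP:RED"},
    {"instruction": "Classify severity.", "input": "   ", "output": "low", "tlp": "TLP:GREEN"},
    {"instruction": "Classify severity.", "input": "dump: password=hunter2",
     "output": "high", "tlp": "TLP:GREEN"},
    {"instruction": "Classify severity.", "input": "partner feed item",
     "output": "low", "tlp": "TLP:GREEN", "restricted_source": True},
]
out = to_alpaca_jsonl(rows)
print(out)
```

Only the first sample row survives; the other four each trip one of the four drop rules.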

Fine-tune Qwen3.5-4B with QLoRA in the CUDA container:

docker build -t psyc-trainer -f Dockerfile.train .

docker run --gpus all --rm --entrypoint python \
    -v "$(pwd)/data:/data" -v "$(pwd)/scripts:/scripts" \
    psyc-trainer /scripts/train_qlora.py \
    --dataset /data/datasets/ioc_extraction-v4.jsonl \
    --dataset /data/datasets/severity_classification-v4.jsonl \
    --dataset /data/datasets/routing_decision-v4.jsonl \
    --dataset /data/datasets/tlp_assignment-v4.jsonl \
    --output /data/adapters/psyc-v4

Defaults target a 24 GB GPU (3090/4090): unsloth/Qwen3.5-4B at 4-bit, LoRA r=16, bf16, 3 epochs. Output: data/adapters/<run>/final/ + training_meta.json. Evaluate with scripts/eval_adapter.py; the /train cockpit page shows every dataset and adapter with its loss curve.

scripts/serve_model.py loads an adapter and serves /infer over HTTP. When it is running, the cockpit's Classifier bot shows the live model's severity verdict beside the rule-based verdict; if the server is down, the bot degrades to rules-only.
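The degrade-to-rules behaviour could look roughly like the sketch below. The `/infer` path and port 8771 come from this README; the JSON request and response shape (`{"text": ...}` in, `{"severity": ...}` out) and the function names are assumptions, since the real `inference.py` contract is not shown here.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional


def http_infer(url: str, text: str, timeout: float = 2.0) -> str:
    # Assumed contract: POST {"text": ...} and read back {"severity": ...}.
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["severity"]


def verdicts(
    text: str, rule_severity: str, infer: Callable[[str], str]
) -> Dict[str, Optional[str]]:
    # The rule verdict is always present; the model verdict is best-effort.
    try:
        model_severity: Optional[str] = infer(text)
    except (OSError, ValueError, KeyError):
        # URLError and timeouts are OSError subclasses; bad JSON is a ValueError.
        model_severity = None
    return {"rule": rule_severity, "model": model_severity}


def offline(_text: str) -> str:
    # Simulates the model server being down.
    raise urllib.error.URLError("connection refused")


print(verdicts("hxxp://example.test/a.exe", "high", offline))
# Live use would pass something like:
#   lambda t: http_infer("http://127.0.0.1:8771/infer", t)
```

Passing the transport in as a callable keeps the fallback logic testable without a GPU or a running server.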

Style

All code follows docs/style.md — a 12-fold guide: Optional[X] / List[X] from typing, Field(default_factory=...), Result[T, E] for expected failures, class X(str, Enum), structlog area.action events, SQLAlchemy Core (no ORM), flat hyphenated Typer commands.
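Two of those conventions, `Result[T, E]` for expected failures and `class X(str, Enum)`, can be illustrated with a short sketch. The definitions below are assumptions written for this example; the actual `result.py` and enum names in `models.py` may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


# A Result is either an Ok carrying a value or an Err carrying an error.
Result = Union[Ok[T], Err[E]]


class Tlp(str, Enum):
    # Hypothetical enum; values follow the standard TLP labels.
    CLEAR = "TLP:CLEAR"
    GREEN = "TLP:GREEN"
    AMBER = "TLP:AMBER"
    RED = "TLP:RED"


def parse_tlp(raw: str) -> Result[Tlp, str]:
    # A malformed label is an expected failure, so it is returned, not raised.
    try:
        return Ok(Tlp(raw.strip().upper()))
    except ValueError:
        return Err(f"unknown TLP label: {raw!r}")


for label in (" tlp:red ", "purple"):
    outcome = parse_tlp(label)
    if isinstance(outcome, Ok):
        print("ok:", outcome.value.value)
    else:
        print("err:", outcome.error)
```

Because `Tlp` mixes in `str`, its members compare equal to their plain string values, which keeps JSON and database round-trips simple.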

Scope

Lawful, white-hat defensive operations only. psyc routes intelligence to victims, CERT/CSIRTs, sector ISACs, provider/registrar abuse desks, and trusted CTI communities. It will not amplify stolen data, expose victims prematurely, interact with criminal actors, distribute exploitation content, or submit evidence beyond a destination's max TLP. Boundaries: docs/dossier.md §5, §10, §16.
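The max-TLP boundary can be expressed as a small policy gate. This is a sketch under stated assumptions: the destination names and their ceilings are invented for illustration, and the ordering CLEAR < GREEN < AMBER < RED follows the usual TLP convention rather than anything read from `route.py`.

```python
from typing import Dict, List, Tuple

# Higher rank means more restricted sharing.
TLP_RANK: Dict[str, int] = {
    "TLP:CLEAR": 0,
    "TLP:GREEN": 1,
    "TLP:AMBER": 2,
    "TLP:RED": 3,
}


def gate_routes(case_tlp: str, destinations: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split destinations into allowed and blocked for a case's TLP."""
    allowed: List[str] = []
    blocked: List[str] = []
    for name, max_tlp in destinations.items():
        # A destination may only receive cases at or below its ceiling.
        if TLP_RANK[case_tlp] <= TLP_RANK[max_tlp]:
            allowed.append(name)
        else:
            blocked.append(name)
    return allowed, blocked


# Hypothetical destination matrix.
destinations = {
    "national-cert": "TLP:AMBER",
    "sector-isac": "TLP:GREEN",
    "public-feed": "TLP:CLEAR",
}
allowed, blocked = gate_routes("TLP:AMBER", destinations)
print("allowed:", allowed)
print("blocked:", blocked)
```

Returning the blocked list alongside the allowed one matters because blocked routes are audited too, not silently dropped.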

Status

Working platform. Built: Scoutline (URLhaus + CISA KEV + Feodo Tracker) → Classifyline → Mapline → Sealine → Routeline → Courier → Ledgerline → Trainline, the FastAPI cockpit (five views incl. the animated Worker Mesh), and a fine-tuned Qwen3.5-4B (psyc-v4) served live behind the Classifier bot. Not yet built: Proofline (confidence scoring), Publishline (public advisories).

License

Unset. Choose before any external release.

Languages

Python 62.6% · JavaScript 12.4% · HTML 12.1% · CSS 11.5% · Shell 1.3%