psyc/tests at f88db2fdf71ce8860c4ca693902dea01e523576c - psyc

Files

m17hr1l f6fa52839f stage-20: defanging pipeline for IOC-extraction augmentation

Real CTI prose defangs IOCs (1[.]2[.]3[.]4, hxxp://, evil[dot]com) so they
don't auto-link in email/chat. A model trained only on canonical inputs
will fail to extract them.

New lines/defang.py: defang_ip, defang_domain, defang_url, defang_text —
four dot-styles ([.], (.), [dot], {.}) plus protocol defanging
(http→hxxp, https→hxxps). Each occurrence picks its style independently
since real advisories don't keep one style across paragraphs.
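
The behaviour described above can be sketched roughly as follows. This is an illustrative approximation, not the code in lines/defang.py: the sketch_* names, the simplified IPv4 regex, and the choice to pick one dot style per IOC occurrence (rather than per dot) are all assumptions.

```python
import random
import re

# The four dot styles named in the commit message.
DOT_STYLES = ("[.]", "(.)", "[dot]", "{.}")

# Hypothetical, simplified IPv4 pattern (octets are not range-checked).
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sketch_defang_dots(value: str, rng: random.Random) -> str:
    """Replace every '.' in value with one randomly chosen dot style."""
    style = rng.choice(DOT_STYLES)
    return value.replace(".", style)


def sketch_defang_scheme(url: str) -> str:
    """Rewrite a leading http:// or https:// to hxxp:// or hxxps://."""
    return re.sub(r"^http(s?)://", r"hxxp\1://", url)


def sketch_defang_text(text: str, rng: random.Random) -> str:
    """Defang each IPv4 occurrence, choosing a style per occurrence."""
    return _IPV4.sub(lambda m: sketch_defang_dots(m.group(0), rng), text)
```

Passing the Random instance in explicitly, rather than using the module-level functions, keeps the style choices reproducible for a given seed.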

train.BuildOptions adds defang_frac (default 0.0) and seed; build()
threads the options and a seeded Random through the example builders so
the augmentation is reproducible. Only _ex_ioc_extraction applies the
augmentation today; its output stays canonical, so the model learns the
messy→canonical mapping.
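
A minimal sketch of how such options and a seeded Random might be threaded through a builder is shown here. It assumes a dataclass for the options, a seed default of 0, a fixed "[.]" style, and a made-up sentence template; the Sketch*/sketch_* names are hypothetical stand-ins, not the real train module.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchBuildOptions:
    # Field names mirror the commit message; the seed default is an assumption.
    defang_frac: float = 0.0
    seed: int = 0


def sketch_ioc_example(ioc: str, opts: SketchBuildOptions, rng: random.Random) -> dict:
    """Build one extraction example; only the input side may be defanged."""
    text = f"Beaconing observed to {ioc} overnight."
    # rng.random() is in [0.0, 1.0), so 0.0 never defangs and 1.0 always does.
    if rng.random() < opts.defang_frac:
        text = text.replace(ioc, ioc.replace(".", "[.]"))
    return {"input": text, "output": ioc}  # output stays canonical


def sketch_build(iocs: list[str], opts: SketchBuildOptions) -> list[dict]:
    """Create one seeded RNG and pass it to every example builder."""
    rng = random.Random(opts.seed)
    return [sketch_ioc_example(ioc, opts, rng) for ioc in iocs]
```

With the same seed and defang_frac, two calls to sketch_build return identical examples, which is the reproducibility property the commit message describes.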

CLI: train-build and train-build-all gain --defang-frac and --seed.
8 new tests including a frac=1.0 / output-canonical integration check.
The pipeline runs but is dormant at the default defang_frac=0.0; the
psyc-v5 dataset build will set it to 0.5 once OTX cases land.
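
For illustration, the two flags could be wired up along these lines. The commit does not say which CLI library is used, so argparse, the "psyc" program name, and the seed default of 0 are assumptions; only the subcommand and flag names come from the message above.

```python
import argparse


def sketch_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing both flags on both subcommands."""
    parser = argparse.ArgumentParser(prog="psyc")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("train-build", "train-build-all"):
        cmd = sub.add_parser(name)
        # Default 0.0 keeps the augmentation dormant unless requested.
        cmd.add_argument("--defang-frac", type=float, default=0.0)
        cmd.add_argument("--seed", type=int, default=0)
    return parser
```

Parsing ["train-build", "--defang-frac", "0.5", "--seed", "42"] with this sketch yields a namespace whose defang_frac and seed attributes could be passed straight into the build options.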

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-20 22:33:52 +02:00

conftest.py

stage-14: pytest test suite over the worker lines

2026-05-18 23:36:41 +02:00

test_classify.py

stage-19: ThreatFox + MalwareBazaar + OTX Scoutline sources

2026-05-20 22:14:18 +02:00

test_courier.py

stage-18: approval queue — human gate before evidence leaves

2026-05-20 21:42:08 +02:00

test_defang.py

stage-20: defanging pipeline for IOC-extraction augmentation