init: scaffold psyc — defensive CTI routing & evidence-sealing platform

Stage-1 vertical slice: Pydantic Case model, SQLAlchemy Core persistence,
URLhaus Scoutline fetcher, FastAPI/Jinja cockpit (cases list + detail),
flat Typer CLI, Result[T, E] type module, structlog config.
Architecture in docs/dossier.md; 12-fold style guide in docs/style.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
m17hr1l
2026-05-14 12:43:47 +02:00
commit e04c6c96d8
30 changed files with 8271 additions and 0 deletions


@@ -0,0 +1,221 @@
# Review — API-Eligible Cyber Threat Reporting & Escalation Platforms (Draft v1)
**Reviewer:** Claude (Opus 4.7, 1M context)
**Review date:** 2026-05-13
**Document reviewed:** `waypoints.md` (first draft)
**Verdict:** Strong bones. Tone-perfect for white-hat defensive work — machine-to-machine, no vigilante framing. Publishable as an internal whitepaper after the critical fixes below.
---
## 1. What's Already Solid
Don't change these — they're load-bearing and correct.
- **Section 1.1 vs 1.2 split** (normal vs imminent harm) — exactly the right hinge for routing decisions.
- **Section 8 (never-submit list)** — covers GDPR / exploitation amplification / credential leakage failure modes well.
- **Section 9 normalized object** — the right abstraction. Transform-to-target instead of N bespoke pipelines.
- **Section 10 architecture sentence** — the whole project on one line: *Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public.*
---
## 2. Critical Fixes (do these before this leaves draft)
### 2.1 Geography mismatch — CISA AIS at #1 is US-only
For European-focused work, **MISP via CIRCL.lu** (Luxembourg) or the **ENISA CSIRTs Network** is the workhorse. CISA AIS does not cover EU institutions.
**Action:** Swap priorities #1 and #2 (MISP first, AIS second). Add a row for **CERT-EU** specifically for European institutions.
### 2.2 National CERTs are referenced generically but never named
The doc says "National CERT/CSIRT" everywhere but never resolves it to an actionable receiver.
**Action:** Add a small table after Section 1:
| Country | Receiver | Channel |
|---------|--------------------------------|----------------------------------------|
| DE | BSI / CERT-Bund | reports@cert-bund.de, MISP community |
| FR | ANSSI / CERT-FR | TAXII feed |
| UK | NCSC-UK | structured email + early-warning service |
| NL | NCSC-NL | MISP |
| ES | CCN-CERT, INCIBE-CERT | MISP |
| EU | CERT-EU, Europol EC3 | TLP-tagged MISP |
The routing engine should pick the right one based on victim country.
> **Note on Europol EC3:** they handle *criminal cases*, not first-call technical sharing. Route through your national CERT first; EC3 receives via national channels for cross-border coordination.
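A minimal sketch of that lookup, assuming the routing engine keys on an uppercase country code. The `Receiver` type and `pick_receiver` function are hypothetical names; the entries are copied from the table above. Unknown countries return `None` so a human picks the receiver instead of the engine guessing.

```python
from typing import NamedTuple, Optional


class Receiver(NamedTuple):
    name: str
    channel: str


# Entries mirror the table above. The "EU" row is for European institutions;
# per the note, Europol EC3 is reached via national channels, not directly.
NATIONAL_RECEIVERS = {
    "DE": Receiver("BSI / CERT-Bund", "reports@cert-bund.de, MISP community"),
    "FR": Receiver("ANSSI / CERT-FR", "TAXII feed"),
    "UK": Receiver("NCSC-UK", "structured email + early-warning service"),
    "NL": Receiver("NCSC-NL", "MISP"),
    "ES": Receiver("CCN-CERT, INCIBE-CERT", "MISP"),
    "EU": Receiver("CERT-EU", "TLP-tagged MISP"),
}


def pick_receiver(victim_country: str) -> Optional[Receiver]:
    """Return the receiver for a victim country, or None if it is unmapped."""
    return NATIONAL_RECEIVERS.get(victim_country.strip().upper())
```

Returning `None` rather than a default keeps unmapped countries in the human-review queue.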
### 2.3 Domain registrar abuse is missing from Section 1.3
Cloudflare is covered, but registrars (Namecheap, Tucows, GoDaddy, EURid for `.eu`, DENIC for `.de`) are often the faster takedown path.
**Action:** Add to the malicious-infrastructure flow:
*registrar abuse contact from WHOIS → registrar abuse API/email → registry as escalation.*
### 2.4 Severity scale `A|B|C|D|E` is unusual and undefined
Either define it inline or replace with the standard `low|medium|high|critical` (CVSS-style) or NIS2 severity categories for EU consistency. Receivers will normalize anyway — but defining it lets the routing engine make automatic decisions.
### 2.5 Normalized object missing an `actor` block
You have `victim` but no `actor`. Add:
```json
"actor": {
  "name": "Adira",
  "aliases": [],
  "campaign": "",
  "confidence": "A1|A2|B1|B2|C2|C3|D|E|F"
}
```
This field connects the doc to the project mission and lets the routing matrix differentiate actor-specific sightings from generic abuse reports.
(`A1` through `F` is the Admiralty Code, the de-facto CTI standard. If that's too much, fall back to `low|medium|high`.)
### 2.6 PII at submission time is a GDPR landmine
Section 9 has `observables.emails: []`. Submitting victim email addresses to AbuseIPDB or VirusTotal is a personal-data transfer under GDPR.
**Action:** Add a pre-submission sanitizer step that:
- Hashes / redacts emails to `local-part-hash@domain` when destination is public
- Strips PII from URLs (tokens, query params containing identifiers)
- Keeps raw originals only in `evidence.raw_evidence_location` (internal-only storage)
This belongs in the doc *before* the normalized-object section, not as an afterthought.
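The first two sanitizer steps could look roughly like this. It is a sketch under assumptions: `redact_email` and `strip_url_pii` are hypothetical names, the salt is a project-held secret, and dropping the entire query string and fragment is a deliberately blunt policy. Identifiers embedded in the URL path are not handled here.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def redact_email(address: str, salt: bytes) -> str:
    """Replace the local part with a salted hash, keeping the domain."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address")
    digest = hashlib.sha256(salt + local.lower().encode("utf-8")).hexdigest()[:16]
    return f"{digest}@{domain.lower()}"


def strip_url_pii(url: str) -> str:
    """Drop userinfo, query string, and fragment; keep scheme, host, port, path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: urlsplit strips the brackets
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))
```

The salted hash lets two reports about the same mailbox still correlate at the destination without exposing the address itself.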
---
## 3. High-Value Additions
### 3.1 TLP enforcement at the routing layer
Nothing in the current schema *prevents* TLP:RED data being routed to a TLP:CLEAR destination.
**Action:** Add a routing precondition: `submission.tlp <= destination.max_tlp_allowed`.
- CISA AIS rejects TLP:RED
- Cloudflare doesn't care
- Spamhaus has its own rules
- MISP communities each have their own ceiling
Encode the ceiling per destination in the routing matrix.
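The precondition needs an ordering on TLP labels to be checkable. A minimal sketch, assuming TLP 2.0 labels and a hypothetical `MAX_TLP_ALLOWED` table; the ceilings shown are placeholders, not verified destination policy. Unknown destinations fail closed.

```python
from enum import IntEnum


class TLP(IntEnum):
    """TLP 2.0 labels, ordered from least to most restrictive."""
    CLEAR = 0
    GREEN = 1
    AMBER = 2
    AMBER_STRICT = 3
    RED = 4


# Placeholder ceilings (assumptions). Only "rejects TLP:RED" for CISA AIS
# comes from this review; confirm every value with the destination.
MAX_TLP_ALLOWED = {
    "cisa_ais": TLP.AMBER_STRICT,
    "urlhaus": TLP.CLEAR,
}


def may_route(submission_tlp: TLP, destination: str) -> bool:
    """True only if the destination is known and its ceiling covers the label."""
    ceiling = MAX_TLP_ALLOWED.get(destination)
    if ceiling is None:
        return False  # fail closed: no ceiling recorded, no submission
    return submission_tlp <= ceiling
```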
### 3.2 STIX 2.1 as the serialization
Right now the doc implies *internal object → bespoke transform per API*. Cheaper and more standard:
**internal object → STIX 2.1 bundle → minor adapter per destination**
MISP, OpenCTI, CISA AIS, and most CTI tools are STIX-native. One serializer beats thirteen, and you get free interop with anything that already speaks STIX.
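To show how small the serializer core can be, here is a hand-rolled sketch that emits a STIX 2.1 bundle containing one URL indicator, using only the standard library. The function names are hypothetical, and a real implementation would more likely use a maintained STIX library that validates objects. Only the required indicator properties are set.

```python
import uuid
from datetime import datetime, timezone


def _stix_timestamp() -> str:
    """UTC timestamp with millisecond precision, as STIX expects."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def url_indicator(url: str) -> dict:
    """Build a minimal STIX 2.1 indicator for a malicious URL."""
    ts = _stix_timestamp()
    # Escape backslashes and single quotes inside the pattern string literal.
    escaped = url.replace("\\", "\\\\").replace("'", "\\'")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "pattern": f"[url:value = '{escaped}']",
        "pattern_type": "stix",
        "valid_from": ts,
    }


def make_bundle(objects: list) -> dict:
    """Wrap STIX objects in a bundle."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}
```

Each destination adapter then starts from `make_bundle(...)` output and only handles transport and any destination-specific fields.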
### 3.3 Rate-limit budgets
Many of these APIs have strict limits:
- AbuseIPDB free tier: 1000 reports/day
- VirusTotal public API: 4 req/min
- Spamhaus: per-submitter quotas
- Cloudflare: per-account rate limits
Without a token-bucket per destination, high-confidence submissions get silently dropped during bursts.
**Action:** Add a `destination_quota` field to the routing matrix and an enforcement layer.
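One possible shape for the enforcement layer is a token bucket per destination. This is a sketch, not a tuned implementation: `TokenBucket` is a hypothetical class, and the capacity and refill values would come from the `destination_quota` field. The clock is injectable so the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; never blocks."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# Example budget from the list above: VirusTotal public API at 4 requests/min.
virustotal_bucket = TokenBucket(capacity=4, refill_per_second=4 / 60)
```

When `try_acquire()` returns `False`, the caller should queue the submission for retry rather than discard it, which is the silent-drop failure described above.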
### 3.4 Feedback loop is missing
When you submit to URLhaus, you can poll for status. When you submit to MISP, you get sightings. When you submit to Cloudflare, you get a case number. These should flow back into your OpenCTI graph as evidence-of-effectiveness.
Without this, you're operating open-loop — you don't know which destinations actually act on your reports.
**Action:** Add a Section 11 "Receipt and Effectiveness Tracking" that defines:
- Per-destination receipt schema (case ID, ack timestamp, outcome status)
- Polling cadence per destination
- A success metric per destination type (takedowns confirmed, sightings count, classification adopted)
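A possible starting point for the receipt schema and polling check, with all names hypothetical. The outcome values are illustrative placeholders; each destination will report its own status vocabulary that an adapter maps onto these.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Outcome(str, Enum):
    PENDING = "pending"      # submitted, no decision yet
    ACCEPTED = "accepted"    # destination acknowledged the report as valid
    REJECTED = "rejected"    # destination declined the report
    ACTIONED = "actioned"    # takedown, sighting, or classification confirmed


@dataclass(frozen=True)
class Receipt:
    destination: str
    case_id: str
    acked_at: datetime
    outcome: Outcome = Outcome.PENDING


def due_for_poll(receipt: Receipt, last_polled_at: datetime,
                 now: datetime, cadence: timedelta) -> bool:
    """Poll only receipts that are still pending and past their cadence."""
    if receipt.outcome is not Outcome.PENDING:
        return False
    return now - last_polled_at >= cadence
```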
### 3.5 NoMoreRansom (NMR)
Ransomware.live is listed under monitoring, but if a decryptor research effort produces anything, NMR is the destination.
**Action:** Add to the routing matrix:
| Evidence type | First API destination | Second destination | Internal system |
|-------------------------------|--------------------------------|----------------------|------------------------|
| Ransomware decryptor evidence | NoMoreRansom (private channel) | Victim CERT chain | OpenCTI internal only |
NMR coordinates so victims can decrypt before the adversary sees the fix — *never* publish a working decryptor publicly first.
---
## 4. Nice-to-Have
### 4.1 Submitter identity & signing
- Register a stable submitter handle with MISP / MalwareBazaar / AbuseIPDB — not a personal account.
- Sign internal objects with a project PGP key before they leave the system.
- CIRCL and other major MISP communities weight trust by submitter history.
### 4.2 Audit log requirement
Every external submission writes an immutable row:
```
(timestamp, destination, payload_hash, submitter_identity, tlp, response_id, outcome)
```
Legal cover, debugging, and the feedback loop in 3.4 all need this.
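One way to approximate "immutable" in a sketch is an append-only SQLite table with triggers that reject `UPDATE` and `DELETE`. This uses the standard-library `sqlite3` module to stay self-contained; table, trigger, and function names are hypothetical, and the project's own persistence layer may look different. Later outcome changes are recorded as new rows rather than edits.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS submission_audit (
    timestamp TEXT NOT NULL, destination TEXT NOT NULL,
    payload_hash TEXT NOT NULL, submitter_identity TEXT NOT NULL,
    tlp TEXT NOT NULL, response_id TEXT, outcome TEXT NOT NULL
);
CREATE TRIGGER IF NOT EXISTS submission_audit_no_update
BEFORE UPDATE ON submission_audit
BEGIN SELECT RAISE(ABORT, 'audit rows are immutable'); END;
CREATE TRIGGER IF NOT EXISTS submission_audit_no_delete
BEFORE DELETE ON submission_audit
BEGIN SELECT RAISE(ABORT, 'audit rows are immutable'); END;
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def payload_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_submission(conn, destination, payload, submitter_identity,
                      tlp, response_id, outcome) -> str:
    """Append one audit row and return the payload hash that was stored."""
    digest = payload_hash(payload)
    row = (datetime.now(timezone.utc).isoformat(), destination, digest,
           submitter_identity, tlp, response_id, outcome)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO submission_audit VALUES (?, ?, ?, ?, ?, ?, ?)", row)
    return digest
```

Triggers only guard against accidental edits through SQL; anyone with file access can still alter the database, so stronger guarantees need external controls.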
### 4.3 NIS2 callout for critical-infra reporting
EU NIS2 requires regulated entities to send an early warning within 24h of becoming aware of a significant incident, followed by a fuller incident notification within 72h. If detections involve essential/important entity sectors, the routing engine should flag the NIS2 obligation regardless of receiver choice.
### 4.4 Section ordering
Sections 8 (data handling) and 9 (normalized object) are foundations, not appendices. Move them up to Sections 3 and 4. Currently a reader hits the platform list before knowing what *not* to send.
### 4.5 Confidence convention
`low|medium|high` is fine, but production CTI commonly uses the **Admiralty Code** (`A1`, `B2`, etc., describing source reliability × information credibility) or estimative language. Mention the convention even if you don't fully adopt it.
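If the Admiralty Code is adopted, the `confidence` field is easy to validate. A minimal sketch with hypothetical names: it accepts a reliability letter `A` to `F` followed by an optional credibility digit `1` to `6`. The digit is optional here only because the example values in 2.5 include bare letters; a stricter parser could require both parts.

```python
import re
from typing import NamedTuple, Optional

# Reliability letter A-F, optionally followed by credibility digit 1-6.
_ADMIRALTY = re.compile(r"([A-F])([1-6])?")


class Admiralty(NamedTuple):
    reliability: str             # source reliability, "A" (best) to "F"
    credibility: Optional[int]   # information credibility, 1 (best) to 6


def parse_admiralty(code: str) -> Admiralty:
    """Parse codes such as 'B2' or 'F'; raise ValueError on anything else."""
    match = _ADMIRALTY.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not an Admiralty code: {code!r}")
    letter, digit = match.groups()
    return Admiralty(letter, int(digit) if digit else None)
```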
---
## 5. Implementation Notes (Blue48 Hookup)
This doc is the spec for two components in the agent stack:
1. **`report_writer` agent** outputs Section 9's normalized object as its canonical format.
2. **A routing engine** (extension of `report_writer`, or a 7th agent) consumes that object, applies the matrix in Section 6, and fans out via API adapters.
Agents stop at *"produce the normalized object."* Human review reads it, decides "yes, ship this to MISP and Cloudflare," and clicks. The routing engine then runs the API calls, captures receipts, and feeds them back to OpenCTI.
### 5.1 Suggested initial adapters (Block G priority)
1. MISP (PyMISP)
2. AbuseIPDB
3. URLhaus
4. Cloudflare Abuse Reports
5. urlscan.io
These five cover ~80% of common evidence types in the routing matrix.
### 5.2 Secrets handling
Every adapter needs API credentials. They must:
- Live in `.env` (already excluded from image via `.dockerignore`)
- Be passed at container runtime via `env_file`, never baked into the image
- Be rotatable on a schedule (the audit log in 4.2 helps prove non-overlap)
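On the adapter side, a small helper can make a missing credential fail at startup instead of at first submission. This is a sketch; `require_secret` and the variable name in the usage comment are hypothetical, and it assumes the container runtime has already injected the values from `env_file`.

```python
import os
from typing import Mapping


class MissingSecret(RuntimeError):
    """Raised when a required credential is absent or empty."""


def require_secret(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return a non-empty secret from the environment or fail loudly."""
    value = environ.get(name, "").strip()
    if not value:
        # Report the variable name only; never echo secret values.
        raise MissingSecret(f"environment variable {name} is not set")
    return value


# Usage inside an adapter (variable name is illustrative):
# api_key = require_secret("ABUSEIPDB_API_KEY")
```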
---
## 6. Summary
| Category | Count | Notes |
|------------|------:|---------------------------------------------|
| Critical | 6 | Geography, CERT mapping, registrar abuse, severity scale, actor block, PII sanitizer |
| High-value | 5 | TLP enforcement, STIX 2.1, rate limits, feedback loop, NoMoreRansom |
| Nice-to-have | 5 | Signing, audit log, NIS2, ordering, Admiralty Code |
After the critical fixes, this is a publishable internal whitepaper and a clear spec for the routing engine. Good draft.