init: scaffold psyc — defensive CTI routing & evidence-sealing platform

Stage-1 vertical slice: Pydantic Case model, SQLAlchemy Core persistence, URLhaus Scoutline fetcher, FastAPI/Jinja cockpit (cases list + detail), flat Typer CLI, Result[T, E] type module, structlog config. Architecture in docs/dossier.md; 12-fold style guide in docs/style.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 12:43:47 +02:00
commit e04c6c96d8
30 changed files with 8271 additions and 0 deletions
--- a/docs/archive/waypoints_scalpel.md
+++ b/docs/archive/waypoints_scalpel.md
@@ -0,0 +1,707 @@
+# Detailed Review v2 — API-Eligible Cyber Threat Reporting & Escalation Platforms
+
+**Reviewer:** Claude (Opus 4.7, 1M context)
+**Review date:** 2026-05-13
+**Document reviewed:** `waypoints.md` (first draft)
+**Companion to:** `waypoints_firstpass.md` (v1 executive summary)
+**Scope of this v2:** section-by-section findings, cross-cutting gaps, missing categories, revised schema, implementation priorities for blue48.
+
+---
+
+## 0. Method
+
+I re-read the draft three times against the following lenses:
+
+1. **Factual / API accuracy** — does each platform actually do what's claimed?
+2. **Operational correctness** — would the routing actually work in practice, or break on first contact with reality?
+3. **Legal / compliance** — GDPR, NIS2, MLAT, jurisdiction, chain of custody
+4. **Threat-model coverage** — does this serve the actual project goal (campaign disruption, not individual attribution)?
+5. **OPSEC of the reporter** — what does the adversary learn from each submission?
+
+Findings below carry confidence tags: **[verified]**, **[likely current]**, **[verify before relying on]**.
+
+---
+
+## 1. Section-by-Section Findings
+
+### 1.1 — Section 1: Recommended Reporting Order
+
+**1.1.1 In Scenario 1.1 (normal credible threat), going to the victim first is correct in 90% of cases — but flag the exception.**
+
+Insider-attack scenarios reverse this: notifying a victim org whose own admin/employee is the threat actor warns the attacker. For credential-leak cases involving privileged accounts, route CERT-first and let CERT decide whether to notify the victim org's leadership or its security contact. Add a `1.1.bis` for "victim contact may itself be compromised."
+
+**1.1.2 Scenario 1.2 (imminent harm) is missing a specific decision point.**
+
+If the imminent harm is to *critical infrastructure* (energy, water, healthcare, finance), in EU jurisdictions the **NIS2 Directive** requires regulated entities to send an early warning within 24 hours of becoming aware of a significant incident, followed by a fuller incident notification within 72 hours. Your routing engine should detect "victim sector ∈ NIS2 essential/important entity list" and either:
+- Route the report so the victim can fulfill their NIS2 obligation, OR
+- (If victim is unreachable) report directly via the relevant national CERT's NIS2 channel, which exists separately from generic CSIRT contact paths
+
+**1.1.3 Scenario 1.3 missing receiver categories:**
+
+- **Hosting providers** (not just CDNs). Cloudflare is a CDN; the actual origin server is somewhere else (Hetzner, OVH, AWS, DigitalOcean, etc.). A Cloudflare-only report leaves the origin running. Add hosting provider abuse as a parallel step, not after CDN.
+- **Domain registrars** via WHOIS-extracted abuse contact, plus registry escalation for ccTLDs (DENIC for `.de`, AFNIC for `.fr`, EURid for `.eu`, Nominet for `.uk`)
+- **Certificate authorities** for compromised cert revocation (Let's Encrypt revoke API for ACME-issued certs; commercial CA abuse contacts for the rest)
+- **DNS providers** independent of registrar (Cloudflare DNS, Quad9, Google Public DNS abuse contacts — for blocking, not takedown)
+
+**1.1.4 The implicit ordering bias.**
+
+The draft optimizes for *legal-defensibility* (talk to the receiver who can act) but doesn't optimize for *operational speed-to-mitigation*. For phishing kits with active credential harvesting, the fastest mitigation is often: parallel-fan-out to (CDN, hosting, registrar, browser-block-list providers) simultaneously, then notify CERT as record-keeping. The doc reads as serial when in practice it should be parallel.
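The parallel ordering argued for above can be sketched with `asyncio`. This is a minimal illustration under stated assumptions, not the project's routing engine: `submit`, the destination names, and the status strings are hypothetical stand-ins for real adapters.

```python
import asyncio


async def submit(destination: str, case_id: str) -> tuple[str, str]:
    """Hypothetical adapter call; a real one would POST to the destination's API."""
    await asyncio.sleep(0)  # stand-in for network I/O
    return destination, "accepted"


async def fan_out(case_id: str, takedown_targets: list[str], record_keeper: str) -> dict[str, str]:
    # Contact every takedown-capable receiver concurrently instead of serially.
    results = await asyncio.gather(
        *(submit(dest, case_id) for dest in takedown_targets),
        return_exceptions=True,
    )
    outcome: dict[str, str] = {}
    for dest, result in zip(takedown_targets, results):
        # One failing destination must not block or hide the others.
        if isinstance(result, BaseException):
            outcome[dest] = f"failed: {result}"
        else:
            outcome[dest] = result[1]
    # The CERT notification follows as record-keeping once the fan-out was attempted.
    _, status = await submit(record_keeper, case_id)
    outcome[record_keeper] = status
    return outcome
```

Calling `asyncio.run(fan_out("WG-2026-000001", ["cdn", "hosting", "registrar", "browser_blocklist"], "cert"))` returns one status per destination, so a human reviewer can see which receivers still need a manual follow-up.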
+
+---
+
+### 1.2 — Section 2: Tier-1 API Reporting Platforms
+
+**1.2.1 Missing platforms that belong in Tier 1:**
+
+| Platform | Why Tier-1 | API style |
+|---|---|---|
+| **abuse.ch ThreatFox** | IOC graph, sibling to URLhaus/MalwareBazaar, accepts indicator submissions with kill-chain context | REST + Auth-Key |
+| **abuse.ch YARAify** | YARA rule sharing + scanning. Direct fit since `detection_author` emits YARA | REST + Auth-Key |
+| **AlienVault OTX (now LevelBlue Labs OTX)** | One of the largest free CTI communities. Pulses for sharing, pull API for consumption. **Major omission from current draft.** | REST + DirectConnect API |
+| **CIRCL Hashlookup** | Fast hash reputation lookup, free, EU-hosted | REST |
+| **Shadowserver** | Free network exposure / vulnerability scanning reports. Subscribe by ASN/CIDR/contact. **The draft has it under "monitoring" but Shadowserver also accepts submissions and runs important takedown campaigns.** | REST API |
+
+**1.2.2 Reorder by jurisdictional fit:**
+
+The current #1 (CISA AIS) is US-government-tied. For Europe-focused work the right Tier-1 priorities are roughly:
+
+1. MISP (CIRCL communities, plus ENISA CSIRTs Network communities)
+2. OpenCTI (your own knowledge graph)
+3. AlienVault OTX (broad reach, low friction)
+4. CISA AIS (only if US-victim cases or US-relevant indicators)
+5. Cloudflare / hosting abuse APIs
+6. Spamhaus
+7. URLhaus / MalwareBazaar / ThreatFox
+8. AbuseIPDB
+9. urlscan.io
+10. Netcraft
+
+**1.2.3 Per-row corrections in the existing table:**
+
+- **CISA AIS — "STIX/TAXII bidirectional"** — be specific: STIX 2.1 over TAXII 2.1, with the **AIS Profile** (a restricted subset of STIX). Submitting non-AIS-Profile STIX gets rejected. **[verified]**
+- **Cloudflare Abuse Reports API** — also requires noting that high-volume submitters can apply to be a **Trusted Reporter** which gets faster SLAs. **[likely current]**
+- **VirusTotal API** — public submissions are visible to all VT Premium customers (incl. potentially the adversary). **The draft doesn't flag this — it's a critical OPSEC point.** Use **VT Private Scanning** for sensitive samples. **[verified]**
+- **PhishTank** — community-vetted. As of late 2024 / early 2025 there were reports of reduced moderation activity. **[verify before relying on]**. Netcraft is the more reliable phishing-takedown channel today.
+- **Google Web Risk** — access truly is gated by Google customer engineering review; not a 5-minute API key signup. Apply early. **[verified]**
+
+---
+
+### 1.3 — Section 3: Per-Platform Notes
+
+**3.1 CISA AIS:** Add: requires sponsorship from a federal agency or a signed AIS Sharing Agreement, plus the connector software (typically TAXII client). Onboarding measured in weeks, not days. The draft makes it sound like a sign-up form.
+
+**3.2 MISP:** Missing:
+- ZeroMQ for real-time push (worth using if you want sub-second propagation to your own consumers)
+- Distinction between **events** (point-in-time intelligence) and **feeds** (continuous streams; better for IOC bulk delivery)
+- "Create a community" vs "Join a community" tradeoff — joining CIRCL's communities is the lowest-friction entry; creating your own is high-effort and pointless until you have multiple sharing partners
+- TLP-marking enforcement is **not** automatic at the MISP level — your client must respect TLP before publishing onward
+
+**3.3 OpenCTI:** Missing:
+- The connector framework: ~80+ pre-built connectors (MITRE ATT&CK, MISP, CrowdStrike, Recorded Future, etc.) — most of your enrichment needs are already solved
+- The Workbench feature for analyst review before publishing
+- Filigran (the company behind OpenCTI) hosts a managed cloud version if you don't want to operate it yourself
+
+**3.4 Cloudflare Abuse Reports API:** Missing:
+- API token requires `Account.Abuse Reports` permission — won't work with read-only tokens
+- Rate limits documented separately from the abuse API itself
+- For Cloudflare-hosted **Workers** (their serverless), abuse reports go to a different channel
+- Trusted Reporter program (mentioned above) — apply once you have submission history
+
+**3.5 Spamhaus:** Missing the lists distinction:
+- DBL = Domain Block List (domains)
+- SBL = Spamhaus Block List (IPs)
+- XBL = Exploits Block List (exploit-sourced IPs)
+- ZRD = Zero Reputation Domains (newly registered)
+- Each list has different submission criteria. Wrong-list submissions get rejected. Your routing engine needs a list-selector.
+
+**3.6 AbuseIPDB:** Missing:
+- The 23-category taxonomy (`SSH brute force`, `port scan`, `web app attack`, `phishing`, etc.) — **your evidence type must map to an AbuseIPDB category code** or the submission is low-utility
+- Free tier: roughly 1000 reports/day and 100 IP checks/min when last checked (**[verify before relying on]**); paid tiers scale
+- Single-reporter submissions have low weight; reputation requires multiple corroborating submitters. Send to AbuseIPDB *after* sending to other corroborators
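A minimal sketch of the evidence-type-to-category mapping described above. The evidence-type keys are hypothetical names for this project, and the numeric IDs reflect my understanding of AbuseIPDB's published category list; both should be re-checked against the current API documentation before use.

```python
# Assumed category IDs (7 Phishing, 14 Port Scan, 18 Brute-Force, 21 Web App Attack, 22 SSH).
# Verify against AbuseIPDB's current category list before relying on them.
ABUSEIPDB_CATEGORIES: dict[str, list[int]] = {
    "phishing": [7],
    "port_scan": [14],
    "ssh_brute_force": [18, 22],
    "web_app_attack": [21],
}


def categories_for(evidence_type: str) -> list[int]:
    """Refuse to build a report when no category is mapped, rather than guessing."""
    try:
        return ABUSEIPDB_CATEGORIES[evidence_type]
    except KeyError:
        raise ValueError(f"no AbuseIPDB category mapped for {evidence_type!r}") from None


def report_payload(ip: str, evidence_type: str, comment: str) -> dict[str, str]:
    """Build the form fields for a report; categories are sent comma-separated."""
    ids = ",".join(str(c) for c in categories_for(evidence_type))
    return {"ip": ip, "categories": ids, "comment": comment}
```

Failing loudly on an unmapped evidence type keeps low-utility, uncategorized reports out of the submission queue.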
+
+**3.7 URLhaus:** Missing:
+- Submission auth-key required (free, sign up)
+- Manual review for high-confidence flags
+- 2024+ stricter format requirements
+- Linkage to MalwareBazaar — submit the URL to URLhaus, the sample to MalwareBazaar, link by hash
+
+**3.8 MalwareBazaar:** Missing:
+- File size limits (~250MB last I checked)
+- Office macro / Windows installer formats need specific tags
+- Tag taxonomy is community-driven; non-canonical tags reduce utility
+- The "Avoid" line about legal-share is correct but vague. Specifically: do not upload samples obtained under NDA, samples from incidents where the victim hasn't consented, or samples that may contain victim PII (e.g., crafted payloads with the victim's name)
+
+**3.9 PhishTank:** As noted above, declining. Verify status; consider deprioritizing.
+
+**3.10 urlscan.io:** Missing:
+- Visibility settings: `public`, `unlisted`, `private` (private = paid)
+- Public scans are searchable by everyone — including the adversary monitoring for their kits being analyzed
+- The **Search API** is invaluable for retrohunts: "show me every scan in the last 30 days that loaded resource X"
+- Bulk submissions can carry a cohort identifier (e.g., a UUID) in the `tags` field for tracking; the `customagent` field only overrides the User-Agent used for the scan
+
+**3.11 Google Web Risk:** Missing:
+- GCP project + Web Risk API enabled prerequisite
+- Submissions evaluated by Google Safe Browsing pipeline; latency hours-to-days
+- Successful submissions show up in Chrome / Firefox / Safari Safe Browsing warnings — **massive amplification**. Use only for high-confidence URLs
+
+**3.12 VirusTotal:** Missing:
+- Public API: 4 lookups/min, 500/day, 15.5k/month
+- Premium API: rate limits negotiated
+- File submission privacy: anyone with VT Intelligence can see your sample. **Critical OPSEC point not in draft.**
+- VT Private Scanning for sensitive samples
+- VT Hunting (YARA livehunt) for ongoing detection
+
+**3.13 Netcraft:** Missing:
+- Strong takedown-execution record — Netcraft actually does the takedown work, not just reporting
+- Free tier exists for low-volume reporters
+- Strongest at brand-protection / phishing
+- They prefer evidence package format: source URL + screenshot + redirect chain + landing page HTML
+
+---
+
+### 1.4 — Section 4: Internal Case / Incident Routing Platforms
+
+**1.4.1 Missing platforms:**
+
+| Platform | Best for | Why missing matters |
+|---|---|---|
+| **Wazuh** | Open-source SIEM with TheHive integration | Many SOCs use it; integrates cleanly with this stack |
+| **Microsoft Sentinel** | Cloud SIEM with Logic Apps automation | Major enterprise platform — leaving it out makes the doc feel non-enterprise |
+| **Splunk SOAR (formerly Phantom)** | Commercial SOAR | Major in enterprise SOCs |
+| **Cortex XSOAR** | Commercial SOAR (Palo Alto) | Same |
+| **Shuffle** | Open-source SOAR | Free alternative to XSOAR/Phantom |
+| **Tracecat** | Newer open-source SOAR | Younger but actively developed |
+| **n8n** | General workflow automation | Not security-specific but widely used as a glue layer |
+
+**1.4.2 TheHive 5 vs 4:** Be explicit — TheHive 4 reached EOL, TheHive 5 is current. Code examples should target TheHive 5 API.
+
+---
+
+### 1.5 — Section 5: Monitoring (Not Primary Reporting)
+
+**1.5.1 Missing high-value monitoring sources:**
+
+| Source | What it gives you | API |
+|---|---|---|
+| **AlienVault OTX** | Largest free pulse community, IOC subscriptions | REST DirectConnect |
+| **CIRCL Passive DNS / Passive SSL** | Historical DNS / cert lookups; EU-hosted | REST |
+| **PhishStats** | Phishing URL stream | REST + RSS |
+| **DNSDumpster / SecurityTrails / BinaryEdge** | Recon/asset-discovery DBs | REST (mostly paid for bulk) |
+| **GreyNoise** | Benign-scanner classification — **reduces false positives** in IP reporting by tagging known internet-noise sources | REST |
+| **Spamhaus DNSBL queries** | Free DNSBL lookups | DNS protocol |
+| **Maltrail** | Open-source malicious-traffic detection feeds | Static feed download |
+| **CT log monitors (crt.sh, Censys CT)** | New-cert issuance for your monitored domains — catches phishing-domain registrations | REST |
+
+**1.5.2 GreyNoise specifically deserves a callout.**
+
+Reporting an IP that GreyNoise classifies as `benign` (known scanners such as Shodan, Censys, and security researchers) lowers the credibility of your reports on AbuseIPDB and with CERTs. **Always GreyNoise-check before submitting an IP report.** It is a single API call that prevents a whole class of bad submissions.
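The pre-submission gate can be sketched as a pure filter. The `classify` callable is a hypothetical wrapper around whichever GreyNoise lookup you use; it is assumed to return a classification label as a string, and any label beginning with `benign` is treated as a known scanner.

```python
from typing import Callable, Iterable


def reportable_ips(ips: Iterable[str], classify: Callable[[str], str]) -> list[str]:
    """Drop IPs that the (hypothetical) classify() lookup labels as benign scanners."""
    keep: list[str] = []
    for ip in ips:
        label = classify(ip).lower()
        if label.startswith("benign"):
            continue  # known internet-noise source: do not report
        keep.append(ip)
    return keep
```

Injecting the lookup as a parameter keeps the gate testable offline and lets the routing engine swap in a cached or rate-limited client.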
+
+**1.5.3 Shadowserver placement.**
+
+Currently in Section 5 (monitoring only) but Shadowserver also runs *active sinkholing and takedown campaigns* with global reach. They accept tip-offs and IOC contributions. Move them up to Tier 1 receivers, or at least call out the bidirectional relationship.
+
+---
+
+### 1.6 — Section 6: Practical Routing Matrix
+
+**1.6.1 Missing rows:**
+
+| Evidence type | First | Second | Internal |
+|---|---|---|---|
+| Compromised TLS certificate | CT log monitor sighting → CA revocation request | Cloudflare/host if cert is in use | OpenCTI / TheHive |
+| Mobile app malware | Google Play / Apple App Review submission | VirusTotal sample upload | OpenCTI |
+| Cryptocurrency wallet (laundering) | Chainalysis / TRM (commercial) or on-chain analysis | OFAC SDN if sanctioned | Internal restricted case |
+| Open-source supply-chain attack | Registry security (security@npmjs.com, security@pypi.org) | GitHub Security Lab | OpenCTI / TheHive |
+| Compromised GitHub repo / leaked secret | GitHub Security Advisory + vendor-specific revoke API (e.g., AWS IAM) | Victim org | Internal restricted |
+| Tor hidden service hosting malware | Document only (no takedown for .onion); push IOCs to MISP | n/a | OpenCTI |
+| Sanctions-evasion crypto | OFAC SDN reporting (US) / EU FSF reporting | National FIU | Internal restricted |
+| CSAM (legally separate) | NCMEC CyberTipline (US) / IWF (UK) / INHOPE (international) | National police | Stop processing immediately, preserve under legal hold |
+| Phishing-resistant kit / 2FA bypass | Browser vendor reports (Chrome / Firefox / Safari Trust & Safety) | Affected service | OpenCTI |
+
+**1.6.2 Cloudflare-proxied abuse needs a follow-up step.**
+
+Current row says: First → Cloudflare API; Second → Netcraft / PhishTank. Missing: **Third → origin host abuse contact** (identified from the hosting-provider details Cloudflare may return in response to an abuse report, or via passive-DNS history and certificate transparency cross-reference). Without this, a takedown at the CDN leaves the origin alive and the attacker simply provisions a new CDN front-end.
+
+**1.6.3 The "Leaked credentials/API keys" row is dangerously thin.**
+
+"Victim first → CERT if severe → Internal IR case" omits the **revocation step**, which is more time-critical than reporting. If you find a leaked AWS access key, the **first** action is to get it deactivated: the account owner (or you, with permission) runs `aws iam update-access-key --status Inactive` or `aws iam delete-access-key` for that key. Keys exposed in public GitHub repositories are typically picked up by GitHub secret scanning and passed to AWS, which restricts the key, but do not rely on that alone. **[likely current]** If you find leaked OAuth tokens for GitHub/Slack/etc., the relevant vendor has an automated revocation pathway. **Add the revocation step before victim notification.**
+
+---
+
+### 1.7 — Section 7: Minimum Viable API Stack
+
+The current MVP list (10 items) is too heavy for "minimum viable." A genuine MVP for a new white-hat group is closer to:
+
+1. **OpenCTI** — your knowledge graph (or, if too heavy, just MISP for both)
+2. **MISP via CIRCL community** — free, EU-hosted, broad reach
+3. **AlienVault OTX** — free, broadest reach for indicator sharing
+4. **AbuseIPDB** — free tier, easy
+5. **URLhaus + MalwareBazaar + ThreatFox** (the abuse.ch trio — same auth-key, three destinations)
+6. **urlscan.io** — free tier, evidence generation
+7. **National CERT direct email + GPG** — non-API, but mandatory
+
+That's 7 things, of which 5 are pure free signups. Tackle Cloudflare/Netcraft/Spamhaus/GoogleWebRisk after you have throughput in those 7.
+
+The current MVP includes TheHive — that's case management, not external reporting. Move it out of "API stack" since it's internal infrastructure.
+
+---
+
+### 1.8 — Section 8: Data Handling Rules
+
+**1.8.1 "Never submit publicly" — additions:**
+
+- Insider-threat allegations without verification
+- Attribution claims about specific named individuals (the hard line we settled on earlier)
+- Government / classified material
+- PHI (US HIPAA scope)
+- PCI scope financial data
+- Children's data (COPPA US; GDPR Article 8 EU)
+- Biometric data
+- Trade secrets / source code
+- Material from unauthorized intrusion (even if you got to it via OSINT, "I downloaded their leaked DB" makes you a recipient of stolen goods in some jurisdictions)
+
+**1.8.2 "Safe to submit" — additions:**
+
+- YARA rules (especially to YARAify)
+- Sigma rules (to SigmaHQ via PR)
+- Mutex names, named-pipe signatures (good Sysmon detections)
+- Persistence registry keys
+- Scheduled task names
+- TLS fingerprints (JA3, JA4)
+- HTTP user-agent strings observed in C2
+- ASN block ranges associated with adversary infrastructure
+- STIX/TAXII patterns
+- ATT&CK technique IDs (always)
+
+**1.8.3 Missing entire section: "Sanitize before submitting"**
+
+- Strip URL query parameters that may contain victim tokens / session IDs
+- Hash email local-parts when target destination is public (`a72b91…@example.com`)
+- Redact internal hostnames from samples
+- Strip x-forwarded-for / source IP from log excerpts that name your honeypot
+- Replace victim-org names with role descriptors (`<european_bank>`) unless the submission is to a destination where the victim has consented or the receiver is trusted (CERT)
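Two of the sanitization rules above, sketched with the standard library only. The function names and the per-project `salt` are assumptions for illustration; the salt is there so that hashed local-parts cannot be reversed with a simple dictionary of common mailbox names.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Remove query string and fragment, which may carry victim tokens or session IDs."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def hash_local_part(email: str, salt: bytes) -> str:
    """Replace the local-part with a short salted hash, keeping the domain readable."""
    local, _, domain = email.rpartition("@")
    digest = hashlib.sha256(salt + local.lower().encode("utf-8")).hexdigest()[:6]
    return f"{digest}\u2026@{domain}"  # e.g. 'a72b91…@example.com' style output
```

Because the hash is deterministic for a given salt, the same victim mailbox maps to the same token across submissions, which preserves correlation without exposing the address.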
+
+---
+
+### 1.9 — Section 9: Recommended Submission Object
+
+**1.9.1 Schema gaps (additions annotated inline):**
+
+```json
+{
+  "case_id": "WG-2026-000001",
+  "schema_version": "1.0",
+  "tlp": "AMBER",                            // use TLP 2.0 values: CLEAR/GREEN/AMBER/AMBER+STRICT/RED
+  "tlp_marking_definition_ref": "marking-definition--...",  // STIX-compatible
+  "severity": "low|medium|high|critical",   // replace A-E with standard
+  "confidence": "low|medium|high",          // or Admiralty A1-F6
+  "language": "en",                         // i18n
+  "first_observed": "2026-05-13T10:00:00Z", // top-level
+  "last_observed":  "2026-05-13T11:30:00Z",
+  "valid_from":     "2026-05-13T10:00:00Z", // STIX-style validity window
+  "valid_until":    "2026-08-13T10:00:00Z",
+  "threat_type": "phishing|malware|ransomware|credential_exposure|iab|botnet|vulnerability_exploitation",
+
+  "victim": {
+    "organization": "",
+    "domain": "",
+    "country": "",
+    "sector": "",
+    "nis2_category": "essential|important|n/a",   // for EU NIS2 routing
+    "consent_to_name_publicly": false             // sanitization gate
+  },
+
+  "actor": {
+    "name": "Adira",
+    "aliases": [],
+    "campaign": "",
+    "confidence": "A1|A2|...|F6"
+  },
+
+  "kill_chain": ["recon|weapon|deliver|exploit|install|c2|action"],
+  "attack_techniques": ["T1566.001", "T1059.003"],
+
+  "source": {
+    "category": "forum|leak_site|telegram|honeypot|sensor|osint|tip",
+    "first_seen": "",
+    "last_seen": "",
+    "collection_method": "lawful_osint_or_partner_feed",
+    "burn_sensitivity": "low|medium|high"        // affects sanitization aggressiveness
+  },
+
+  "observables": {
+    "ips": [],
+    "domains": [],
+    "urls": [],
+    "hashes": [],
+    "emails": [],
+    "wallets": [],
+    "cves": [],
+    "yara_rules": [],
+    "sigma_rules": [],
+    "mutexes": [],
+    "named_pipes": [],
+    "scheduled_tasks": [],
+    "registry_keys": [],
+    "user_agents": [],
+    "tls_fingerprints": [],                     // JA3/JA4
+    "certificates": [],                         // CT log entries / SHA256 of cert
+    "asn_blocks": [],
+    "process_names": []
+  },
+
+  "pattern_relationships": [
+    {"source": "domain:example.com", "type": "resolves_to", "target": "ipv4:1.2.3.4", "first_seen": "..."}
+  ],
+
+  "evidence": {
+    "summary": "",
+    "sanitized_screenshots": [],
+    "raw_evidence_location": "internal_restricted_storage",
+    "detonation_results": [],                   // sandbox report references
+    "memory_artifacts": []                      // forensic, internal only
+  },
+
+  "timeline": [
+    {"ts": "...", "event": "..."}
+  ],
+
+  "indicators_of_compromise": [],               // observables flagged as actively malicious
+
+  "recommended_actions": [],
+
+  "routing": {
+    "primary_destinations": [],
+    "secondary_destinations": [],
+    "public_disclosure_allowed": false,
+    "embargo_until": null,                      // timed disclosure
+    "coordinated_with": []                      // who else has been told (CERT case IDs etc)
+  },
+
+  "audit": {
+    "submitted_to": [],                         // append-only history of submissions
+    "feedback_received": [],                    // ack IDs, takedown confirmations
+    "submitter_identity": "wg-handle@misp",     // which submitter handle was used
+    "signed_with": "PGP fingerprint",
+    "object_sha256": ""                         // tamper-detect on the object itself
+  }
+}
+```
+
+**1.9.2 Other schema concerns:**
+
+- `case_id` format `WG-2026-000001` is fine, but reserve a 2-char org prefix to avoid collision if you ever federate with another working group
+- `tlp` should use **TLP 2.0** spec values (`CLEAR`, `GREEN`, `AMBER`, `AMBER+STRICT`, `RED`); TLP 1.0 used `WHITE` instead of `CLEAR` and had no `AMBER+STRICT`
+- Severity / confidence mismatch in v1: severity used `A-E`, confidence used words. Standardize.
+- Add a per-object hash so the routing engine can detect tampering between produce-time and submit-time
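The per-object hash could be computed over a canonical JSON serialization, roughly as below. Excluding the whole `audit` block from the digest is an assumption of this sketch: that block holds the hash itself and the append-only submission history, so including it would change the digest on every submission.

```python
import hashlib
import json


def object_sha256(obj: dict) -> str:
    """SHA-256 over a canonical JSON form of the object, ignoring the 'audit' block."""
    body = {key: value for key, value in obj.items() if key != "audit"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_untampered(obj: dict) -> bool:
    """Compare the stored digest with a freshly computed one at submit time."""
    return obj.get("audit", {}).get("object_sha256") == object_sha256(obj)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace, so only a change in content trips the check.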
+
+---
+
+### 1.10 — Section 10: Final Recommendation
+
+**1.10.1 The architecture sentence is missing the feedback edge.**
+
+Current: *Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public reporting*
+
+Better: *Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public reporting → **receipts and outcomes back to OpenCTI** → effectiveness scoring → re-prioritization*
+
+Without the feedback edge, you can't tell which destinations are worth maintaining.
+
+**1.10.2 Missing entirely: closing checklist for "we're ready to submit."**
+
+A final checklist before any external submission fires:
+
+```
+[ ] TLP enforced (object.tlp <= destination.max_tlp)
+[ ] Sanitization pass complete (PII stripped per destination policy)
+[ ] GreyNoise check (if observables include IPs)
+[ ] Quota available (rate-limit budget not exceeded)
+[ ] Submitter identity registered with destination
+[ ] Object signed
+[ ] Audit row written
+[ ] Human approver clicked yes (for non-automated tier)
+```
+
+This belongs as Section 11 or as the closing block of Section 10.
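The checklist can be expressed as a gate that returns the names of failed checks. This is a sketch under assumptions: the field names are hypothetical, and the TLP ordering simply ranks the TLP 2.0 labels from least to most restrictive so that `object.tlp <= destination.max_tlp` becomes an index comparison.

```python
from dataclasses import dataclass

# TLP 2.0 labels, least to most restrictive.
TLP_ORDER = ["CLEAR", "GREEN", "AMBER", "AMBER+STRICT", "RED"]


def tlp_allows(object_tlp: str, destination_max_tlp: str) -> bool:
    """True when the object's marking is no more restrictive than the destination accepts."""
    return TLP_ORDER.index(object_tlp) <= TLP_ORDER.index(destination_max_tlp)


@dataclass
class Preflight:
    object_tlp: str
    destination_max_tlp: str
    sanitized: bool
    has_ips: bool
    greynoise_checked: bool
    quota_available: bool
    submitter_registered: bool
    signed: bool
    audit_row_written: bool
    automated_tier: bool
    human_approved: bool


def blockers(p: Preflight) -> list[str]:
    """Return the checklist items that still block submission (empty list = ready)."""
    checks = {
        "tlp": tlp_allows(p.object_tlp, p.destination_max_tlp),
        "sanitization": p.sanitized,
        "greynoise": p.greynoise_checked or not p.has_ips,  # only needed when IPs are present
        "quota": p.quota_available,
        "submitter_identity": p.submitter_registered,
        "signature": p.signed,
        "audit_row": p.audit_row_written,
        "human_approval": p.human_approved or p.automated_tier,
    }
    return [name for name, ok in checks.items() if not ok]
```

Returning the failed item names, rather than a single boolean, gives the human-review UI something concrete to display next to a blocked submission.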
+
+---
+
+## 2. Cross-Cutting Gaps (Not Tied to Any Section)
+
+### 2.1 — OPSEC for the Reporters Themselves
+
+**Not in the doc at all.** If your group is reporting Adira to authorities, Adira may notice: they can read open MISP communities and URLhaus (both public), and they may hold a VirusTotal Premium subscription that exposes public submissions.
+
+Required additions:
+
+- **Submission identity registry**: which handle is used on which platform, who has access, rotation schedule
+- **Account-creation OPSEC**: don't use personal accounts on submission platforms; create a project handle, use a project email, register with project-owned phone/2FA
+- **Network OPSEC for collection**: if you're scraping leak sites or monitoring the adversary's infrastructure, route through a VPN or research-purpose proxy — never the same network as your submission identity
+- **PGP for CERT comms**: every national CERT publishes a PGP key. Every email submission to a CERT should be signed and encrypted. Untouched in the draft.
+
+### 2.2 — Burnt Source Protection
+
+If you have a private collection source (honeypot, infiltrated channel, tipped-off insider), publishing IOCs from it can burn the source. Specifically:
+
+- A unique honeypot fingerprint (banner, response timing, listening port) lets the adversary identify which sample came from your honeypot
+- Publishing a sample with a unique build artifact (your sandbox's hostname in a DNS query, a timestamp matching your detonation window) reveals your detonation infrastructure
+- Reporting a forum URL while it's still live tips off the forum operator that it's being watched
+
+The doc needs a **burn-sensitivity tier** on each observable, and a sanitization step that aggressively scrubs source-identifying artifacts before any external submission.
+
+### 2.3 — Adversary Observability of Your Submissions
+
+Tier each receiver by who can see your submission:
+
+| Receiver | Adversary visibility |
+|---|---|
+| MISP private community | trusted community only |
+| MISP public community / OTX public pulse | anyone with an account |
+| URLhaus | public — adversary can monitor |
+| MalwareBazaar | public — adversary can detect their sample was uploaded |
+| VirusTotal public submission | every VT Premium customer (incl. potentially adversary) |
+| VT Private Scanning | only your team (paid) |
+| AbuseIPDB | public reputation visible |
+| Cloudflare Abuse Reports | only Cloudflare and the reported asset owner |
+| CERT direct (GPG-encrypted) | only the CERT |
+
+The routing engine should display this visibility for each destination during human review.
+
+### 2.4 — Chain of Custody / Legal Admissibility
+
+If any of this material may end up in a criminal proceeding, chain of custody matters. Specifically:
+
+- The **raw evidence** must be preserved unmodified, with hashes recorded at acquisition time
+- Any **transformation** (sanitization, normalization) must be traceable back to the unmodified original: the routing engine logs the input hash, the transform applied, and the output hash
+- The **submitter identity** for each external submission is logged
+- Witnesses (multi-party access logs) are preferred for high-value evidence
+
+The current `evidence.raw_evidence_location` field is a placeholder; it needs structure: storage path, hash, acquisition timestamp, acquirer identity.
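The transformation logging described above can be sketched as a hash chain: each record stores the digest of its input and output, so the chain can later be replayed from the raw evidence hash. All names here (`TransformRecord`, `apply_logged`, `chain_is_intact`) are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class TransformRecord:
    transform: str
    input_sha256: str
    output_sha256: str
    applied_at: str  # UTC timestamp, ISO 8601


def apply_logged(data: bytes, name: str, fn: Callable[[bytes], bytes],
                 log: list[TransformRecord]) -> bytes:
    """Apply a transform and append its input/output hashes to the custody log."""
    out = fn(data)
    log.append(TransformRecord(name, sha256_hex(data), sha256_hex(out),
                               datetime.now(timezone.utc).isoformat()))
    return out


def chain_is_intact(raw_sha256: str, log: list[TransformRecord]) -> bool:
    """Check that every record's input matches the previous output, starting at the raw hash."""
    expected = raw_sha256
    for record in log:
        if record.input_sha256 != expected:
            return False
        expected = record.output_sha256
    return True
```

The raw evidence itself stays untouched in restricted storage; only derived copies pass through `apply_logged`, and the log ties each derived copy back to the acquisition-time hash.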
+
+### 2.5 — Amplification Risk
+
+Publishing IOCs publicly amplifies awareness — which is good for defenders but bad if:
+
+- The IOC includes a compromised legitimate site (you damage the site owner's reputation)
+- The IOC is for a piece of infrastructure that's about to be used in a sting operation by LE
+- The IOC reveals an investigation technique still under embargo
+
+A **publish-readiness review** belongs in Section 1 of the doc, not in the closing checklist.
+
+### 2.6 — Failure Modes / Retries
+
+What happens when:
+
+- URLhaus rejects a submission (malformed, low-confidence flag, duplicate)?
+- MISP is down for maintenance?
+- Cloudflare returns 503?
+- Your submitter identity gets rate-limited?
+- An API token is revoked mid-batch?
+
+The doc has no resilience layer. Recommend:
+
+- Idempotent submission with client-generated IDs (so retries don't double-submit)
+- Per-destination retry policy (exponential backoff with jitter)
+- Dead-letter queue for permanent failures — surface in human-review UI
+- Per-submitter quota tracking, with auto-failover to backup submitter if available
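A minimal sketch of the first three recommendations: a client-generated ID reused across retries, exponential backoff with full jitter, and a dead-letter list for permanent failures. `send`, the two exception classes, and the default delays are assumptions for illustration; a real adapter would map HTTP status codes onto the two error types.

```python
import random
import time
import uuid
from typing import Callable, List, Optional


class TransientError(Exception):
    """Worth retrying: 503, timeout, rate limit."""


class PermanentError(Exception):
    """Not worth retrying: malformed payload, revoked token."""


def submit_with_retry(send: Callable[[str], str], dead_letter: List[str],
                      max_attempts: int = 5, base_delay: float = 1.0,
                      max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Return the destination's receipt, or None after parking the ID in dead_letter."""
    submission_id = str(uuid.uuid4())  # same ID on every retry, so the receiver can de-duplicate
    for attempt in range(max_attempts):
        try:
            return send(submission_id)
        except PermanentError:
            break  # retrying cannot help
        except TransientError:
            if attempt + 1 < max_attempts:
                # Full jitter: wait a random time up to the capped exponential delay.
                sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    dead_letter.append(submission_id)  # surfaced later in the human-review UI
    return None
```

Idempotency only holds if the destination honors the client-generated ID; where it does not, the adapter has to check for an existing submission before retrying.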
+
+### 2.7 — Versioning and Maintenance
+
+The doc has no version number, no changelog, no maintainer field, no review cadence. For a living spec like this:
+
+```
+---
+schema_version: 1.0
+last_reviewed: 2026-05-13
+next_review_due: 2026-08-13
+maintainer: <project lead>
+changelog:
+  - 2026-05-13: initial draft
+---
+```
+
+API surfaces of these platforms change (Cloudflare deprecations, VT pricing changes, abuse.ch tag taxonomy updates). A quarterly re-validation cadence is sane.
+
+### 2.8 — Multi-Language Submissions
+
+Many national CERTs prefer or require local language for narrative fields (BSI German, ANSSI French, CCN-CERT Spanish). The submission object's `language` field (added above) plus a translation step in the routing engine handles this. Currently absent.
+
+---
+
+## 3. Missing Categories Entirely
+
+### 3.1 — Hosting Provider Abuse Channels (most lack true REST APIs)
+
+| Provider | Channel | API? |
+|---|---|---|
+| AWS | abuse@amazonaws.com + form | No public REST; AWS responds to email |
+| Google Cloud | https://support.google.com/cloud/answer/2417620 | Form-only |
+| Azure | https://msrc.microsoft.com/report/abuse | Form + email |
+| DigitalOcean | abuse@digitalocean.com | Email + status REST |
+| Hetzner | abuse@hetzner.com + form | Form |
+| OVH | abuse@ovh.net + Anti-abuse REST API | Yes |
+| Linode (Akamai) | abuse@linode.com | Email |
+| Vultr | abuse@vultr.com | Email |
+
+Treat email-based providers as a different submission class (template + GPG-signed email, with parsed-receipt detection). Worth a Section 11 in the doc.
+
+### 3.2 — Cryptocurrency / Sanctions
+
+- **Chainalysis Reactor** — commercial, gold standard for on-chain investigations
+- **TRM Labs** — commercial alternative
+- **CipherTrace** (Mastercard) — commercial
+- **OFAC SDN** reporting — for US-sanctioned wallets
+- **EU Financial Sanctions Files (FSF)** — for EU sanctions
+- **National FIUs** — Financial Intelligence Units, country-specific
+- Free / open: **GraphSense** (open-source on-chain analytics), **Etherscan** (manual)
+
+### 3.3 — Mobile / App Store
+
+- **Google Play Protect submissions** (for Android malware)
+- **Apple App Review report** (for malicious iOS apps)
+- **APKMirror reports** (for repackaged apps)
+- **F-Droid security contacts** (for compromised FOSS apps)
+
+### 3.4 — Open-Source Supply Chain
+
+- **PyPI:** security@pypi.org
+- **npm:** security@npmjs.com, plus vendor-side automatic revocation of leaked tokens
+- **crates.io:** help@crates.io
+- **RubyGems:** security@rubygems.org
+- **Maven Central:** central@sonatype.org
+- **GitHub Security Lab** (research collaboration)
+- **OpenSSF** Vulnerability Disclosure (cross-ecosystem coordination)
+- **Sigstore** (provenance verification, longer-term)
+
+### 3.5 — Certificate Authorities
+
+- **Let's Encrypt:** ACME revocation API for ACME-issued certs
+- **Sectigo / DigiCert / GlobalSign / Entrust:** abuse contacts in CA/Browser Forum compliance docs
+- **CT log monitors** for detection (crt.sh, Censys CT, Google CT)
+
+### 3.6 — Tor / Dark Web
+
+Limited takedown leverage for `.onion` services, but worth documenting:
+
+- Document via **Tor Project's abuse handling page** (limited leverage)
+- Contribute IOCs to **DarkOwl, Recorded Future, Flashpoint** (commercial dark-web monitoring) if you have access
+- Push to MISP with `tor` tag for community awareness
+
+### 3.7 — CSAM (Legally Separate Pathway)
+
+If CSAM is encountered during collection, **stop processing immediately**. CSAM has separate legal handling rules:
+
+- **NCMEC CyberTipline** (US)
+- **IWF (Internet Watch Foundation)** (UK)
+- **INHOPE** (international hotline network)
+- Possessing CSAM is illegal even for research; do not attempt to verify, document, or share. Report and delete from your systems under documented legal hold.
+
+This deserves a Section 12 with a hard stop: "if encountered, halt and report via the channels below; do not include in any other submission flow."
+
+---
+
+## 4. Missing Platforms Worth Adding (Quick List)
+
+### Free / Open
+- AlienVault OTX (huge omission)
+- ThreatFox
+- YARAify
+- CIRCL Hashlookup
+- CIRCL Passive DNS / Passive SSL
+- Maltrail feeds
+- crt.sh / Censys CT
+- GreyNoise community tier
+- Spamhaus DNSBL queries
+- PhishStats
+
+### Commercial / Paid (worth listing for completeness)
+- Recorded Future
+- Mandiant Advantage (now Google Threat Intelligence)
+- CrowdStrike Falcon Intelligence
+- Sekoia.io
+- Flashpoint
+- DomainTools (passive DNS / WHOIS history)
+- RiskIQ (now Microsoft Defender Threat Intelligence)
+- Anomali ThreatStream
+
+### Intelligence Communities (membership-based)
+- FIRST.org (CSIRT global community)
+- Trusted Introducer (European CSIRT trust framework)
+- M3AAWG (Messaging, Malware, Mobile Anti-Abuse Working Group)
+- APWG (Anti-Phishing Working Group)
+- Cyber Threat Alliance (nonprofit CTI sharing among security vendors)
+- ENISA CSIRTs Network
+
+---
+
+## 5. Implementation Priorities for blue48
+
+In our agent stack, this doc translates to concrete work:
+
+### 5.1 — Block G additions (when we get there)
+
+1. **`report_writer` agent** outputs the v2 normalized object (Section 1.9.1 above) as the canonical format
+2. **New `routing_engine` component** (extension of `report_writer`, or a 7th agent) — consumes the object, applies routing matrix, fans out via API adapters
+3. **Adapter priority order for blue48 v1.0:**
+   1. MISP (PyMISP)
+   2. AlienVault OTX (REST)
+   3. AbuseIPDB (REST + category mapping)
+   4. URLhaus + MalwareBazaar + ThreatFox (shared abuse.ch auth-key)
+   5. urlscan.io (REST, with private-by-default visibility)
+   6. Cloudflare Abuse Reports
+   7. GPG-signed email to BSI / CERT-Bund (since the user is in DE)
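+
+The fan-out step in item 2 could be sketched roughly as follows. This is a minimal illustration, not the blue48 design: the `Adapter` protocol, `DryRunAdapter`, the `ROUTES` table, and the `evidence_type` / `value` field names are all hypothetical, and a real routing matrix would be loaded from configuration rather than hard-coded.
+
+```python
+from dataclasses import dataclass
+from typing import Protocol
+
+
+class Adapter(Protocol):
+    """Hypothetical interface every destination adapter would implement."""
+
+    name: str
+
+    def submit(self, obj: dict) -> str: ...
+
+
+@dataclass
+class DryRunAdapter:
+    """Stand-in adapter that reports what it would send instead of calling an API."""
+
+    name: str
+
+    def submit(self, obj: dict) -> str:
+        return f"{self.name}: would submit {obj['evidence_type']} {obj['value']}"
+
+
+# Illustrative routing matrix (assumption): evidence type -> destinations in priority order.
+ROUTES: dict[str, list[str]] = {
+    "url": ["misp", "urlhaus", "urlscan"],
+    "ip": ["misp", "otx", "abuseipdb"],
+    "file_hash": ["misp", "otx", "malwarebazaar"],
+}
+
+
+def route(obj: dict, adapters: dict[str, Adapter]) -> list[str]:
+    """Fan one normalized object out to every destination listed for its evidence type."""
+    results = []
+    for dest in ROUTES.get(obj["evidence_type"], []):
+        adapter = adapters.get(dest)
+        if adapter is None:
+            # Never skip silently: surface the missing adapter in the result.
+            results.append(f"{dest}: no adapter registered, needs human review")
+            continue
+        results.append(adapter.submit(obj))
+    return results
+
+
+adapters = {n: DryRunAdapter(n) for n in ("misp", "otx", "abuseipdb")}
+outcome = route({"evidence_type": "ip", "value": "203.0.113.7"}, adapters)
+```
+
+Keeping the adapters behind one small interface means a new destination only needs a `submit` method and a row in the routing table.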
+
+### 5.2 — Schema work
+
+- `config/submission_schema.json` — JSON Schema for the v2 normalized object
+- `config/routing_matrix.yaml` — declarative rules: evidence type → destinations, with TLP ceilings and quotas
+- `core/sanitize.py` — pre-submission scrubbing per destination policy
+- `core/audit.py` — append-only log of every submission, signed
+- `core/tlp.py` — TLP 2.0 enforcement
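+
+As a rough idea of what the TLP ceiling check in `core/tlp.py` might involve, the sketch below orders the five TLP 2.0 labels and compares an object's label against a per-destination ceiling. The `CEILINGS` values and destination names are assumptions for illustration only; real ceilings would come from the routing matrix.
+
+```python
+from enum import IntEnum
+
+
+class TLP(IntEnum):
+    """TLP 2.0 labels, ordered from least to most restrictive."""
+
+    CLEAR = 0
+    GREEN = 1
+    AMBER = 2
+    AMBER_STRICT = 3
+    RED = 4
+
+
+# Hypothetical per-destination ceilings (assumption, not a statement of any platform's policy).
+CEILINGS = {
+    "urlhaus": TLP.CLEAR,
+    "misp_community": TLP.AMBER,
+    "cert_bund_email": TLP.AMBER_STRICT,
+}
+
+
+def may_send(label: TLP, destination: str) -> bool:
+    """Return True only if the label does not exceed the destination's ceiling.
+
+    Unknown destinations are denied rather than given a default ceiling.
+    """
+    ceiling = CEILINGS.get(destination)
+    return ceiling is not None and label <= ceiling
+```
+
+Denying unknown destinations by default keeps a typo in the routing matrix from turning into an over-shared submission.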
+
+### 5.3 — Pre-submission gates (before any adapter fires)
+
+```
+1. Schema valid?
+2. TLP <= destination ceiling?
+3. Sanitization complete?
+4. GreyNoise check passes (for IPs)?
+5. Quota available?
+6. Submitter identity registered with destination?
+7. Object signed?
+8. Audit row written?
+9. Human approval granted (for non-auto tier)?
+```
+
+If any fail → drop into human-review queue with the reason. Never silently skip.
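+
+The "fail into the review queue with a reason" behaviour could be sketched as an ordered gate runner. Everything named here (`run_gates`, `GateResult`, `submit_or_queue`, the two toy gates, the in-memory `review_queue`) is hypothetical and stands in for the nine real checks.
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Optional
+
+Gate = Callable[[dict], bool]
+
+
+@dataclass
+class GateResult:
+    passed: bool
+    failed_gate: Optional[str] = None  # name of the first failing gate, if any
+
+
+def run_gates(obj: dict, gates: list[tuple[str, Gate]]) -> GateResult:
+    """Evaluate gates in order; stop at the first failure and report its name."""
+    for name, check in gates:
+        if not check(obj):
+            return GateResult(passed=False, failed_gate=name)
+    return GateResult(passed=True)
+
+
+# Two toy gates standing in for the nine listed above (assumed field names).
+GATES: list[tuple[str, Gate]] = [
+    ("schema_valid", lambda o: {"evidence_type", "value", "tlp"} <= set(o)),
+    ("object_signed", lambda o: bool(o.get("signature"))),
+]
+
+review_queue: list[tuple[dict, str]] = []
+
+
+def submit_or_queue(obj: dict) -> bool:
+    """Return True if adapters may fire; otherwise queue the object with the reason."""
+    result = run_gates(obj, GATES)
+    if not result.passed:
+        review_queue.append((obj, result.failed_gate))
+        return False
+    return True
+```
+
+Because the runner returns the name of the failing gate, the human-review queue always records why an object was held back.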
+
+### 5.4 — Failure / retry layer
+
+- Per-destination idempotency keys (client-generated)
+- Exponential backoff with jitter
+- Dead-letter queue for permanent failures, surfaced in `data/dlq/`
+- Per-submitter quota tracking with auto-failover
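+
+A minimal sketch of the first two bullets, assuming a "full jitter" backoff variant (one of several reasonable choices) and a hypothetical `send(obj, key)` callable supplied by each adapter. The exception classes and function names are illustrative, not part of any existing blue48 module.
+
+```python
+import random
+import time
+import uuid
+from typing import Callable
+
+
+class PermanentError(Exception):
+    """Destination rejected the submission; retrying will not help."""
+
+
+class TransientError(Exception):
+    """Timeout, rate limit, or server error; safe to retry with the same key."""
+
+
+def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
+    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
+    return random.uniform(0.0, min(cap, base * 2 ** attempt))
+
+
+def submit_with_retry(
+    send: Callable[[dict, str], str],
+    obj: dict,
+    max_attempts: int = 5,
+    sleep: Callable[[float], None] = time.sleep,
+) -> str:
+    """Retry transient failures, reusing one idempotency key across all attempts."""
+    if max_attempts < 1:
+        raise ValueError("max_attempts must be >= 1")
+    key = str(uuid.uuid4())  # client-generated, one per logical submission
+    for attempt in range(max_attempts):
+        try:
+            return send(obj, key)
+        except TransientError:
+            if attempt == max_attempts - 1:
+                raise  # retries exhausted; caller decides what happens next
+            sleep(backoff_delay(attempt))
+    raise AssertionError("loop always returns or raises")
+```
+
+A `PermanentError` is deliberately not caught here: it propagates on the first attempt so the caller can write the object to the dead-letter queue instead of retrying.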
+
+---
+
+## 6. Summary of v2 Findings
+
+| Category | Count | Action |
+|---|---:|---|
+| Section-by-section corrections | 38 | Fold into the draft |
+| New cross-cutting sections needed | 8 | Add as Sections 11–18 |
+| Missing platform categories | 7 | Each warrants a sub-section |
+| Missing free/open platforms (Tier 1) | 5 | Add to Section 2 |
+| Schema field gaps | 17 | Adopt v2 schema above |
+| Pre-submission gates not defined | 9 | Add as closing checklist |
+
+After folding these in, the document becomes a publishable internal whitepaper *and* a complete spec for the blue48 routing engine. The first draft was a confident outline; the v2 turns it into a working manual.
+
+If useful, I can next:
+
+- (a) Generate `config/submission_schema.json` (JSON Schema for the v2 normalized object) into `~/blue48/config/`
+- (b) Generate `config/routing_matrix.yaml` (declarative routing rules) into `~/blue48/config/`
+- (c) Scaffold `agents/routing_engine.py` with adapter stubs for the seven Block-G priority destinations
+- (d) Re-issue this review as suggested edits inline against the original (so you can accept/reject diff-style)
+
+Pick any subset and I'll ship.