Detailed Review v2 — API-Eligible Cyber Threat Reporting & Escalation Platforms
Reviewer: Claude (Opus 4.7, 1M context)
Review date: 2026-05-13
Document reviewed: waypoints.md (first draft)
Companion to: waypoints_firstpass.md (v1 executive summary)
Scope of this v2: section-by-section findings, cross-cutting gaps, missing categories, revised schema, implementation priorities for blue48.
0. Method
I re-read the draft three times against the following lenses:
- Factual / API accuracy — does each platform actually do what's claimed?
- Operational correctness — would the routing actually work in practice, or break on first contact with reality?
- Legal / compliance — GDPR, NIS2, MLAT, jurisdiction, chain of custody
- Threat-model coverage — does this serve the actual project goal (campaign disruption, not individual attribution)?
- OPSEC of the reporter — what does the adversary learn from each submission?
Findings below carry confidence tags: [verified], [likely current], [verify before relying on].
1. Section-by-Section Findings
1.1 — Section 1: Recommended Reporting Order
1.1.1 In Scenario 1.1 (normal credible threat), going to the victim first is correct in 90% of cases — but flag the exception.
Insider-attack scenarios reverse this: notifying a victim org whose own admin/employee is the threat actor warns the attacker. For credential-leak cases involving privileged accounts, route CERT-first and let CERT decide whether to notify the victim org's leadership or its security contact. Add a 1.1.bis for "victim contact may itself be compromised."
1.1.2 Scenario 1.2 (imminent harm) is missing a specific decision point.
If the imminent harm is to critical infrastructure (energy, water, healthcare, finance), in EU jurisdictions the NIS2 Directive requires regulated entities to send an early warning within 24 hours of becoming aware of a significant incident, with a fuller notification due within 72 hours. Your routing engine should detect "victim sector ∈ NIS2 essential/important entity list" and either:
- Route the report so the victim can fulfill their NIS2 obligation, OR
- (If victim is unreachable) report directly via the relevant national CERT's NIS2 channel, which exists separately from generic CSIRT contact paths
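The decision point above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the `NIS2_SECTORS` set, the `Victim` fields, and the destination labels are all hypothetical names, and the authoritative sector lists live in the NIS2 annexes and national transpositions.

```python
from dataclasses import dataclass

# Hypothetical, abbreviated sector list for illustration only; the real
# essential/important entity categories come from the NIS2 annexes.
NIS2_SECTORS = {"energy", "water", "healthcare", "finance"}


@dataclass(frozen=True)
class Victim:
    sector: str
    country: str
    reachable: bool  # do we have a working contact at the victim?


def route_imminent_harm(victim: Victim) -> list[str]:
    """Return ordered (hypothetical) destination labels for an imminent-harm case."""
    if victim.sector.lower() not in NIS2_SECTORS:
        # Outside NIS2 scope: fall back to the generic path.
        return ["victim_security_contact", "national_cert"]
    if victim.reachable:
        # The victim carries the NIS2 obligation; give them what they need to report.
        return ["victim_security_contact", "national_cert_nis2_channel"]
    # Victim unreachable: go straight to the national CERT's NIS2 channel.
    return ["national_cert_nis2_channel"]
```

Keeping the rule as a pure function makes it easy to unit-test each branch before wiring it into the routing engine.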
1.1.3 Scenario 1.3 missing receiver categories:
- Hosting providers (not just CDNs). Cloudflare is a CDN; the actual origin server is somewhere else (Hetzner, OVH, AWS, DigitalOcean, etc.). A Cloudflare-only report leaves the origin running. Add hosting provider abuse as a parallel step, not after CDN.
- Domain registrars via WHOIS-extracted abuse contact, plus registry escalation for ccTLDs (DENIC for .de, AFNIC for .fr, EURid for .eu, Nominet for .uk)
- Certificate authorities for compromised cert revocation (Let's Encrypt revoke API for ACME-issued certs; commercial CA abuse contacts for the rest)
- DNS providers independent of registrar (Cloudflare DNS, Quad9, Google Public DNS abuse contacts — for blocking, not takedown)
1.1.4 The implicit ordering bias.
The draft optimizes for legal-defensibility (talk to the receiver who can act) but doesn't optimize for operational speed-to-mitigation. For phishing kits with active credential harvesting, the fastest mitigation is often: parallel-fan-out to (CDN, hosting, registrar, browser-block-list providers) simultaneously, then notify CERT as record-keeping. The doc reads as serial when in practice it should be parallel.
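A parallel fan-out of this kind might look like the sketch below. It assumes each receiver is wrapped in an async adapter with a hypothetical `(report) -> str` signature; the stub adapters stand in for real CDN, hosting, and registrar clients.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical adapter signature: takes the report, returns a receipt string.
Adapter = Callable[[dict], Awaitable[str]]


async def fan_out(report: dict, adapters: dict[str, Adapter]) -> dict[str, str]:
    """Submit to all adapters concurrently; one failure must not block the rest."""
    names = list(adapters)
    results = await asyncio.gather(
        *(adapters[name](report) for name in names), return_exceptions=True
    )
    return {
        name: (f"error: {res}" if isinstance(res, Exception) else res)
        for name, res in zip(names, results)
    }


async def _stub_ok(report: dict) -> str:
    return "accepted"


async def _stub_fail(report: dict) -> str:
    raise RuntimeError("503 from receiver")


outcome = asyncio.run(
    fan_out(
        {"url": "https://phish.example/login"},
        {"cdn": _stub_ok, "hosting": _stub_fail, "registrar": _stub_ok},
    )
)
# outcome maps each destination to its receipt or error string
```

The CERT notification can then run after the fan-out completes, with the per-destination outcomes attached as the record.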
1.2 — Section 2: Tier-1 API Reporting Platforms
1.2.1 Missing platforms that belong in Tier 1:
| Platform | Why Tier-1 | API style |
|---|---|---|
| abuse.ch ThreatFox | IOC graph, sibling to URLhaus/MalwareBazaar, accepts indicator submissions with kill-chain context | REST + Auth-Key |
| abuse.ch YARAify | YARA rule sharing + scanning. Direct fit since detection_author emits YARA | REST + Auth-Key |
| AlienVault OTX (now LevelBlue Labs OTX) | One of the largest free CTI communities. Pulses for sharing, pull API for consumption. Major omission from current draft. | REST + DirectConnect API |
| CIRCL Hashlookup | Fast hash reputation lookup, free, EU-hosted | REST |
| Shadowserver | Free network exposure / vulnerability scanning reports. Subscribe by ASN/CIDR/contact. The draft has it under "monitoring" but Shadowserver also accepts submissions and runs important takedown campaigns. | REST API |
1.2.2 Reorder by jurisdictional fit:
The current #1 (CISA AIS) is US-government-tied. For Europe-focused work the right Tier-1 priorities are roughly:
- MISP (CIRCL communities, plus ENISA CSIRTs Network communities)
- OpenCTI (your own knowledge graph)
- AlienVault OTX (broad reach, low friction)
- CISA AIS (only if US-victim cases or US-relevant indicators)
- Cloudflare / hosting abuse APIs
- Spamhaus
- URLhaus / MalwareBazaar / ThreatFox
- AbuseIPDB
- urlscan.io
- Netcraft
1.2.3 Per-row corrections in the existing table:
- CISA AIS — "STIX/TAXII bidirectional" — be specific: STIX 2.1 over TAXII 2.1, with the AIS Profile (a restricted subset of STIX). Submitting non-AIS-Profile STIX gets rejected. [verified]
- Cloudflare Abuse Reports API — also requires noting that high-volume submitters can apply to be a Trusted Reporter which gets faster SLAs. [likely current]
- VirusTotal API — public submissions are visible to all VT Premium customers (incl. potentially the adversary). The draft doesn't flag this — it's a critical OPSEC point. Use VT Private Scanning for sensitive samples. [verified]
- PhishTank — community-vetted. As of late 2024 / early 2025 there were reports of reduced moderation activity. [verify before relying on]. Netcraft is the more reliable phishing-takedown channel today.
- Google Web Risk — access truly is gated by Google customer engineering review; not a 5-minute API key signup. Apply early. [verified]
1.3 — Section 3: Per-Platform Notes
3.1 CISA AIS: Add: requires sponsorship from a federal agency or a signed AIS Sharing Agreement, plus the connector software (typically TAXII client). Onboarding measured in weeks, not days. The draft makes it sound like a sign-up form.
3.2 MISP: Missing:
- ZeroMQ for real-time push (worth using if you want sub-second propagation to your own consumers)
- Distinction between events (point-in-time intelligence) and feeds (continuous streams; better for IOC bulk delivery)
- "Create a community" vs "Join a community" tradeoff — joining CIRCL's communities is the lowest-friction entry; creating your own is high-effort and pointless until you have multiple sharing partners
- TLP-marking enforcement is not automatic at the MISP level — your client must respect TLP before publishing onward
3.3 OpenCTI: Missing:
- The connector framework: ~80+ pre-built connectors (MITRE ATT&CK, MISP, CrowdStrike, Recorded Future, etc.) — most of your enrichment needs are already solved
- The Workbench feature for analyst review before publishing
- Filigran (the company behind OpenCTI) hosts a managed cloud version if you don't want to operate it yourself
3.4 Cloudflare Abuse Reports API: Missing:
- API token requires the Account.Abuse Reports permission; it won't work with read-only tokens
- Rate limits documented separately from the abuse API itself
- For Cloudflare-hosted Workers (their serverless), abuse reports go to a different channel
- Trusted Reporter program (mentioned above) — apply once you have submission history
3.5 Spamhaus: Missing the lists distinction:
- DBL = Domain Block List (domains)
- SBL = Spamhaus Block List (IPs)
- XBL = Exploits Block List (exploit-sourced IPs)
- ZRD = Zero Reputation Domains (newly registered)
- Each list has different submission criteria. Wrong-list submissions get rejected. Your routing engine needs a list-selector.
3.6 AbuseIPDB: Missing:
- The 23-category taxonomy (SSH brute force, port scan, web app attack, phishing, etc.): your evidence type must map to an AbuseIPDB category code or the submission is low-utility
- Free tier: 1000 reports/day, 100 IP checks/min. Paid tiers scale
- Single-reporter submissions have low weight; reputation requires multiple corroborating submitters. Send to AbuseIPDB after sending to other corroborators
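The category mapping could be kept as a small lookup table, sketched below. The numeric IDs are as I recall them from AbuseIPDB's published category list and should be verified before use; the internal evidence-type names are hypothetical.

```python
# Category IDs as recalled from AbuseIPDB's category list; verify before relying on them.
ABUSEIPDB_CATEGORIES = {
    "phishing": 7,
    "port_scan": 14,
    "brute_force": 18,
    "web_app_attack": 21,
    "ssh": 22,
}

# Hypothetical internal evidence types mapped to one or more categories.
EVIDENCE_TO_CATEGORIES = {
    "ssh_brute_force": ["brute_force", "ssh"],
    "port_scan": ["port_scan"],
    "phishing_host": ["phishing"],
    "webshell_probe": ["web_app_attack"],
}


def abuseipdb_categories(evidence_type: str) -> str:
    """Return the comma-separated category IDs, or raise if the type is unmapped."""
    try:
        names = EVIDENCE_TO_CATEGORIES[evidence_type]
    except KeyError:
        raise ValueError(f"no AbuseIPDB category mapping for {evidence_type!r}") from None
    return ",".join(str(ABUSEIPDB_CATEGORIES[name]) for name in names)
```

Raising on an unmapped evidence type, rather than defaulting to a generic category, keeps low-utility submissions out of the queue.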
3.7 URLhaus: Missing:
- Submission auth-key required (free, sign up)
- Manual review for high-confidence flags
- 2024+ stricter format requirements
- Linkage to MalwareBazaar — submit the URL to URLhaus, the sample to MalwareBazaar, link by hash
3.8 MalwareBazaar: Missing:
- File size limits (~250MB last I checked)
- Office macro / Windows installer formats need specific tags
- Tag taxonomy is community-driven; non-canonical tags reduce utility
- The "Avoid" line about legal-share is correct but vague. Specifically: do not upload samples obtained under NDA, samples from incidents where the victim hasn't consented, or samples that may contain victim PII (e.g., crafted payloads with the victim's name)
3.9 PhishTank: As noted above, declining. Verify status; consider deprioritizing.
3.10 urlscan.io: Missing:
- Visibility settings: public, unlisted, private (private = paid)
- Public scans are searchable by everyone, including the adversary monitoring for their kits being analyzed
- The Search API is invaluable for retrohunts: "show me every scan in the last 30 days that loaded resource X"
- Bulk submission via UUID-tagged customagent field for tracking your submission cohort
3.11 Google Web Risk: Missing:
- GCP project + Web Risk API enabled prerequisite
- Submissions evaluated by Google Safe Browsing pipeline; latency hours-to-days
- Successful submissions show up in Chrome / Firefox / Safari Safe Browsing warnings — massive amplification. Use only for high-confidence URLs
3.12 VirusTotal: Missing:
- Public API: 4 lookups/min, 500/day, 15.5k/month
- Premium API: rate limits negotiated
- File submission privacy: anyone with VT Intelligence can see your sample. Critical OPSEC point not in draft.
- VT Private Scanning for sensitive samples
- VT Hunting (YARA livehunt) for ongoing detection
3.13 Netcraft: Missing:
- Strong takedown-execution record — Netcraft actually does the takedown work, not just reporting
- Free tier exists for low-volume reporters
- Strongest at brand-protection / phishing
- They prefer evidence package format: source URL + screenshot + redirect chain + landing page HTML
1.4 — Section 4: Internal Case / Incident Routing Platforms
1.4.1 Missing platforms:
| Platform | Best for | Why missing matters |
|---|---|---|
| Wazuh | Open-source SIEM with TheHive integration | Many SOCs use it; integrates cleanly with this stack |
| Microsoft Sentinel | Cloud SIEM with Logic Apps automation | Major enterprise platform — leaving it out makes the doc feel non-enterprise |
| Splunk SOAR (formerly Phantom) | Commercial SOAR | Major in enterprise SOCs |
| Cortex XSOAR | Commercial SOAR (Palo Alto) | Same |
| Shuffle | Open-source SOAR | Free alternative to XSOAR/Phantom |
| Tracecat | Newer open-source SOAR | Younger but actively developed |
| n8n | General workflow automation | Not security-specific but widely used as a glue layer |
1.4.2 TheHive 5 vs 4: Be explicit — TheHive 4 reached EOL, TheHive 5 is current. Code examples should target TheHive 5 API.
1.5 — Section 5: Monitoring (Not Primary Reporting)
1.5.1 Missing high-value monitoring sources:
| Source | What it gives you | API |
|---|---|---|
| AlienVault OTX | Largest free pulse community, IOC subscriptions | REST DirectConnect |
| CIRCL Passive DNS / Passive SSL | Historical DNS / cert lookups; EU-hosted | REST |
| PhishStats | Phishing URL stream | REST + RSS |
| DNSDumpster / SecurityTrails / BinaryEdge | Recon/asset-discovery DBs | REST (mostly paid for bulk) |
| GreyNoise | Benign-scanner classification — reduces false positives in IP reporting by tagging known internet-noise sources | REST |
| Spamhaus DNSBL queries | Free DNSBL lookups | DNS protocol |
| Maltrail | Open-source malicious-traffic detection feeds | Static feed download |
| CT log monitors (crt.sh, Censys CT) | New-cert issuance for your monitored domains — catches phishing-domain registrations | REST |
1.5.2 GreyNoise specifically deserves a callout.
Reporting an IP that GreyNoise classifies as benign-scanner (Shodan, Censys, security researchers) gets you blacklisted from AbuseIPDB and embarrasses you with CERTs. Always GreyNoise-check before submitting an IP report. This is a one-line API call that prevents a class of bad submissions.
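A pre-submission filter along these lines is sketched below. To avoid guessing at the GreyNoise client API, the lookup is injected as a callable; its signature and the label values ("benign", "unknown") are assumptions for illustration, and the IPs are from documentation ranges.

```python
from typing import Callable

# Hypothetical lookup signature: returns a classification label for an IP.
# In practice this would wrap a GreyNoise client call.
Classifier = Callable[[str], str]


def filter_reportable_ips(
    ips: list[str], classify: Classifier
) -> tuple[list[str], list[str]]:
    """Split IPs into (reportable, held_back) based on the noise classification."""
    reportable: list[str] = []
    held_back: list[str] = []
    for ip in ips:
        if classify(ip) == "benign":
            held_back.append(ip)  # known scanner: do not report
        else:
            reportable.append(ip)
    return reportable, held_back


# Stub classifier standing in for the real lookup.
_known_scanners = {"192.0.2.10"}
ok, held = filter_reportable_ips(
    ["192.0.2.10", "198.51.100.7"],
    lambda ip: "benign" if ip in _known_scanners else "unknown",
)
```

Held-back IPs are better surfaced to the human reviewer than silently dropped, since a misclassification in either direction is worth knowing about.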
1.5.3 Shadowserver placement.
Currently in Section 5 (monitoring only) but Shadowserver also runs active sinkholing and takedown campaigns with global reach. They accept tip-offs and IOC contributions. Move them up to Tier 1 receivers, or at least call out the bidirectional relationship.
1.6 — Section 6: Practical Routing Matrix
1.6.1 Missing rows:
| Evidence type | First | Second | Internal |
|---|---|---|---|
| Compromised TLS certificate | CT log monitor sighting → CA revocation request | Cloudflare/host if cert is in use | OpenCTI / TheHive |
| Mobile app malware | Google Play / Apple App Review submission | VirusTotal sample upload | OpenCTI |
| Cryptocurrency wallet (laundering) | Chainalysis / TRM (commercial) or on-chain analysis | OFAC SDN if sanctioned | Internal restricted case |
| Open-source supply-chain attack | Registry security (security@npmjs.com, security@python.org) | GitHub Security Lab | OpenCTI / TheHive |
| Compromised GitHub repo / leaked secret | GitHub Security Advisory + vendor-specific revoke API (e.g., AWS IAM) | Victim org | Internal restricted |
| Tor hidden service hosting malware | Document only (no takedown for .onion); push IOCs to MISP | n/a | OpenCTI |
| Sanctions-evasion crypto | OFAC SDN reporting (US) / EU FSF reporting | National FIU | Internal restricted |
| CSAM (legally separate) | NCMEC CyberTipline (US) / IWF (UK) / INHOPE (international) | National police | Stop processing immediately, preserve under legal hold |
| Phishing-resistant kit / 2FA bypass | Browser vendor reports (Chrome / Firefox / Safari Trust & Safety) | Affected service | OpenCTI |
1.6.2 Cloudflare-proxied abuse needs a follow-up step.
Current row says: First → Cloudflare API; Second → Netcraft / PhishTank. Missing: Third → origin host abuse contact. A cache-bypassing HEAD request does not reveal the origin on its own, so identify the host from any hosting-provider details Cloudflare shares in response to an abuse report, or via certificate transparency and passive DNS cross-reference. Without this, takedown leaves the origin alive and the attacker just provisions a new CDN front-end.
1.6.3 The "Leaked credentials/API keys" row is dangerously thin.
"Victim first → CERT if severe → Internal IR case" — missing the revocation step, which is more time-critical than reporting. If you find a leaked AWS access key, the first action is aws iam delete-access-key via the affected account (with permission) or trigger AWS's automatic key-revocation by submitting to GitHub Secret Scanning. If you find leaked OAuth tokens for GitHub/Slack/etc., the relevant vendor has an automated revocation pathway. Add the revocation step before victim notification.
1.7 — Section 7: Minimum Viable API Stack
The current MVP list (10 items) is too heavy for "minimum viable." A genuine MVP for a new white-hat group is closer to:
- OpenCTI — your knowledge graph (or, if too heavy, just MISP for both)
- MISP via CIRCL community — free, EU-hosted, broad reach
- AlienVault OTX — free, broadest reach for indicator sharing
- AbuseIPDB — free tier, easy
- URLhaus + MalwareBazaar + ThreatFox (the abuse.ch trio — same auth-key, three destinations)
- urlscan.io — free tier, evidence generation
- National CERT direct email + GPG — non-API, but mandatory
That's 7 things, of which 5 are pure free signups. Tackle Cloudflare/Netcraft/Spamhaus/GoogleWebRisk after you have throughput in those 7.
The current MVP includes TheHive — that's case management, not external reporting. Move it out of "API stack" since it's internal infrastructure.
1.8 — Section 8: Data Handling Rules
1.8.1 "Never submit publicly" — additions:
- Insider-threat allegations without verification
- Attribution claims about specific named individuals (the hard line we settled on earlier)
- Government / classified material
- PHI (US HIPAA scope)
- PCI scope financial data
- Children's data (COPPA US; GDPR Article 8 EU)
- Biometric data
- Trade secrets / source code
- Material from unauthorized intrusion (even if you got to it via OSINT, "I downloaded their leaked DB" makes you a recipient of stolen goods in some jurisdictions)
1.8.2 "Safe to submit" — additions:
- YARA rules (especially to YARAify)
- Sigma rules (to SigmaHQ via PR)
- Mutex names, named-pipe signatures (good Sysmon detections)
- Persistence registry keys
- Scheduled task names
- TLS fingerprints (JA3, JA4)
- HTTP user-agent strings observed in C2
- ASN block ranges associated with adversary infrastructure
- STIX/TAXII patterns
- ATT&CK technique IDs (always)
1.8.3 Missing entire section: "Sanitize before submitting"
- Strip URL query parameters that may contain victim tokens / session IDs
- Hash email local-parts when target destination is public (a72b91…@example.com)
- Redact internal hostnames from samples
- Strip x-forwarded-for / source IP from log excerpts that name your honeypot
- Replace victim-org names with role descriptors (<european_bank>) unless the submission is to a destination where the victim has consented or the receiver is trusted (CERT)
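Three of the sanitization steps listed above can be sketched with the standard library alone. The function names are hypothetical, and the truncated unsalted hash is only an illustration; a keyed hash (HMAC with a project secret) would be a reasonable hardening, since short local-parts are guessable.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment, which may carry victim tokens."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def hash_local_part(email: str, length: int = 6) -> str:
    """Replace the local part with a truncated SHA-256 digest; keep the domain."""
    local, _, domain = email.rpartition("@")
    digest = hashlib.sha256(local.lower().encode("utf-8")).hexdigest()[:length]
    return f"{digest}@{domain}"


def redact_org(text: str, org_name: str, descriptor: str) -> str:
    """Replace a victim organisation name with a role descriptor."""
    return text.replace(org_name, f"<{descriptor}>")
```

For example, `strip_query("https://example.com/login?session=abc#top")` yields `https://example.com/login`, and `redact_org("Acme Bank portal", "Acme Bank", "european_bank")` yields `<european_bank> portal`.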
1.9 — Section 9: Recommended Submission Object
1.9.1 Schema gaps (additions in bold):
{
"case_id": "WG-2026-000001",
"schema_version": "1.0",
"tlp": "AMBER", // use TLP 2.0 values: CLEAR/GREEN/AMBER/AMBER+STRICT/RED
"tlp_marking_definition_ref": "marking-definition--...", // STIX-compatible
"severity": "low|medium|high|critical", // replace A-E with standard
"confidence": "low|medium|high", // or Admiralty A1-F6
"language": "en", // i18n
"first_observed": "2026-05-13T10:00:00Z", // top-level
"last_observed": "2026-05-13T11:30:00Z",
"valid_from": "2026-05-13T10:00:00Z", // STIX-style validity window
"valid_until": "2026-08-13T10:00:00Z",
"threat_type": "phishing|malware|ransomware|credential_exposure|iab|botnet|vulnerability_exploitation",
"victim": {
"organization": "",
"domain": "",
"country": "",
"sector": "",
"nis2_category": "essential|important|n/a", // for EU NIS2 routing
"consent_to_name_publicly": false // sanitization gate
},
"actor": {
"name": "Adira",
"aliases": [],
"campaign": "",
"confidence": "A1|A2|...|F6"
},
"kill_chain": ["recon|weapon|deliver|exploit|install|c2|action"],
"attack_techniques": ["T1566.001", "T1059.003"],
"source": {
"category": "forum|leak_site|telegram|honeypot|sensor|osint|tip",
"first_seen": "",
"last_seen": "",
"collection_method": "lawful_osint_or_partner_feed",
"burn_sensitivity": "low|medium|high" // affects sanitization aggressiveness
},
"observables": {
"ips": [],
"domains": [],
"urls": [],
"hashes": [],
"emails": [],
"wallets": [],
"cves": [],
"yara_rules": [],
"sigma_rules": [],
"mutexes": [],
"named_pipes": [],
"scheduled_tasks": [],
"registry_keys": [],
"user_agents": [],
"tls_fingerprints": [], // JA3/JA4
"certificates": [], // CT log entries / SHA256 of cert
"asn_blocks": [],
"process_names": []
},
"pattern_relationships": [
{"source": "domain:example.com", "type": "resolves_to", "target": "ipv4:1.2.3.4", "first_seen": "..."}
],
"evidence": {
"summary": "",
"sanitized_screenshots": [],
"raw_evidence_location": "internal_restricted_storage",
"detonation_results": [], // sandbox report references
"memory_artifacts": [] // forensic, internal only
},
"timeline": [
{"ts": "...", "event": "..."}
],
"indicators_of_compromise": [], // observables flagged as actively malicious
"recommended_actions": [],
"routing": {
"primary_destinations": [],
"secondary_destinations": [],
"public_disclosure_allowed": false,
"embargo_until": null, // timed disclosure
"coordinated_with": [] // who else has been told (CERT case IDs etc)
},
"audit": {
"submitted_to": [], // append-only history of submissions
"feedback_received": [], // ack IDs, takedown confirmations
"submitter_identity": "wg-handle@misp", // which submitter handle was used
"signed_with": "PGP fingerprint",
"object_sha256": "" // tamper-detect on the object itself
}
}
1.9.2 Other schema concerns:
- case_id format WG-2026-000001 is fine, but reserve a 2-char org prefix to avoid collision if you ever federate with another working group
- tlp should use TLP 2.0 spec values (CLEAR, GREEN, AMBER, AMBER+STRICT, RED); TLP 1.0 used WHITE instead of CLEAR and had no AMBER+STRICT
- Severity / confidence mismatch in v1: severity used A-E, confidence used words. Standardize.
- Add a per-object hash so the routing engine can detect tampering between produce-time and submit-time
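One way to compute the per-object hash is over a canonical JSON serialization that excludes the hash field itself. The sketch below assumes the hash lives at `audit.object_sha256` as in the schema above; the helper names are hypothetical and the canonicalization (sorted keys, compact separators) is a simple convention, not a formal standard such as JCS.

```python
import hashlib
import json


def object_sha256(obj: dict) -> str:
    """Hash a canonical JSON form of the object, excluding the hash field itself."""
    body = json.loads(json.dumps(obj))  # deep copy via JSON round-trip
    body.get("audit", {}).pop("object_sha256", None)
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(obj: dict) -> dict:
    """Return a copy of the object with audit.object_sha256 filled in."""
    sealed = json.loads(json.dumps(obj))
    sealed.setdefault("audit", {})
    sealed["audit"]["object_sha256"] = object_sha256(sealed)
    return sealed


def is_untampered(obj: dict) -> bool:
    """Check the stored hash against a fresh hash of the current content."""
    return obj.get("audit", {}).get("object_sha256") == object_sha256(obj)
```

The routing engine would call `seal` at produce-time and `is_untampered` immediately before each adapter fires.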
1.10 — Section 10: Final Recommendation
1.10.1 The architecture sentence is missing the feedback edge.
Current: Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public reporting
Better: Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public reporting → receipts and outcomes back to OpenCTI → effectiveness scoring → re-prioritization
Without the feedback edge, you can't tell which destinations are worth maintaining.
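A first cut of effectiveness scoring could be as simple as the confirmed-outcome rate per destination. The sketch assumes hypothetical audit rows with `destination` and `outcome` keys; real scoring would likely also weigh time-to-action.

```python
from collections import defaultdict


def effectiveness(audit_rows: list[dict]) -> dict[str, float]:
    """Fraction of submissions per destination that produced a confirmed outcome."""
    sent: dict[str, int] = defaultdict(int)
    confirmed: dict[str, int] = defaultdict(int)
    for row in audit_rows:
        sent[row["destination"]] += 1
        if row.get("outcome") == "confirmed":
            confirmed[row["destination"]] += 1
    return {dest: confirmed[dest] / sent[dest] for dest in sent}


scores = effectiveness(
    [
        {"destination": "urlhaus", "outcome": "confirmed"},
        {"destination": "urlhaus", "outcome": None},
        {"destination": "cert", "outcome": "confirmed"},
    ]
)
```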
1.10.2 Missing entirely: closing checklist for "we're ready to submit."
A final checklist before any external submission fires:
[ ] TLP enforced (object.tlp <= destination.max_tlp)
[ ] Sanitization pass complete (PII stripped per destination policy)
[ ] GreyNoise check (if observables include IPs)
[ ] Quota available (rate-limit budget not exceeded)
[ ] Submitter identity registered with destination
[ ] Object signed
[ ] Audit row written
[ ] Human approver clicked yes (for non-automated tier)
This belongs as Section 11 or as the closing block of Section 10.
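The first checklist item (object.tlp <= destination.max_tlp) reduces to an ordered comparison over the TLP 2.0 labels. A minimal sketch, with hypothetical function and constant names:

```python
# TLP 2.0 labels, least to most restrictive.
TLP_ORDER = ["CLEAR", "GREEN", "AMBER", "AMBER+STRICT", "RED"]
_RANK = {label: rank for rank, label in enumerate(TLP_ORDER)}


def tlp_allows(object_tlp: str, destination_max_tlp: str) -> bool:
    """True if the object's TLP does not exceed the destination's ceiling.

    Unknown labels raise KeyError, so a typo fails closed instead of
    letting a submission through.
    """
    return _RANK[object_tlp.upper()] <= _RANK[destination_max_tlp.upper()]
```

For example, an AMBER object may go to a destination with a RED or AMBER+STRICT ceiling but not to one capped at GREEN.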
2. Cross-Cutting Gaps (Not Tied to Any Section)
2.1 — OPSEC for the Reporters Themselves
Not in the doc at all. If your group is reporting Adira to authorities, Adira may notice — they read MISP communities (those that are open), they read URLhaus (public), and they have visibility into VirusTotal Premium (paid customer).
Required additions:
- Submission identity registry: which handle is used on which platform, who has access, rotation schedule
- Account-creation OPSEC: don't use personal accounts on submission platforms; create a project handle, use a project email, register with project-owned phone/2FA
- Network OPSEC for collection: if you're scraping leak sites or monitoring the adversary's infrastructure, route through a VPN or research-purpose proxy — never the same network as your submission identity
- PGP for CERT comms: every national CERT publishes a PGP key. Every email submission to a CERT should be signed and encrypted. Untouched in the draft.
2.2 — Burnt Source Protection
If you have a private collection source (honeypot, infiltrated channel, tipped-off insider), publishing IOCs from it can burn the source. Specifically:
- A unique honeypot fingerprint (banner, response timing, listening port) lets the adversary identify which sample came from your honeypot
- Publishing a sample with a unique build artifact (your sandbox's hostname in a DNS query, a timestamp matching your detonation window) reveals your detonation infrastructure
- Reporting a forum URL while it's still live tips off the forum operator that it's being watched
The doc needs a burn-sensitivity tier on each observable, and a sanitization step that aggressively scrubs source-identifying artifacts before any external submission.
2.3 — Adversary Observability of Your Submissions
Tier each receiver by who can see your submission:
| Receiver | Adversary visibility |
|---|---|
| MISP private community | trusted community only |
| MISP public community / OTX public pulse | anyone with an account |
| URLhaus | public — adversary can monitor |
| MalwareBazaar | public — adversary can detect their sample was uploaded |
| VirusTotal public submission | every VT Premium customer (incl. potentially adversary) |
| VT Private Scanning | only your team (paid) |
| AbuseIPDB | public reputation visible |
| Cloudflare Abuse Reports | only Cloudflare and the reported asset owner |
| CERT direct (GPG-encrypted) | only the CERT |
The routing engine should display this visibility for each destination during human review.
2.4 — Chain of Custody / Legal Admissibility
If any of this material may end up in a criminal proceeding, chain of custody matters. Specifically:
- The raw evidence must be preserved unmodified, with hashes recorded at acquisition time
- Any transformation (sanitization, normalization) must be traceable back to the unmodified original: the routing engine logs the input hash, the transform applied, and the output hash
- The submitter identity for each external submission is logged
- Witnesses (multi-party access logs) are preferred for high-value evidence
The current evidence.raw_evidence_location field is a placeholder; it needs structure: storage path, hash, acquisition timestamp, acquirer identity.
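The structured replacement for that field might look like the sketch below. The class and field names are hypothetical; it records the hash at acquisition time and appends one entry per transformation with input and output hashes.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class EvidenceRecord:
    storage_path: str
    sha256: str        # hash of the raw evidence at acquisition time
    acquired_at: str   # ISO 8601 timestamp, UTC
    acquired_by: str   # acquirer identity
    transforms: list = field(default_factory=list)

    @classmethod
    def acquire(cls, raw: bytes, storage_path: str, acquired_by: str) -> "EvidenceRecord":
        return cls(storage_path, sha256_hex(raw), utc_now(), acquired_by)

    def log_transform(self, name: str, before: bytes, after: bytes) -> None:
        """Append one entry per transformation: input hash, transform, output hash."""
        self.transforms.append(
            {
                "transform": name,
                "input_sha256": sha256_hex(before),
                "output_sha256": sha256_hex(after),
                "at": utc_now(),
            }
        )

    def verify(self, raw: bytes) -> bool:
        """Confirm the stored raw evidence still matches its acquisition hash."""
        return sha256_hex(raw) == self.sha256
```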
2.5 — Amplification Risk
Publishing IOCs publicly amplifies awareness — which is good for defenders but bad if:
- The IOC includes a compromised legitimate site (you damage the site owner's reputation)
- The IOC is for a piece of infrastructure that's about to be used in a sting operation by LE
- The IOC reveals an investigation technique still under embargo
A publish-readiness review belongs in Section 1 of the doc, not in the closing checklist.
2.6 — Failure Modes / Retries
What happens when:
- URLhaus rejects a submission (malformed, low-confidence flag, duplicate)?
- MISP is down for maintenance?
- Cloudflare returns 503?
- Your submitter identity gets rate-limited?
- An API token is revoked mid-batch?
The doc has no resilience layer. Recommend:
- Idempotent submission with client-generated IDs (so retries don't double-submit)
- Per-destination retry policy (exponential backoff with jitter)
- Dead-letter queue for permanent failures — surface in human-review UI
- Per-submitter quota tracking, with auto-failover to backup submitter if available
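The first three recommendations can be combined in one small wrapper, sketched below. The exception classes, the `idempotency_key` field name, and the in-memory dead-letter list are assumptions for illustration; a real adapter would map HTTP status codes onto the two failure classes.

```python
import copy
import random
import time
import uuid
from typing import Callable, Optional


class TransientFailure(Exception):
    """Receiver unavailable or rate-limited; a retry may succeed."""


class PermanentFailure(Exception):
    """Receiver rejected the submission; retrying will not help."""


def submit_with_retry(
    send: Callable[[dict], str],
    payload: dict,
    dead_letter: list,
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Send with a stable idempotency key; park the payload if it cannot be delivered."""
    attempt_payload = copy.deepcopy(payload)
    # Generated once and reused on every retry so the receiver can de-duplicate.
    attempt_payload.setdefault("idempotency_key", str(uuid.uuid4()))
    reason = "retries exhausted"
    for attempt in range(max_attempts):
        try:
            return send(attempt_payload)
        except PermanentFailure as exc:
            reason = f"permanent: {exc}"
            break
        except TransientFailure:
            if attempt < max_attempts - 1:
                # Full jitter: uniform between 0 and the capped exponential delay.
                sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    dead_letter.append({"payload": attempt_payload, "reason": reason})
    return None
```

Passing `sleep` as a parameter lets tests run without real delays.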
2.7 — Versioning and Maintenance
The doc has no version number, no changelog, no maintainer field, no review cadence. For a living spec like this:
---
schema_version: "1.0"
last_reviewed: 2026-05-13
next_review_due: 2026-08-13
maintainer: <project lead>
changelog:
- 2026-05-13: initial draft
---
API surfaces of these platforms change (Cloudflare deprecations, VT pricing changes, abuse.ch tag taxonomy updates). A quarterly re-validation cadence is sane.
2.8 — Multi-Language Submissions
Many national CERTs prefer or require local language for narrative fields (BSI German, ANSSI French, CCN-CERT Spanish). The submission object's language field (added above) plus a translation step in the routing engine handles this. Currently absent.
3. Missing Categories Entirely
3.1 — Hosting Provider Abuse Channels (most lack true REST APIs)
| Provider | Channel | API? |
|---|---|---|
| AWS | abuse@amazonaws.com + form | No public REST; AWS responds to email |
| Google Cloud | https://support.google.com/cloud/answer/2417620 | Form-only |
| Azure | https://msrc.microsoft.com/report/abuse | Form + email |
| DigitalOcean | abuse@digitalocean.com | Email + status REST |
| Hetzner | abuse@hetzner.com + form | Form |
| OVH | abuse@ovh.net + Anti-abuse REST API | Yes |
| Linode (Akamai) | abuse@linode.com | |
| Vultr | abuse@vultr.com | |
Treat email-based providers as a different submission class (template + GPG-signed email, with parsed-receipt detection). Worth a Section 11 in the doc.
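The template half of that submission class could be sketched with the standard library as below. GPG signing is deliberately left out, the `X-Case-ID` header is a hypothetical convention, and putting the case ID in the subject is one simple way to match replies back to the case when parsing receipts.

```python
from email.message import EmailMessage


def build_abuse_email(
    sender: str, recipient: str, case_id: str, asset: str, summary: str
) -> EmailMessage:
    """Build a templated abuse report; signing and sending happen elsewhere."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Case ID in the subject so replies can be matched to the case.
    msg["Subject"] = f"[{case_id}] Abuse report: {asset}"
    msg["X-Case-ID"] = case_id  # hypothetical header for receipt parsing
    msg.set_content(
        f"Case: {case_id}\n"
        f"Reported asset: {asset}\n\n"
        f"Summary:\n{summary}\n"
    )
    return msg


example = build_abuse_email(
    "reports@project.example",
    "abuse@provider.example",
    "WG-2026-000001",
    "203.0.113.5",
    "Host serves an active credential-harvesting page.",
)
```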
3.2 — Cryptocurrency / Sanctions
- Chainalysis Reactor — commercial, gold standard for on-chain investigations
- TRM Labs — commercial alternative
- CipherTrace (Mastercard) — commercial
- OFAC SDN reporting — for US-sanctioned wallets
- EU Financial Sanctions Files (FSF) — for EU sanctions
- National FIUs — Financial Intelligence Units, country-specific
- Free / open: GraphSense (open-source on-chain analytics), Etherscan (manual)
3.3 — Mobile / App Store
- Google Play Protect submissions (for Android malware)
- Apple App Review report (for malicious iOS apps)
- APKMirror reports (for repackaged apps)
- F-Droid security contacts (for compromised FOSS apps)
3.4 — Open-Source Supply Chain
- PyPI: security@python.org
- npm: security@npmjs.com + vendored auto-revoke for leaked tokens
- crates.io: help@crates.io
- RubyGems: security@rubygems.org
- Maven Central: central@sonatype.org
- GitHub Security Lab (research collaboration)
- OpenSSF Vulnerability Disclosure (cross-ecosystem coordination)
- Sigstore (provenance verification, longer-term)
3.5 — Certificate Authorities
- Let's Encrypt: ACME revocation API for ACME-issued certs
- Sectigo / DigiCert / GlobalSign / Entrust: abuse contacts in CA/Browser Forum compliance docs
- CT log monitors for detection (crt.sh, Censys CT, Google CT)
3.6 — Tor / Dark Web
Limited takedown leverage for .onion services, but worth documenting:
- Document via Tor Project's abuse handling page (limited leverage)
- Contribute IOCs to DarkOwl, Recorded Future, Flashpoint (commercial dark-web monitoring) if you have access
- Push to MISP with tor tag for community awareness
3.7 — CSAM (Legally Separate Pathway)
If CSAM is encountered during collection, stop processing immediately. CSAM has separate legal handling rules:
- NCMEC CyberTipline (US)
- IWF (Internet Watch Foundation) (UK)
- INHOPE (international hotline network)
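- Possessing CSAM is illegal even for research; do not attempt to verify, document, or share. Report it, then preserve or delete the material only as instructed by the receiving hotline or law enforcement, keeping a written record of the handling steps (not of the content).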
This deserves a Section 12 with a hard stop: "if encountered, halt and report via the channels below; do not include in any other submission flow."
4. Missing Platforms Worth Adding (Quick List)
Free / Open
- AlienVault OTX (huge omission)
- ThreatFox
- YARAify
- CIRCL Hashlookup
- CIRCL Passive DNS / Passive SSL
- Maltrail feeds
- crt.sh / Censys CT
- GreyNoise community tier
- Spamhaus DNSBL queries
- PhishStats
Commercial / Paid (worth listing for completeness)
- Recorded Future
- Mandiant Advantage (now Google Threat Intelligence)
- CrowdStrike Falcon Intelligence
- Sekoia.io
- Flashpoint
- DomainTools (passive DNS / WHOIS history)
- RiskIQ (now Microsoft Defender Threat Intelligence)
- Anomali ThreatStream
Intelligence Communities (membership-based)
- FIRST.org (CSIRT global community)
- Trusted Introducer (European CSIRT trust framework)
- M3AAWG (Messaging, Malware, Mobile Anti-Abuse Working Group)
- APWG (Anti-Phishing Working Group)
- Cyber Threat Alliance (commercial CTI sharing)
- ENISA CSIRTs Network
5. Implementation Priorities for Blue48
In our agent stack, this doc translates to concrete work:
5.1 — Block G additions (when we get there)
- report_writer agent outputs the v2 normalized object (Section 1.9.1 above) as canonical format
- New routing_engine component (extension of report_writer, or a 7th agent): consumes the object, applies routing matrix, fans out via API adapters
- Adapter priority order for blue48 v1.0:
- MISP (PyMISP)
- AlienVault OTX (REST)
- AbuseIPDB (REST + category mapping)
- URLhaus + MalwareBazaar + ThreatFox (shared abuse.ch auth-key)
- urlscan.io (REST, with private-by-default visibility)
- Cloudflare Abuse Reports
- GPG-signed email to BSI / CERT-Bund (since the user is in DE)
5.2 — Schema work
- config/submission_schema.json: JSON Schema for the v2 normalized object
- config/routing_matrix.yaml: declarative rules (evidence type → destinations), with TLP ceilings and quotas
- core/sanitize.py: pre-submission scrubbing per destination policy
- core/audit.py: append-only, signed log of every submission
- core/tlp.py: TLP 2.0 enforcement
5.3 — Pre-submission gates (before any adapter fires)
1. Schema valid?
2. TLP <= destination ceiling?
3. Sanitization complete?
4. GreyNoise check passes (for IPs)?
5. Quota available?
6. Submitter identity registered with destination?
7. Object signed?
8. Audit row written?
9. Human approver yes (for non-auto tier)?
If any fail → drop into human-review queue with the reason. Never silently skip.
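The "never silently skip" rule suggests a gate runner that stops at the first failure and reports which gate failed. A minimal sketch, assuming each gate is a hypothetical `(object, destination) -> bool` callable; the two example gates are placeholders for the nine listed above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical gate signature: (submission object, destination name) -> passed?
Gate = Callable[[dict, str], bool]


@dataclass(frozen=True)
class GateResult:
    passed: bool
    failed_gate: Optional[str] = None


def run_gates(obj: dict, destination: str, gates: list[tuple[str, Gate]]) -> GateResult:
    """Evaluate gates in order; stop at the first failure and name it."""
    for name, gate in gates:
        try:
            ok = gate(obj, destination)
        except Exception:
            ok = False  # a crashing gate counts as a failure, never a silent skip
        if not ok:
            return GateResult(False, name)
    return GateResult(True)


example_gates = [
    ("schema_valid", lambda o, d: "case_id" in o),
    ("tlp_within_ceiling", lambda o, d: o.get("tlp") != "RED"),
]
result = run_gates({"case_id": "WG-2026-000001", "tlp": "RED"}, "urlhaus", example_gates)
```

A failed result carries the gate name, which is the reason to attach when the object is dropped into the human-review queue.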
5.4 — Failure / retry layer
- Per-destination idempotency keys (client-generated)
- Exponential backoff with jitter
- Dead-letter queue for permanent failures, surfaced in data/dlq/
- Per-submitter quota tracking with auto-failover
6. Summary of v2 Findings
| Category | Count | Action |
|---|---|---|
| Section-by-section corrections | 38 | Fold into the draft |
| New cross-cutting sections needed | 8 | Add as Sections 11–18 |
| Missing platform categories | 7 | Each warrants a sub-section |
| Missing free/open platforms (Tier 1) | 5 | Add to Section 2 |
| Schema field gaps | 17 | Adopt v2 schema above |
| Pre-submission gates not defined | 9 | Add as closing checklist |
After folding these in, the document becomes a publishable internal whitepaper and a complete spec for the blue48 routing engine. The first draft was a confident outline; the v2 turns it into a working manual.
If useful, I can next:
- (a) Generate config/submission_schema.json (JSON Schema for the v2 normalized object) into ~/blue48/config/
- (b) Generate config/routing_matrix.yaml (declarative routing rules) into ~/blue48/config/
- (c) Scaffold agents/routing_engine.py with adapter stubs for the seven Block-G priority destinations
- (d) Re-issue this review as suggested edits inline against the original (so you can accept/reject diff-style)
Pick any subset and I'll ship.