psyc/docs/dossier.md
m17hr1l e04c6c96d8 init: scaffold psyc — defensive CTI routing & evidence-sealing platform
Stage-1 vertical slice: Pydantic Case model, SQLAlchemy Core persistence,
URLhaus Scoutline fetcher, FastAPI/Jinja cockpit (cases list + detail),
flat Typer CLI, Result[T, E] type module, structlog config.
Architecture in docs/dossier.md; 12-fold style guide in docs/style.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 12:43:47 +02:00

Blue48 / Adira Hunt — Consolidated Dossier

Compiled: 2026-05-13

Auto-merged from the individual source files in this directory. Each source file remains authoritative; this is for single-pane reading.

Architecture sentence

Sensors
→ Scoutline
→ Proofline
→ Mapline
→ Classifyline
→ Sealine
→ Routeline
→ Ledgerline
→ Publishline
→ Trainline
→ Blue48 Operations Cockpit

Core principle

Validate the signal, protect the evidence, route only what each destination is authorized to receive, and prove every external action through an immutable ledger.

Contents

  • Blue48 Reporting and API Escalation Architecture v2 (from routeline.md)
  • Blue48 Worker Mesh Architecture (from hivemap.md)
  • Blue48 IntelMiner and LoRA Training Data Pipeline (from intelminer.md)
  • Blue48 Operations Cockpit — GUI / UI-UX Concept (from blue48_operations_cockpit_ui_ux.md)
  • API-Eligible Cyber Threat Reporting & Escalation Platforms (from waypoints.md)
  • Review — API-Eligible Cyber Threat Reporting & Escalation Platforms (Draft v1) (from waypoints_firstpass.md)
  • Detailed Review v2 — API-Eligible Cyber Threat Reporting & Escalation Platforms (from waypoints_scalpel.md)

Blue48 Reporting and API Escalation Architecture v2

Source: routeline.md

Document type: Project record / operational architecture
Scope: API-eligible reporting platforms, routing order, evidence handling, CERT mapping, abuse routing, receipts, and audit controls
Status: Draft v2


1. Purpose

This document defines how Blue48 routes defensive cyber-intelligence to the correct recipients using structured APIs, trusted communities, CERT/CSIRT channels, abuse-reporting endpoints, and authority-sealed evidence packages.

The platform is designed for lawful white-hat operations. It should not amplify stolen data, expose victims prematurely, or interact with criminal actors.

Core principle:

Validate the signal, protect the evidence, route only what each destination is authorized to receive, and prove every external action through an immutable ledger.


Normal cases

Victim Security Team
→ National CERT / CSIRT
→ Sector ISAC / trusted community
→ Law enforcement cyber unit when criminal evidence exists
→ Provider / registrar / abuse APIs
→ Public sanitized report after mitigation

Imminent harm or critical infrastructure

National CERT / CSIRT
→ Victim Security Team
→ Law enforcement cyber unit
→ Sector ISAC / regulator
→ Provider / registrar / abuse APIs
→ Trusted CTI community
→ Public sanitized report after mitigation or clearance

Malicious infrastructure

Hosting provider / CDN / cloud abuse desk
→ Registrar abuse contact
→ Registry escalation
→ National CERT / CSIRT
→ Law enforcement when warranted
→ Trusted CTI community

Mass exploitation

Affected vendor
→ National CERT / CSIRT
→ Affected sectors / ISACs
→ MISP / trusted CTI community
→ Public advisory after coordinated mitigation

3. Authority-Sealed Evidence Handling

Blue48 does not treat full evidence protection as “sanitization.” Sensitive evidence should be preserved, encrypted, and routed only to authorized recipients.

Use the term:

Authority-Sealed Evidence Handling

Purpose:

  • preserve high-value evidence
  • prevent uncontrolled internal access
  • prevent accidental redistribution
  • allow victims or authorities to decrypt when authorized
  • destroy local plaintext and unwrapped keys after successful sealing

4. Evidence Protection Models

Model A: Authority public-key encryption

Authorities or victims provide public encryption keys.

Evidence collected
→ sensitive evidence packaged
→ package encrypted with authority/victim public key
→ encrypted package submitted
→ local plaintext destroyed
→ only recipient can decrypt

This is the cleanest model because Blue48 never holds the recipient private key.

Model B: One-time evidence key wrapped for recipients

Generate random evidence key
→ encrypt evidence with evidence key
→ wrap evidence key to recipient public keys
→ submit encrypted package
→ destroy local plaintext
→ destroy local unwrapped evidence key

Package example:

{
  "evidence_package_id": "uuid",
  "encrypted_evidence": "ciphertext-reference",
  "wrapped_keys": [
    {
      "recipient": "CERT-Bund",
      "key_id": "authority-key-id",
      "wrapped_key": "encrypted-evidence-key"
    },
    {
      "recipient": "Victim Security Team",
      "key_id": "victim-key-id",
      "wrapped_key": "encrypted-evidence-key"
    }
  ],
  "metadata": {
    "tlp": "AMBER",
    "severity": "critical",
    "created_at": "2026-05-13T00:00:00Z",
    "retention_policy": "plaintext destroyed after encryption"
  }
}

5. Destination Minimization

Authority-sealed evidence handling does not mean every platform receives full evidence. Public and semi-public APIs should receive only the minimum necessary payload.

| Destination type | Payload |
| --- | --- |
| CERT / law enforcement | Encrypted full evidence package when authorized. |
| Victim security team | Encrypted full or partial evidence package. |
| Trusted MISP community | TLP-filtered STIX indicators and context. |
| Provider / registrar abuse API | Minimal abuse report with infrastructure evidence. |
| URLhaus / MalwareBazaar | Malware URL/hash/sample only when legally allowed. |
| AbuseIPDB | IP, category, timestamp, short comment. |
| VirusTotal | Hash, URL, or sample only when policy allows. |
| Public report | Sanitized narrative, no raw sensitive evidence. |
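
The minimization rule can be sketched as a deny-by-default allow-list. This is a minimal illustration, not the Blue48 implementation: the `ALLOWED_FIELDS` table, the destination keys, and the `minimize_payload` helper are hypothetical names, and only the observable field names are borrowed from the normalized case object.

```python
# Hypothetical allow-list: which observable fields each destination may receive.
ALLOWED_FIELDS: dict[str, set[str]] = {
    "abuseipdb": {"ips"},
    "urlhaus": {"urls", "hashes"},
    "registrar_abuse": {"domains", "urls"},
    "public_report": {"cves"},
}


def minimize_payload(observables: dict[str, list[str]], destination: str) -> dict[str, list[str]]:
    """Return only the non-empty observables the destination may receive.

    Unknown destinations get an empty payload (deny by default).
    """
    allowed = ALLOWED_FIELDS.get(destination, set())
    return {key: list(values) for key, values in observables.items() if key in allowed and values}


observables = {
    "domains": ["login-example.com"],
    "ips": ["203.0.113.7"],
    "urls": [],
    "hashes": [],
    "cves": ["CVE-2024-12345"],
    "emails": ["victim@example.org"],
}
print(minimize_payload(observables, "abuseipdb"))  # {'ips': ['203.0.113.7']}
```

Starting from an empty payload and adding only what a destination is cleared for means a newly added destination leaks nothing until someone writes its policy.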

6. Priority Platform Order

For European or global operations, use MISP and national CERT routing before US-specific AIS.

| Priority | Platform / route | Role |
| --- | --- | --- |
| 1 | MISP / CIRCL / trusted MISP communities | Primary CTI sharing backbone. |
| 2 | National CERT / CSIRT | Country-specific authority route. |
| 3 | CERT-EU / ENISA CSIRTs Network | EU institutions and European coordination. |
| 4 | CISA AIS | US-relevant machine-to-machine indicator sharing. |
| 5 | OpenCTI | Internal graph and knowledge base, not necessarily an external reporting destination. |
| 6 | Provider / CDN / cloud abuse APIs | Infrastructure mitigation and takedown. |
| 7 | Registrar / registry abuse channels | Domain suspension or escalation. |
| 8 | abuse.ch, URLhaus, MalwareBazaar | Malware URL/sample ecosystem reporting. |
| 9 | AbuseIPDB / Spamhaus / PhishTank / urlscan.io | Public/semi-public abuse and phishing ecosystem reporting. |
| 10 | Public advisory channels | Sanitized reporting after mitigation. |

7. CERT / CSIRT Routing Map

The routing engine should pick receivers based on victim country, sector, and legal jurisdiction.

| Country / region | Receiver | Channel type |
| --- | --- | --- |
| Germany | BSI / CERT-Bund | Structured email, trusted channels, MISP community where available. |
| France | ANSSI / CERT-FR | CERT channel, structured reporting, TAXII/MISP where available. |
| United Kingdom | NCSC-UK | Structured reporting, early-warning services, official channels. |
| Netherlands | NCSC-NL | CERT channel, trusted community/MISP where available. |
| Spain | CCN-CERT / INCIBE-CERT | Public-sector/private-sector split, CERT channels, MISP where available. |
| EU institutions | CERT-EU | EU institutional route. |
| EU criminal coordination | Europol EC3 | Usually via national CERT/law-enforcement channels, not first-call technical sharing. |
| United States | CISA / FBI IC3 / FBI field office | CISA for technical reporting, IC3/FBI for crime reporting. |

Implementation note:

Europol EC3 should not be treated as the first technical receiver. Route through the relevant national CERT or law-enforcement channel first unless a formal coordination channel exists.
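
A rough sketch of how the routing map might be expressed as a lookup, assuming ISO 3166-1 alpha-2 country codes as keys. The table contents come from the map above; the function name, the `public_sector` flag used for the Spanish split, and the fallback string for law enforcement are illustrative assumptions. Europol EC3 is deliberately never returned as a first receiver.

```python
from typing import Optional

# Hypothetical lookup derived from the routing map; keys are ISO 3166-1 alpha-2 codes.
NATIONAL_CERT: dict[str, str] = {
    "DE": "BSI / CERT-Bund",
    "FR": "ANSSI / CERT-FR",
    "GB": "NCSC-UK",
    "NL": "NCSC-NL",
    "US": "CISA",
}
# Crime-reporting routes named in the map; other countries use a generic placeholder.
CRIME_ROUTE: dict[str, str] = {"US": "FBI IC3"}


def pick_receivers(country: str, *, public_sector: bool = False,
                   criminal_evidence: bool = False) -> list[str]:
    """Return receivers in contact order; an empty list means manual routing."""
    code = country.upper()
    first: Optional[str]
    if code == "ES":
        # Spain splits public-sector and private-sector reporting.
        first = "CCN-CERT" if public_sector else "INCIBE-CERT"
    else:
        first = NATIONAL_CERT.get(code)
    if first is None:
        return []  # unknown jurisdiction: leave the decision to an operator
    receivers = [first]
    if criminal_evidence:
        receivers.append(CRIME_ROUTE.get(code, "national law-enforcement cyber unit"))
    return receivers


print(pick_receivers("de"))
print(pick_receivers("US", criminal_evidence=True))
```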


8. Severity and Class Mapping

Blue48 may keep an internal class model, but every outbound report should carry a standard severity level.

| Internal class | Meaning | External severity |
| --- | --- | --- |
| A | Imminent harm / attack likely underway | Critical |
| B | Credible planned attack | High |
| C | Confirmed exposure | High / Medium |
| D | Campaign intelligence | Medium / High |
| E | Weak signal / watchlist | Low / Monitor |
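
One way to make the mapping machine-checkable is a small table of allowed severities per class. The sketch below is an assumption-laden illustration: the names are hypothetical, the first entry of each tuple is treated as the default, and "Monitor" for class E is treated as a handling note rather than a severity value, because the case object's severity field lists only low, medium, high, and critical.

```python
# Allowed external severities per internal class, preferred value first.
# "Monitor" (class E) is treated as a handling note, not a severity (assumption).
CLASS_TO_SEVERITY: dict[str, tuple[str, ...]] = {
    "A": ("critical",),
    "B": ("high",),
    "C": ("high", "medium"),
    "D": ("medium", "high"),
    "E": ("low",),
}


def default_severity(internal_class: str) -> str:
    """Return the preferred external severity for an internal class."""
    return CLASS_TO_SEVERITY[internal_class][0]


def severity_is_consistent(internal_class: str, severity: str) -> bool:
    """Check that an outbound severity is one the class mapping permits."""
    return severity in CLASS_TO_SEVERITY.get(internal_class, ())


print(default_severity("C"))                   # high
print(severity_is_consistent("A", "medium"))   # False
```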

9. Normalized Case Object

All workers should read and write the same normalized case object.

{
  "case_id": "B48-2026-000001",
  "summary": "Short defensive summary",
  "classification": {
    "class": "A | B | C | D | E",
    "severity": "low | medium | high | critical",
    "tlp": "RED | AMBER | GREEN | CLEAR",
    "incident_type": "access_sale | ransomware | credential_leak | phishing | malware | exploit | botnet | data_leak"
  },
  "confidence": {
    "level": "low | medium | high",
    "source_reliability": "A | B | C | D | E | F | unknown",
    "information_credibility": "1 | 2 | 3 | 4 | 5 | 6 | unknown"
  },
  "victim": {
    "name": "",
    "domain": "",
    "country": "",
    "sector": "",
    "critical_infrastructure": false
  },
  "actor": {
    "name": "",
    "aliases": [],
    "campaign": "",
    "confidence": "low | medium | high"
  },
  "observables": {
    "domains": [],
    "ips": [],
    "urls": [],
    "hashes": [],
    "cves": [],
    "wallets": [],
    "emails": []
  },
  "evidence": {
    "raw_evidence_location": "internal-only-reference",
    "sealed_package_id": "",
    "payload_hash": "",
    "plaintext_destroyed": false,
    "local_unwrapped_key_destroyed": false
  },
  "routing": {
    "recommended_routes": [],
    "blocked_routes": [],
    "human_approval_required": true
  }
}

10. TLP Enforcement

Every destination must define a maximum allowed TLP.

Routing precondition:

submission.tlp <= destination.max_tlp_allowed

Example destination policy:

| Destination | Max TLP | Payload type |
| --- | --- | --- |
| CERT / CSIRT trusted route | RED or AMBER, depending on channel | Sealed evidence package. |
| Victim security team | RED or AMBER, depending on identity verification | Sealed package or controlled extract. |
| MISP trusted community | AMBER or GREEN, depending on sharing group | STIX/MISP event. |
| Public MISP community | GREEN or CLEAR | Public-safe indicators. |
| AbuseIPDB | CLEAR | Minimal IP abuse report. |
| URLhaus | CLEAR or GREEN, depending on policy | Malicious URL report. |
| VirusTotal | CLEAR only, unless legal approval exists | Hash/URL/sample where permitted. |
| Public advisory | CLEAR | Sanitized intelligence. |
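
The precondition `submission.tlp <= destination.max_tlp_allowed` only works if TLP labels are ordered. A minimal sketch, assuming the ordering CLEAR < GREEN < AMBER < RED and a hypothetical `may_route` helper:

```python
from enum import IntEnum


class TLP(IntEnum):
    """TLP labels ordered from least to most restrictive."""
    CLEAR = 0
    GREEN = 1
    AMBER = 2
    RED = 3


def may_route(submission_tlp: TLP, max_tlp_allowed: TLP) -> bool:
    """Routing precondition: the payload must not exceed the destination's ceiling."""
    return submission_tlp <= max_tlp_allowed


# Labels stored as strings in the case object can be parsed by name.
print(may_route(TLP["AMBER"], TLP["CLEAR"]))  # False: AMBER data must not reach a CLEAR-only API
print(may_route(TLP["GREEN"], TLP["AMBER"]))  # True
```

Using `IntEnum` keeps the comparison explicit and rejects misspelled labels with a `KeyError` at parse time, so a bad label fails loudly before any routing decision is made.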

11. API-Eligible Destination Categories

11.1 CTI sharing

| Platform | Purpose | Integration style |
| --- | --- | --- |
| MISP | Threat intelligence sharing and communities. | REST API / PyMISP / STIX. |
| OpenCTI | Internal CTI graph and knowledge management. | GraphQL API / STIX. |
| CISA AIS | US-relevant automated indicator sharing. | TAXII/STIX-style exchange. |

11.2 Abuse and takedown

| Platform | Purpose | Integration style |
| --- | --- | --- |
| Cloudflare Abuse Reports | Report abuse behind Cloudflare services. | Abuse Reports API / portal. |
| Registrar abuse channels | Domain abuse escalation. | API where available, otherwise structured email/WHOIS abuse contact. |
| Registry escalation | Escalation for TLD-level issues. | Registry-specific process. |
| URLhaus | Malware URL reporting. | API submission. |
| MalwareBazaar | Malware sample/hash ecosystem. | API submission/query. |
| AbuseIPDB | IP reputation and abuse reports. | API. |
| Spamhaus | Spam, botnet, and malicious infrastructure reporting. | Submission portal/API where available. |
| PhishTank | Phishing URL reporting. | API/community workflow. |
| urlscan.io | URL scan and malicious page evidence. | API submission. |
| Google Web Risk | Unsafe URL submission where permitted. | Restricted API. |
| VirusTotal | URL/file/hash enrichment and submission. | API, policy controlled. |
| Netcraft | Phishing and abuse reporting. | API/enterprise options and reporting channels. |

11.3 Internal case-management

| Platform | Purpose |
| --- | --- |
| TheHive | Security case management and observables. |
| DFIR-IRIS | Incident response case management. |
| ServiceNow SIR | Enterprise incident response workflow. |
| Jira Service Management | Case routing and task management. |

12. Registrar and Registry Abuse Flow

For malicious domains, phishing portals, C2 domains, impersonation infrastructure, and ransomware-related web infrastructure:

Identify domain
→ identify hosting provider and CDN/proxy
→ identify registrar from WHOIS/RDAP
→ report to hosting/CDN abuse
→ report to registrar abuse
→ escalate to registry if registrar fails or emergency applies
→ notify CERT if critical or cross-border

Payload should include:

  • domain
  • abuse type
  • timestamp
  • evidence hash
  • screenshot hash or sealed evidence reference
  • safe reproduction summary
  • victim impersonated, if relevant
  • requested action

Avoid sending raw credentials, stolen data, or private victim details to registrar/registry channels unless legally justified.


13. Rate Limits and Queueing

Every destination should define a quota object.

{
  "destination": "AbuseIPDB",
  "quota": {
    "limit": 1000,
    "period": "day",
    "priority_policy": "critical_first",
    "backoff": "exponential"
  }
}

RateLimiter responsibilities:

  • prevent dropped submissions
  • queue low-priority submissions during bursts
  • reserve budget for critical cases
  • retry transient failures
  • record rate-limit errors in Ledger
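
A minimal sketch of the queueing behaviour, assuming an in-memory priority queue and a hypothetical `critical_reserve` parameter that holds back part of the quota for critical cases. Period resets, persistence, and Ledger writes are omitted; none of these names come from the Blue48 codebase.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional

RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class QuotaQueue:
    limit: int             # submissions allowed in the current period
    critical_reserve: int  # hypothetical: slots only critical cases may use
    used: int = 0
    _heap: list = field(default_factory=list)
    _seq: int = 0          # tie-breaker that keeps FIFO order within a severity

    def enqueue(self, case_id: str, severity: str) -> None:
        heapq.heappush(self._heap, (RANK[severity], self._seq, case_id))
        self._seq += 1

    def next_submission(self) -> Optional[str]:
        """Pop the next case allowed under the quota, or None if it must wait."""
        if not self._heap:
            return None
        rank, _, case_id = self._heap[0]
        remaining = self.limit - self.used
        # Non-critical cases may not eat into the reserved budget.
        floor = 0 if rank == RANK["critical"] else self.critical_reserve
        if remaining <= floor:
            return None
        heapq.heappop(self._heap)
        self.used += 1
        return case_id


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Exponential backoff in seconds for transient failures, capped at `cap`."""
    return min(cap, base * 2 ** attempt)
```

With `limit=2` and `critical_reserve=1`, one low-priority case can never consume the last slot, so a critical case arriving later in the period is still submitted.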

14. Receipt and Effectiveness Tracking

ReceiptCollector and Watcher should capture feedback from every destination.

Receipt schema:

{
  "case_id": "B48-2026-000001",
  "destination": "Cloudflare Abuse API",
  "submitted_at": "2026-05-13T00:00:00Z",
  "acknowledged_at": "2026-05-13T00:05:00Z",
  "receipt_id": "external-case-id",
  "status": "submitted | acknowledged | rejected | actioned | closed",
  "outcome": "pending | takedown_confirmed | duplicate | no_action | escalated"
}

Success metrics:

| Destination type | Success metric |
| --- | --- |
| CERT / CSIRT | Acknowledgement, case opened, mitigation guidance issued. |
| Provider / registrar | Infrastructure suspended, blocked, or investigated. |
| MISP | Event accepted, sightings, correlations. |
| URLhaus / MalwareBazaar | URL/sample accepted, classified, distributed. |
| Public report | Defenders consume advisory, no sensitive data leak. |

15. Immutable Audit Log

Every external submission or destructive action must write an immutable record.

Audit row:

(timestamp, case_id, destination, payload_hash, submitter_identity, tlp, response_id, outcome)

Also audit:

  • evidence sealing
  • recipient key addition/removal
  • plaintext destruction
  • local key destruction
  • route approval
  • route blocking
  • public publication
  • dataset export
  • policy modification
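
The source asks only that audit records be immutable. One common way to make tampering detectable is to chain each record to the hash of the previous one; the sketch below illustrates that technique under stated assumptions and is not the documented Blue48 design. The in-memory list stands in for an append-only table, and `append_record`, `verify`, `prev_hash`, and `record_hash` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record in the chain


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(ledger: list, row: dict) -> dict:
    """Append an audit row, linking it to the hash of the previous record."""
    prev_hash = ledger[-1]["record_hash"] if ledger else GENESIS
    body = {**row, "prev_hash": prev_hash}
    record = {**body, "record_hash": _digest(body)}
    ledger.append(record)
    return record


def verify(ledger: list) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for record in ledger:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if body["prev_hash"] != prev or _digest(body) != record["record_hash"]:
            return False
        prev = record["record_hash"]
    return True


ledger: list = []
append_record(ledger, {"case_id": "B48-2026-000001", "destination": "CERT-Bund", "outcome": "submitted"})
append_record(ledger, {"case_id": "B48-2026-000001", "destination": "CERT-Bund", "outcome": "acknowledged"})
print(verify(ledger))  # True
```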

16. Public Reporting Rules

Public reports may include:

  • sector trend
  • country/region if safe
  • actor or campaign if already public or properly attributed
  • TTPs
  • CVEs
  • defensive recommendations
  • sanitized IOCs
  • non-sensitive timeline

Public reports must not include:

  • raw credentials
  • stolen data
  • direct links to stolen data
  • live access details
  • internal screenshots
  • private victim communications
  • exact criminal-source links
  • exploit instructions
  • anything that increases victim harm before mitigation

17. Initial Adapter Build Order

Recommended Block G adapter priority:

  1. MISP via PyMISP
  2. AbuseIPDB
  3. URLhaus
  4. Cloudflare Abuse Reports
  5. urlscan.io
  6. MalwareBazaar
  7. Registrar abuse structured email/RDAP helper
  8. VirusTotal enrichment/submission with strict policy guard
  9. OpenCTI internal graph integration
  10. TheHive or DFIR-IRIS case export

These cover the most common evidence and routing cases while keeping legal risk manageable.


18. Secrets Handling

Every adapter needs credentials.

Rules:

  • credentials live in .env or a secret manager
  • credentials are injected at runtime
  • credentials are never baked into container images
  • credentials are rotatable
  • credentials are scoped per adapter
  • every API call writes to the Ledger
  • failed authentication events are logged and alerted
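
A small sketch of per-adapter runtime injection, assuming credentials arrive as environment variables. The `BLUE48_<ADAPTER>_API_KEY` naming convention and both helper names are hypothetical; a secret-manager client could replace the environment lookup without changing callers.

```python
import os
import re
from typing import Mapping, Optional


class MissingCredential(RuntimeError):
    """Raised when an adapter's credential is absent at runtime."""


def credential_var(adapter: str) -> str:
    """Build the (hypothetical) environment variable name for one adapter."""
    slug = re.sub(r"[^A-Z0-9]+", "_", adapter.upper()).strip("_")
    return f"BLUE48_{slug}_API_KEY"


def load_credential(adapter: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Fetch one adapter's secret; each adapter reads only its own variable."""
    env = os.environ if environ is None else environ
    name = credential_var(adapter)
    value = env.get(name)
    if not value:
        # The message names the variable, never the secret itself.
        raise MissingCredential(f"{name} is not set")
    return value


print(credential_var("urlscan.io"))  # BLUE48_URLSCAN_IO_API_KEY
```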

19. Summary

The v2 architecture changes the platform from a list of reporting sites into an operational routing system.

The most important revisions are:

  • MISP and national CERTs are prioritized over CISA AIS for European/global work.
  • CERT routing is country-specific.
  • Registrar and registry abuse flows are included.
  • Sensitive evidence is protected through authority-sealed encryption, not casual sanitization.
  • Public and semi-public APIs receive minimized payloads only.
  • TLP enforcement, rate limits, receipts, and immutable audit logs are mandatory.

Blue48 Worker Mesh Architecture

Source: hivemap.md

Document type: Project record / technical architecture
Scope: Worker names, responsibilities, interfaces, data flow, human review boundaries
Status: Draft v1


1. Purpose

Blue48 should not rely on one large, expensive, opaque model to perform all cyber-intelligence operations. The platform should be built as a mesh of small, specialized workers.

Each worker performs one narrow function, writes structured output, and passes a normalized case object to the next stage. Heavy models are reserved for judgment-heavy tasks such as confidence scoring, routing explanations, public report drafting, and training-example generation.

Core principle:

Small workers produce traceable outputs. Humans approve sensitive decisions. The Ledger proves what happened.


2. High-Level Flow

Scoutline
→ Proofline
→ Mapline
→ Classifyline
→ Sealine
→ Routeline
→ Ledgerline
→ Publishline
→ Trainline

Operator version:

Detect → Validate → Map → Classify → Seal Evidence → Route → Submit → Track → Archive → Learn

3. Worker Lines

| Line | Purpose |
| --- | --- |
| Scoutline | Finds, fetches, parses, and deduplicates lawful intelligence sources. |
| Proofline | Validates claims, checks indicators, measures freshness, and scores confidence. |
| Mapline | Resolves victims, actors, sectors, jurisdictions, CERT routes, and affected products. |
| Classifyline | Assigns severity, TLP, incident type, and operational class. |
| Sealine | Packages evidence, encrypts it for authorized recipients, and destroys local plaintext/key material when policy allows. |
| Routeline | Selects destinations, builds payloads, enforces destination policy, and submits reports. |
| Ledgerline | Records immutable audit events, receipts, outcomes, and follow-up status. |
| Publishline | Produces sanitized public intelligence only after mitigation and approval. |
| Trainline | Converts lawful, reviewed intelligence into LoRA-ready training data. |

4. Core Worker Set

The first conceptual worker set is:

Scout → Verifier → Mapper → Classifier → Sealer → Router → Courier → Ledger

Support workers:

Watcher → Archivist → Publisher

Operational sentence:

Scout detects.
Verifier confirms.
Mapper identifies.
Classifier prioritizes.
Sealer protects.
Router decides.
Courier submits.
Ledger proves.
Watcher follows up.
Archivist forgets safely.
Publisher informs.

5. Granular Worker Breakdown

5.1 Scoutline

| Worker | Job | Model requirement |
| --- | --- | --- |
| SourcePlanner | Maintains the approved source list, collection schedules, and source eligibility. | None / rules |
| Crawler | Discovers new pages, feeds, advisories, reports, APIs, and datasets. | None |
| Fetcher | Downloads pages, PDFs, JSON, RSS, STIX/TAXII, MISP events, and API responses. | None |
| Parser | Extracts title, date, author, body, tables, indicators, and metadata. | Rules / small model |
| Deduper | Detects duplicate reports, reposted IOCs, syndicated articles, and repeated claims. | Embeddings / rules |
| SourceRanker | Scores the source based on trust, history, origin, and license status. | Rules / small model |
| Signalizer | Converts parsed content into candidate intelligence signals. | Small/medium model |

Output:

{
  "signal_id": "uuid",
  "source_type": "advisory | cti_report | abuse_feed | ransomware_monitor | public_blog | misp_event",
  "summary": "short defensive summary",
  "observed_at": "2026-05-13T00:00:00Z",
  "raw_evidence_location": "internal-only-reference"
}

5.2 Proofline

| Worker | Job |
| --- | --- |
| Correlator | Checks whether the same signal appears across multiple independent sources. |
| IOCChecker | Validates domains, IPs, hashes, URLs, wallet addresses, emails, and CVEs. |
| FreshnessChecker | Determines whether the signal is current, stale, repeated, or resurfaced. |
| ClaimChecker | Labels language as confirmed, claimed, observed, rumored, or speculative. |
| ConfidenceScorer | Produces final confidence and optional Admiralty Code values. |

Output:

{
  "confidence": "low | medium | high",
  "source_reliability": "A | B | C | D | E | F | unknown",
  "information_credibility": "1 | 2 | 3 | 4 | 5 | 6 | unknown",
  "claim_status": "confirmed | claimed | observed | rumored | speculative",
  "freshness": "new | recent | stale | resurfaced"
}

5.3 Mapline

| Worker | Job |
| --- | --- |
| EntityResolver | Maps organization names, domains, subsidiaries, brands, and aliases. |
| GeoResolver | Maps victim country, jurisdiction, national CERT, and cross-border implications. |
| SectorMapper | Maps victim sector and critical-infrastructure status. |
| ActorMapper | Maps actor names, aliases, ransomware brands, campaigns, and confidence. |
| CVEResolver | Maps vulnerabilities to CVEs, affected products, KEV status, and exploit relevance. |

Output:

{
  "victim": {
    "name": "",
    "domain": "",
    "country": "",
    "sector": "",
    "critical_infrastructure": false
  },
  "actor": {
    "name": "",
    "aliases": [],
    "campaign": "",
    "confidence": "low | medium | high"
  },
  "jurisdiction": {
    "primary_cert": "",
    "law_enforcement_route": "",
    "sector_isac": ""
  }
}

5.4 Classifyline

| Worker | Job |
| --- | --- |
| Classifier | Assigns incident type, severity, internal class, and response SLA. |
| TLPGuard | Ensures TLP-marked data cannot be routed to destinations that are not cleared to receive it. |
| DestinationPolicyGuard | Blocks inappropriate, illegal, excessive, or sensitive submissions. |

Internal class mapping:

| Internal class | Meaning | External severity |
| --- | --- | --- |
| A | Imminent harm or attack likely underway | Critical |
| B | Credible planned attack | High |
| C | Confirmed exposure | High / Medium |
| D | Campaign intelligence | Medium / High |
| E | Weak signal or watchlist item | Low / Monitor |

Output:

{
  "class": "A | B | C | D | E",
  "severity": "low | medium | high | critical",
  "tlp": "RED | AMBER | GREEN | CLEAR",
  "incident_type": "ransomware | credential_leak | access_sale | phishing | malware | exploit | botnet | data_leak",
  "policy_blocks": []
}

5.5 Sealine

Sealine replaces the old primary concept of “sanitization.” The objective is not to destroy useful evidence, but to protect it.

| Worker | Job |
| --- | --- |
| EvidencePackager | Collects sensitive evidence, hashes it, and packages it with metadata. |
| Sealer | Encrypts evidence for authorized recipients using public-key or hybrid encryption. |
| KeyBurner | Destroys local unwrapped evidence keys after successful sealing. |
| RetentionGuard | Enforces retention, deletion, plaintext destruction, and crypto-erasure policy. |

Sealine principle:

Preserve the truth. Seal the sensitive evidence. Route only what each recipient is authorized to receive.

Output:

{
  "sealed_evidence": {
    "package_id": "uuid",
    "encryption": "age | PGP | CMS | hybrid",
    "recipient_keys": [
      {
        "recipient": "CERT-Bund",
        "key_id": "authority-key-id",
        "wrapped_key": "encrypted-evidence-key"
      }
    ],
    "payload_hash": "sha256",
    "plaintext_destroyed": true,
    "local_unwrapped_key_destroyed": true
  }
}

5.6 Routeline

| Worker | Job |
| --- | --- |
| RoutePlanner | Chooses destination order based on victim, country, sector, severity, TLP, and evidence type. |
| PayloadBuilder | Builds destination-specific payloads: sealed package, STIX bundle, MISP event, abuse report, or public-safe extract. |
| Redactor | Minimizes public/semi-public outputs only. Redactor does not replace Sealer. |
| Courier | Submits through API, portal, structured email, or secure upload. |
| RateLimiter | Enforces destination quotas, retries, and backoff. |
| ReceiptCollector | Captures case IDs, acknowledgements, API responses, and status URLs. |

Example route object:

{
  "routes": [
    {
      "destination": "CERT-Bund",
      "type": "authority",
      "payload": "sealed_evidence_package",
      "priority": 1,
      "max_tlp_allowed": "RED"
    },
    {
      "destination": "MISP trusted community",
      "type": "cti_sharing",
      "payload": "stix_indicators",
      "priority": 2,
      "max_tlp_allowed": "AMBER"
    },
    {
      "destination": "Cloudflare Abuse API",
      "type": "provider_abuse",
      "payload": "minimized_abuse_report",
      "priority": 3,
      "max_tlp_allowed": "CLEAR"
    }
  ]
}

5.7 Ledgerline

| Worker | Job |
| --- | --- |
| Ledger | Creates immutable audit records for all external submissions and destructive actions. |
| Watcher | Polls outcomes: takedown status, MISP sightings, CERT acknowledgement, provider response. |
| Archivist | Handles retention, sealed package lifecycle, legal holds, and crypto-erasure confirmation. |

Ledger record:

{
  "timestamp": "2026-05-13T00:00:00Z",
  "case_id": "B48-2026-000001",
  "destination": "CERT-Bund",
  "payload_hash": "sha256",
  "submitter_identity": "blue48-official-handle",
  "tlp": "AMBER",
  "response_id": "external-case-id",
  "outcome": "submitted | acknowledged | rejected | actioned"
}

5.8 Publishline

| Worker | Job |
| --- | --- |
| Publisher | Produces public-safe intelligence reports after mitigation and approval. |

Publisher may include:

  • sector trend
  • actor trend
  • CVEs
  • TTPs
  • defensive recommendations
  • sanitized IOCs
  • non-sensitive timelines

Publisher must not include:

  • raw credentials
  • stolen data
  • victim secrets
  • live access details
  • exact criminal-source links
  • unmitigated exploit paths

6. Which Workers Need Models?

| Worker | Model need |
| --- | --- |
| SourcePlanner | None / rules |
| Crawler / Fetcher | None |
| Parser | Rules / small model |
| Deduper | Embeddings / rules |
| Signalizer | Small or medium model |
| ClaimChecker | Small or medium model |
| ConfidenceScorer | Medium model |
| EntityResolver | Rules + embeddings |
| ActorMapper | Small or medium model |
| Classifier | Small or medium model |
| RoutePlanner | Rules first, model second |
| PayloadBuilder | Small model |
| Publisher | Medium or large model |
| ExampleBuilder | Medium model |
| QualityGate | Medium model + rules |

Heavy models should be reserved for:

ConfidenceScorer
Classifier
Publisher
ExampleBuilder
QualityGate

7. Human Review Boundaries

Human approval is required before:

  • sending sealed evidence to any external destination
  • contacting law enforcement or CERTs with sensitive evidence
  • publishing a public advisory
  • destroying plaintext evidence
  • destroying local unwrapped evidence keys
  • exporting a training dataset
  • modifying routing policy
  • modifying recipient keys

Two-person control should be required for:

  • sending TLP:RED or highly sensitive packages
  • deleting evidence
  • changing authority recipient keys
  • publishing named-victim reports
  • exporting training data based on internal cases

8. MVP Worker Build Order

Initial worker implementation priority:

  1. SourcePlanner
  2. Fetcher
  3. Parser
  4. Deduper
  5. Signalizer
  6. IOCChecker
  7. EntityResolver
  8. GeoResolver
  9. Classifier
  10. EvidencePackager
  11. Sealer
  12. RoutePlanner
  13. Courier
  14. Ledger
  15. ReceiptCollector
  16. IntelMiner

Minimum operational chain:

Fetcher → Parser → Signalizer → IOCChecker → EntityResolver → Classifier → Sealer → RoutePlanner → Courier → Ledger
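
The chain can be sketched as plain functions that each take and return the normalized case object. The stub workers below are placeholders with made-up outputs, and the `trace` field is a hypothetical addition used here only to show the order of execution.

```python
from typing import Callable

Case = dict
Worker = Callable[[Case], Case]


def run_chain(case: Case, workers: list) -> Case:
    """Pass the case through each worker in order, recording who touched it."""
    for worker in workers:
        updated = worker(case)
        case = {**updated, "trace": [*case.get("trace", []), worker.__name__]}
    return case


# Stub workers: each returns a new case with one section filled in.
def fetcher(case: Case) -> Case:
    return {**case, "raw_evidence_location": "internal-only-reference"}


def parser(case: Case) -> Case:
    return {**case, "observables": {"cves": ["CVE-2024-12345"]}}


def classifier(case: Case) -> Case:
    return {**case, "classification": {"class": "D", "severity": "medium", "tlp": "GREEN"}}


result = run_chain({"case_id": "B48-2026-000001"}, [fetcher, parser, classifier])
print(result["trace"])  # ['fetcher', 'parser', 'classifier']
```

Keeping every worker to the same signature is what lets the runtime (Celery, Temporal, Prefect, or a simple queue) be swapped without rewriting the workers themselves.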

9. Technical Notes

Recommended implementation style:

| Component | Recommendation |
| --- | --- |
| Worker runtime | Python services, Celery, Temporal, Prefect, or lightweight queue workers |
| Message format | JSON normalized case object |
| Interop format | STIX 2.1 where useful |
| Storage | PostgreSQL + object storage |
| Search | OpenSearch or Meilisearch |
| CTI graph | OpenCTI or MISP integration |
| Audit | append-only ledger table |
| Secrets | .env, secret manager, runtime injection only |
| UI | Blue48 Operations Cockpit |

10. Summary

Blue48 should operate as a worker mesh, not a monolithic AI agent.

The system should use small deterministic workers where possible, small models where useful, and larger models only for judgment-heavy steps. Sensitive evidence is handled by Sealine, not casually rendered or distributed. Routing and public reporting are controlled by policy guards, human review, and immutable audit logging.


Blue48 IntelMiner and LoRA Training Data Pipeline

Source: intelminer.md

Document type: Project record / technical concept
Scope: Lawful intelligence collection, training-data preparation, LoRA dataset format, quality gates, safety boundaries
Status: Draft v1


1. Purpose

IntelMiner is the Blue48 worker responsible for collecting lawful defensive cyber-intelligence and converting it into reviewed, license-safe, LoRA-ready training examples.

IntelMiner does not train models to hack. It prepares training data for defensive tasks such as indicator extraction, routing, severity classification, evidence handling, and safe report writing.

Core mission:

IntelMiner collects lawful defensive cyber-intelligence from approved online sources and transforms it into reviewed, license-safe, LoRA-ready JSONL examples for specialized defensive models.


2. What IntelMiner Should Learn From

Allowed source categories:

  • national CERT advisories
  • CISA, ENISA, NCSC, CERT-EU, BSI, ANSSI, and similar public advisories
  • CVE, NVD, and exploited-vulnerability catalogs
  • public vendor threat reports
  • public malware-analysis reports
  • public ransomware trend reports from lawful monitors
  • MISP events where the license and sharing group permit reuse
  • abuse.ch datasets where permitted
  • public IOCs and defensive detection content
  • public incident writeups
  • internally written reports approved for training
  • synthetic examples written by analysts

Restricted or excluded source categories:

  • raw stolen data
  • raw credentials
  • private victim communications
  • criminal-forum content obtained without authorization
  • confidential CTI provider content without training rights
  • TLP:RED material
  • material with unknown or incompatible license
  • content that teaches exploitation, persistence, credential abuse, ransomware operation, or evasion

3. IntelMiner Worker Chain

SourcePlanner
→ Collector
→ LicenseChecker
→ ContentParser
→ Chunker
→ Labeler
→ ExampleBuilder
→ QualityGate
→ ReviewerQueue
→ DatasetWriter

4. Worker Responsibilities

| Worker | Responsibility |
| --- | --- |
| SourcePlanner | Defines approved sources, update schedules, license expectations, and collection priority. |
| Collector | Pulls data from APIs, RSS, advisories, STIX/TAXII, MISP, GitHub, PDFs, and public reports. |
| LicenseChecker | Determines whether the material may be used for training. Blocks unknown or restricted content. |
| ContentParser | Extracts text, IOCs, dates, actors, CVEs, TTPs, victim sectors, and source metadata. |
| Chunker | Splits long content into training-sized units while preserving context. |
| Labeler | Assigns task labels such as IOC extraction, routing, classification, report writing, and evidence handling. |
| ExampleBuilder | Converts chunks into instruction/input/output training examples. |
| QualityGate | Removes unsafe, duplicated, mislabeled, low-confidence, or license-problematic examples. |
| ReviewerQueue | Sends candidates to human reviewers. Nothing enters the final dataset without approval. |
| DatasetWriter | Exports approved examples as versioned JSONL datasets. |

5. Training Tasks

The LoRA adapters should learn defensive operations only.

| Task | Purpose |
| --- | --- |
| ioc_extraction | Extract domains, IPs, URLs, hashes, emails, wallets, CVEs, and file names. |
| ttp_mapping | Map report language to MITRE ATT&CK-style techniques. |
| severity_classification | Classify weak signal, credible threat, confirmed exposure, campaign intelligence, or imminent harm. |
| routing_decision | Decide which reporting destinations are appropriate and in what order. |
| evidence_handling | Decide whether evidence must be sealed, minimized, excluded, or internally retained. |
| actor_normalization | Normalize actor names, aliases, ransomware brands, and campaigns. |
| source_reliability | Estimate source reliability and information credibility. |
| report_drafting | Draft structured victim, CERT, provider, MISP, or public reports. |
| public_publishing | Produce sanitized public intelligence after mitigation. |

Do not train examples for:

  • exploitation steps
  • credential abuse
  • phishing construction
  • malware deployment
  • ransomware operations
  • evasion
  • stealth
  • persistence
  • unauthorized forum access
  • instructions for obtaining stolen data

Do not start by training one large mixed LoRA. Start with small task-specific adapters.

Recommended adapter order:

| Priority | Adapter | Reason |
|---|---|---|
| 1 | lora-router | Central to the project and easier to evaluate objectively. |
| 2 | lora-ioc-extractor | High utility, clear labels, measurable precision and recall. |
| 3 | lora-evidence-handler | Helps enforce safe handling decisions. |
| 4 | lora-report-writer | Drafts structured notifications after reviewed facts exist. |
| 5 | lora-actor-normalizer | Improves actor and campaign mapping. |
| 6 | lora-public-publisher | Produces public-safe summaries after mitigation. |

Training should begin only after enough reviewed examples exist:

  • 1,000+ reviewed examples for a single narrow task, or
  • 3,000 to 10,000 mixed examples across several tasks.

Until then, use rules, retrieval, embeddings, and human-reviewed prompts.
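
The thresholds above can be turned into a simple readiness check. The sketch below is illustrative only: the function name `ready_to_train` and the reading of "several tasks" as at least two distinct task labels are assumptions, not part of the Blue48 design.

```python
from collections import Counter

SINGLE_TASK_MIN = 1_000  # "1,000+ reviewed examples for a single narrow task"
MIXED_MIN = 3_000        # lower bound of the mixed-task range


def ready_to_train(reviewed_tasks: list[str]) -> bool:
    """Return True once either reviewed-example threshold is met.

    `reviewed_tasks` holds one task label per human-reviewed example.
    """
    counts = Counter(reviewed_tasks)
    if any(n >= SINGLE_TASK_MIN for n in counts.values()):
        return True
    # Assumption: "several tasks" means at least two distinct labels.
    return len(counts) >= 2 and sum(counts.values()) >= MIXED_MIN
```

A check like this could sit in front of any training job so that adapters are not trained on a dataset that is still too small.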


7. JSONL Training Format

Each JSONL line should contain one training example.

Standard structure:

{
  "task": "routing_decision",
  "instruction": "Given a defensive cyber-intelligence signal, choose the correct reporting destinations and order.",
  "input": {},
  "output": {},
  "metadata": {
    "source_type": "public_advisory | vendor_report | synthetic | internal_approved",
    "tlp": "CLEAR | GREEN | AMBER",
    "license": "approved",
    "reviewed": true,
    "policy_version": "v1",
    "dataset_version": "dataset-router-v0.1"
  }
}
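
A minimal sketch of how a writer might enforce this structure before appending a line to a dataset file. The helper name `to_jsonl_line` and the decision to refuse unreviewed examples at write time are assumptions made for illustration.

```python
import json

REQUIRED_KEYS = {"task", "instruction", "input", "output", "metadata"}


def to_jsonl_line(example: dict) -> str:
    """Serialize one training example as a single JSONL line."""
    missing = REQUIRED_KEYS - example.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if example["metadata"].get("reviewed") is not True:
        raise ValueError("example has not been human-reviewed")
    # json.dumps escapes newlines inside strings, so the result is one line.
    return json.dumps(example, ensure_ascii=False, separators=(",", ":"))
```

Each returned string can be written to the export file followed by a single newline character.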

8. Example: IOC Extraction

{
  "task": "ioc_extraction",
  "instruction": "Extract defensive indicators from the cyber threat report. Return JSON only.",
  "input": "A phishing campaign used login-example[.]com and delivered payload hash 44d88612fea8a8f36de82e1278abb02f. The actor referenced CVE-2024-12345.",
  "output": {
    "domains": ["login-example.com"],
    "hashes": ["44d88612fea8a8f36de82e1278abb02f"],
    "cves": ["CVE-2024-12345"],
    "ips": [],
    "urls": []
  },
  "metadata": {
    "source_type": "synthetic_or_public_report",
    "tlp": "CLEAR",
    "license": "approved",
    "reviewed": true
  }
}
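
Until enough reviewed examples exist, a rule-based extractor can produce candidate outputs in the same shape. The following is a rough regex baseline, not the project's extractor: the patterns, the `refang` helper, and the function names are assumptions, and the patterns will produce false positives (for example, file names inside URL paths can look like domains), so output still needs human review.

```python
import re

DOMAIN_RE = re.compile(r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b", re.I)
HASH_RE = re.compile(r"\b(?:[a-f0-9]{64}|[a-f0-9]{40}|[a-f0-9]{32})\b", re.I)
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,7}\b", re.I)
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.I)


def refang(text: str) -> str:
    """Undo two common defanging conventions: '[.]' and 'hxxp'."""
    return text.replace("[.]", ".").replace("hxxp", "http")


def _unique(matches):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(matches))


def extract_iocs(report: str) -> dict:
    """Return indicators using the output keys shown in the example above."""
    text = refang(report)
    return {
        "domains": _unique(m.lower() for m in DOMAIN_RE.findall(text)),
        "hashes": _unique(m.lower() for m in HASH_RE.findall(text)),
        "cves": _unique(m.upper() for m in CVE_RE.findall(text)),
        "ips": _unique(IP_RE.findall(text)),
        "urls": _unique(URL_RE.findall(text)),
    }
```

Run against the example input above, this baseline returns the same five lists as the example output.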

9. Example: Routing Decision

{
  "task": "routing_decision",
  "instruction": "Given a defensive cyber-intelligence signal, choose the correct reporting destinations and order.",
  "input": {
    "incident_type": "access_sale",
    "victim_country": "DE",
    "sector": "energy",
    "critical_infrastructure": true,
    "confidence": "high",
    "tlp": "AMBER"
  },
  "output": {
    "severity": "critical",
    "routes": [
      "CERT-Bund",
      "victim_security_team",
      "sector_isac",
      "law_enforcement_cyber_unit",
      "misp_trusted_community"
    ],
    "evidence_handling": "authority_sealed_package"
  },
  "metadata": {
    "reviewed": true,
    "policy_version": "v1"
  }
}

10. Example: Evidence Handling

{
  "task": "evidence_handling",
  "instruction": "Decide how evidence should be handled before external submission.",
  "input": {
    "evidence_type": "stolen_credentials",
    "destination": "public_abuse_api",
    "contains_pii": true,
    "tlp": "RED"
  },
  "output": {
    "submit_raw": false,
    "handling": "do_not_send_raw_to_public_api",
    "allowed_payload": "metadata_only",
    "sealed_package_required": true,
    "authorized_recipients": ["victim_security_team", "national_cert"]
  },
  "metadata": {
    "reviewed": true
  }
}

11. Dataset Metadata

Every example should include metadata.

| Field | Purpose |
|---|---|
| task | Training task category. |
| source_type | Origin category of the example. |
| source_id | Internal reference to source document. |
| license | Approved, restricted, unknown, or rejected. |
| tlp | CLEAR, GREEN, AMBER, or RED. |
| reviewed | Human approval status. |
| reviewer_id | Internal reviewer identity or role ID. |
| policy_version | Version of handling policy used. |
| dataset_version | Versioned dataset name. |
| safety_flags | Unsafe content or sensitive material flags. |
| dedupe_hash | Used to prevent duplicate examples. |
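
The document does not say how `dedupe_hash` is computed. One plausible approach, shown here as an assumption, is a SHA-256 digest over a canonical JSON rendering of the example's task, instruction, input, and output, ignoring metadata so that the same example from two sources collides.

```python
import hashlib
import json


def dedupe_hash(example: dict) -> str:
    """Hash the training content of an example; metadata is ignored."""
    core = {k: example.get(k) for k in ("task", "instruction", "input", "output")}
    # sort_keys makes the digest independent of dictionary key order.
    canonical = json.dumps(core, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

This only catches exact duplicates. Near-duplicate detection (for example after whitespace or casing changes) would need additional normalization.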

12. QualityGate Rules

QualityGate must reject examples that contain:

  • raw credentials
  • raw stolen data
  • private victim information
  • live access details
  • exploit chains
  • malware deployment steps
  • phishing instructions
  • evasion or persistence guidance
  • incompatible license
  • unknown provenance
  • duplicated content
  • unreviewed TLP:RED or confidential content

QualityGate should flag for human review when:

  • source license is ambiguous
  • actor attribution is uncertain
  • victim identity is named
  • sample contains personal data
  • output teaches operationally sensitive details
  • example conflicts with policy
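
The two rule lists above can be sketched as a three-way decision over example metadata. Everything named here is hypothetical: the flag strings, the `quality_gate` function, the three return values, and the mapping of an `unknown` license to "ambiguous" are assumptions chosen for illustration, and a real gate would also need content scanning rather than relying on flags alone.

```python
REJECT_FLAGS = {
    "raw_credentials", "raw_stolen_data", "private_victim_info",
    "live_access_details", "exploit_chain", "malware_deployment",
    "phishing_instructions", "evasion_or_persistence",
}
REVIEW_FLAGS = {
    "uncertain_attribution", "victim_named", "personal_data",
    "operationally_sensitive", "policy_conflict",
}


def quality_gate(example: dict, seen_hashes: set[str]) -> str:
    """Return 'reject', 'needs_review', or 'accept' for one candidate."""
    meta = example.get("metadata", {})
    flags = set(meta.get("safety_flags", []))
    if flags & REJECT_FLAGS:
        return "reject"
    if meta.get("license") in {"restricted", "rejected"}:
        return "reject"  # incompatible license
    if not meta.get("source_id"):
        return "reject"  # unknown provenance
    if meta.get("dedupe_hash") in seen_hashes:
        return "reject"  # duplicated content
    if meta.get("tlp") == "RED" and not meta.get("reviewed"):
        return "reject"  # unreviewed TLP:RED
    if meta.get("license") == "unknown" or flags & REVIEW_FLAGS:
        return "needs_review"
    return "accept"
```

An "accept" here only means the candidate may proceed to ReviewerQueue; it does not replace human approval.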

13. Dataset Builder UI Requirements

IntelMiner should be visible in the Blue48 Operations Cockpit.

Screens:

| Screen | Purpose |
|---|---|
| Dataset Sources | Manage approved sources, license status, and collection schedules. |
| Training Candidate Queue | Review generated examples before approval. |
| Example Review | Edit, approve, reject, or mark examples unsafe. |
| Dataset Builder | Export versioned JSONL datasets with train/validation split. |
| Dataset Audit | Track source, reviewer, license, and policy version. |

Candidate fields:

| Field | Meaning |
|---|---|
| Task | IOC extraction, routing, classification, etc. |
| Source | advisory, blog, report, synthetic, internal. |
| License | approved, restricted, unknown, rejected. |
| Quality score | Estimated usefulness. |
| Safety flag | safe, needs review, reject. |
| Reviewer status | pending, approved, rejected. |

14. Dataset Versioning

Datasets should be versioned clearly:

dataset-router-v0.1
dataset-ioc-extractor-v0.3
dataset-evidence-handler-v0.2
dataset-report-writer-v0.2

Each export should include:

  • dataset name
  • version
  • date
  • number of examples
  • task distribution
  • source distribution
  • license distribution
  • reviewer count
  • rejected example count
  • train/validation split
  • policy version
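
The export fields above map naturally onto a small manifest written next to each JSONL file. This is a sketch under assumptions: the function name `build_manifest`, the manifest key names, and passing the rejected count and train size in as arguments are illustrative choices.

```python
from collections import Counter
from datetime import date


def build_manifest(name: str, version: str, examples: list[dict],
                   rejected_count: int, n_train: int) -> dict:
    """Summarize an approved example set using the export fields listed above."""
    meta = [e.get("metadata", {}) for e in examples]
    return {
        "dataset": name,
        "version": version,
        "date": date.today().isoformat(),
        "examples": len(examples),
        "task_distribution": dict(Counter(e["task"] for e in examples)),
        "source_distribution": dict(Counter(m.get("source_type", "unknown") for m in meta)),
        "license_distribution": dict(Counter(m.get("license", "unknown") for m in meta)),
        "reviewer_count": len({m["reviewer_id"] for m in meta if m.get("reviewer_id")}),
        "rejected_examples": rejected_count,
        "split": {"train": n_train, "validation": len(examples) - n_train},
        "policy_versions": sorted({m["policy_version"] for m in meta if m.get("policy_version")}),
    }
```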

15. Human Review Requirements

Human approval is required before examples become training data.

Reviewers should check:

  • factual correctness
  • source license
  • safety boundaries
  • absence of raw sensitive data
  • correct label
  • useful expected output
  • no attacker-enabling content

Two-person review is recommended for:

  • internal case-derived examples
  • sensitive incident examples
  • actor attribution examples
  • routing examples involving law enforcement or critical infrastructure
  • examples derived from TLP:AMBER material

TLP:RED material should not be used for LoRA training unless an explicit legal, operational, and governance policy exists.


16. Summary

IntelMiner is the bridge between Blue48 operations and future specialized defensive models.

It should collect only lawful and approved data, check license and safety constraints, build structured examples, require human review, and export versioned JSONL datasets. The first LoRA should likely be lora-router, followed by lora-ioc-extractor and lora-evidence-handler.


Blue48 Operations Cockpit — GUI / UI-UX Concept

Source: blue48_operations_cockpit_ui_ux.md

Document type: Project record / technical concept
Scope: GUI, operator workflow, worker observability, evidence handling, routing review, and IntelMiner dataset operations
Status: Draft v1


1. Purpose

The Blue48 Operations Cockpit is the human-facing command center for the worker mesh.

The GUI must let operators see, review, approve, seal, route, audit, and publish cyber-intelligence cases without losing control of sensitive evidence or outbound submissions.

The core principle is:

The system may automate collection, enrichment, packaging, and routing, but humans must clearly see the chain of reasoning, evidence status, risk level, and outbound submissions before anything sensitive leaves the platform.

The GUI should be an operational cockpit first, not a decorative dashboard.


2. Core Control Surfaces

The product should be designed around six main control surfaces:

| Control Surface | Primary Question Answered |
|---|---|
| Cases | What is happening? |
| Evidence | What is protected? |
| Routing | Where will it go? |
| Workers | What produced this result? |
| Ledger | What can we prove happened? |
| Trainline | What can become safe training data? |

These six areas should drive navigation, permissions, and MVP scope.


3. Main Navigation

Recommended sidebar navigation:

OPERATIONS
- Mission Control
- Case Queue
- Worker Mesh
- Routing Review
- Receipts

EVIDENCE
- Evidence Vault
- Sealed Packages
- Retention

INTELLIGENCE
- Reports
- MISP / STIX Events
- Public Advisories

TRAINING
- IntelMiner
- Training Candidates
- Dataset Builder

SYSTEM
- Integrations
- Policy Engine
- Ledger
- Admin

Minimal route structure:

/dashboard
/cases
/cases/:id
/cases/:id/evidence
/cases/:id/routing
/receipts
/workers
/trainline/candidates

4. Mission Control

Mission Control is the landing dashboard.

Its purpose is to show what is happening right now.

Key Widgets

| Widget | Shows |
|---|---|
| Active Signals | New unreviewed leads from Scoutline |
| Critical Queue | Imminent harm / critical infrastructure cases |
| Pending Human Review | Cases waiting for analyst approval |
| Sealed Evidence Packages | Evidence encrypted and ready for authority handoff |
| Outbound Reports | Reports waiting to be sent |
| Receipts / Acknowledgements | CERT, MISP, abuse API, and provider responses |
| Worker Health | Workers running, degraded, failed, paused, or stopped |
| Rate Limits | API quota usage per destination |
| Legal / TLP Warnings | Items blocked by policy guard |

Suggested Layout

┌──────────────────────────────────────────────────────────────┐
│ Blue48 Operations Cockpit                                    │
├──────────────┬──────────────┬──────────────┬────────────────┤
│ Critical     │ Pending      │ Sealed       │ Submitted      │
│ Cases        │ Review       │ Packages     │ Reports        │
├──────────────┴──────────────┴──────────────┴────────────────┤
│ Live Worker Mesh Timeline                                    │
├──────────────────────────────┬───────────────────────────────┤
│ Priority Case Queue          │ Destination / API Health       │
├──────────────────────────────┴───────────────────────────────┤
│ Recent Receipts and Outcomes                                  │
└──────────────────────────────────────────────────────────────┘

5. Case Queue

The Case Queue is the main daily-use screen.

Each row represents one signal or incident candidate.

| Column | Meaning |
|---|---|
| Case ID | Unique internal case identifier |
| Class | A/B/C/D/E or Critical/High/Medium/Low |
| TLP | RED / AMBER / GREEN / CLEAR |
| Confidence | Low / Medium / High or Admiralty Code |
| Victim | Organization, domain, or unknown |
| Country | Used for CERT routing |
| Sector | Healthcare, finance, energy, government, etc. |
| Incident Type | Access sale, ransomware, phishing, credential leak, botnet, exploit, data leak |
| Actor | Known group / suspected actor / unknown |
| Current Worker | Worker currently responsible for the case |
| Next Action | Review, seal, route, submit, wait, archive |
| Deadline | SLA based on severity |
| Owner | Assigned analyst |

Example row:

[CRITICAL] [TLP:AMBER] DE energy provider | access sale | high confidence | Sealer ready | Review required

Filters

The queue should support filters for:

  • severity
  • class
  • TLP
  • country
  • sector
  • actor
  • source type
  • confidence
  • pending approval
  • failed submission
  • critical infrastructure only
  • worker state

6. Case Detail View

The Case Detail View is where analysts work on a single case.

Recommended tabs:

Overview | Evidence | Timeline | Worker Output | Routing | Reports | Receipts | Audit

Overview Tab

The Overview tab should show:

  • case summary
  • severity
  • class
  • confidence
  • affected entity
  • actor
  • jurisdiction
  • recommended route
  • current state
  • required approval

Example:

Case: B48-2026-000184
Type: Initial Access Sale
Severity: Critical
TLP: AMBER
Confidence: High
Victim Country: Germany
Sector: Energy
Recommended Route:
1. CERT-Bund
2. Victim Security Team
3. Sector ISAC
4. MISP Trusted Community

7. Evidence View

The Evidence View is where the Sealer concept appears in the GUI.

Raw sensitive evidence should never render by default.

The UI should show evidence status instead of exposing raw contents.

Evidence Status Labels

| Status | Meaning |
|---|---|
| Unsealed | Evidence exists internally but has not been authority-sealed |
| Sealed | Evidence has been encrypted for selected authorized recipients |
| Plaintext Destroyed | Local plaintext copy has been removed |
| Local Key Destroyed | Local unwrapped evidence key has been removed |
| Recipient Decryptable | Selected authority or victim can decrypt the package |
| Public-Safe Extract Available | Redacted/minimized metadata is available for public or semi-public destinations |

Evidence Display Model

Evidence Package
├── Metadata preview: visible
├── Sensitive content: locked by default
├── Hashes: visible
├── Recipient keys: visible
├── Local decryption access: unavailable after key destruction
└── Chain of custody: visible

Evidence Actions

Recommended actions:

  • Seal Evidence
  • Add Recipient Key
  • Verify Package Hash
  • Destroy Local Plaintext
  • Destroy Local Unwrapped Key
  • Generate Public-Safe Extract
  • Request Human Approval

The UI should make the trust state obvious:

Raw evidence: locked
Sealed package: ready
Local plaintext: destroyed
Local key: destroyed
Recipient: CERT-Bund can decrypt
Public extract: available
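
Of the evidence actions listed above, Verify Package Hash is the simplest to illustrate. The sketch assumes the package digest is SHA-256 recorded as a hex string; the document does not name a hash algorithm, and `verify_package_hash` is a hypothetical helper.

```python
import hashlib
import hmac


def verify_package_hash(package: bytes, expected_sha256: str) -> bool:
    """Check a sealed package against the digest recorded at sealing time."""
    actual = hashlib.sha256(package).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

Because the check runs on the sealed (encrypted) bytes, it can still be performed after the local plaintext and local key have been destroyed.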

8. Worker Mesh View

The Worker Mesh View is the observability screen for the processing pipeline.

It should show the worker topology and the health of each worker.

Worker Lines

Scoutline
 SourcePlanner → Crawler → Fetcher → Parser → Deduper → SourceRanker

Proofline
 Signalizer → Correlator → IOCChecker → ClaimChecker → ConfidenceScorer

Mapline
 EntityResolver → GeoResolver → SectorMapper → ActorMapper → CVEResolver

Sealine
 EvidencePackager → Sealer → KeyBurner → RetentionGuard

Routeline
 RoutePlanner → PayloadBuilder → Courier → ReceiptCollector

Trainline
 IntelMiner → LicenseChecker → Chunker → Labeler → QualityGate → DatasetWriter

Worker Tile Fields

Each worker tile should show:

| Field | Meaning |
|---|---|
| Status | Healthy, degraded, failed, paused, or stopped |
| Queue Depth | Number of waiting jobs |
| Last Run | Most recent execution timestamp |
| Error Count | Recent failures |
| Average Processing Time | Performance indicator |
| Model / API Used | Which model, API, or rule engine was used |
| Cost Estimate | Optional model/API cost estimate |
| Last Output Sample | Small safe preview of output |
| Controls | Retry, pause, resume, open logs |

The goal is to prevent the system from becoming a black box.


9. Routing Review Screen

The Routing Review Screen is where humans approve outbound reports.

It should show recommended destinations, payload types, policy decisions, and blocks.

Example:

Recommended destinations:
✓ CERT-Bund — sealed evidence package
✓ Victim Security Team — sealed evidence package
✓ MISP Trusted Community — TLP:AMBER STIX indicators
✓ Cloudflare Abuse — minimized abuse report
✕ VirusTotal — blocked: contains sensitive sample / TLP too high

Destination Fields

| Field | Purpose |
|---|---|
| Destination | CERT, MISP, provider, abuse API, registrar, law enforcement, victim |
| Payload Type | Sealed package, STIX bundle, minimized abuse report, advisory draft |
| Max TLP Allowed | Prevents over-sharing |
| Required Auth | API key, PGP, portal, structured email, OIDC |
| Rate-Limit Budget | Whether submission can happen now |
| Policy Status | Allowed, blocked, or needs approval |
| Legal Status | Safe, review required, or blocked |
| Expected Receipt | Case ID, acknowledgement, or status URL |
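
The Max TLP Allowed field implies an ordering check before any payload is offered to a destination. A minimal sketch follows, using the four TLP labels that appear in this document; the function name `tlp_policy_status` and the per-destination limits used in the usage note are hypothetical.

```python
TLP_ORDER = {"CLEAR": 0, "GREEN": 1, "AMBER": 2, "RED": 3}


def tlp_policy_status(payload_tlp: str, max_tlp_allowed: str) -> str:
    """Return 'blocked' when the payload is more restricted than the destination accepts."""
    if TLP_ORDER[payload_tlp] > TLP_ORDER[max_tlp_allowed]:
        return "blocked"
    return "allowed"
```

For instance, if a public scanning service were configured with a limit of CLEAR, a TLP:AMBER payload would come back as blocked, which matches the kind of block shown in the routing example above. An unknown label raises `KeyError` rather than silently passing.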

Actions

Recommended actions:

  • Approve Selected Routes
  • Block Route
  • Require Legal Review
  • Send to Sealer
  • Send to Redactor
  • Submit Now

The interface should never provide one broad dangerous action such as Send Everything.


10. Report Builder

The Report Builder creates destination-specific outputs.

Report Templates

| Template | Used For |
|---|---|
| Victim Notification | Directly affected organization |
| CERT Notification | National CERT / CSIRT |
| Law Enforcement Referral | Criminal activity |
| Provider Abuse Report | Hosting, CDN, registrar, cloud, email provider |
| MISP Event | CTI sharing |
| Public Advisory | Sanitized public report |
| Training Example | LoRA dataset candidate |

Left pane: structured case data
Right pane: generated report preview

Warning Flags

The builder should warn when a draft:

  • contains PII
  • contains raw credentials
  • contains TLP:RED material
  • contains victim name
  • contains exploit detail
  • contains unsealed evidence
  • exceeds destination TLP allowance
  • targets a public or semi-public platform with sensitive content

11. IntelMiner / Trainline UI

The IntelMiner and Trainline UI should be separate from active operations.

This prevents analysts from confusing live cases with training candidates.

Dataset Sources Screen

Shows:

| Field | Meaning |
|---|---|
| Source Name | Human-readable source name |
| URL / API | Collection endpoint |
| License Status | Approved, restricted, unknown, rejected |
| Allowed for Training | Yes / no / review required |
| Last Collected | Most recent collection timestamp |
| Document Count | Number of collected documents |
| Failure Rate | Recent collection reliability |

Training Candidate Queue

Each candidate should show:

| Field | Meaning |
|---|---|
| Task | IOC extraction, routing, classification, report writing, actor normalization |
| Source | Advisory, blog, report, synthetic, internal |
| License | Approved, restricted, unknown |
| Quality Score | Estimated usefulness |
| Safety Flag | Safe, needs review, reject |
| Reviewer Status | Pending, approved, rejected, edited |

Example Review Screen

The reviewer should see:

Instruction
Input
Expected Output
Metadata
Source License
Safety Flags

Actions:

  • Approve
  • Reject
  • Edit
  • Send Back to Labeler
  • Mark as Unsafe
  • Export to JSONL

Dataset Builder

The Dataset Builder should show:

  • examples by task
  • token counts
  • train/validation split
  • duplicates
  • class imbalance
  • rejected examples
  • export version

Example dataset versions:

dataset-router-v0.1
dataset-ioc-extractor-v0.3
dataset-report-writer-v0.2
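
For the train/validation split, one option is to assign each example by hashing a stable key, so that an example stays in the same split across dataset versions. This is a suggested technique rather than something the document specifies; `assign_split` and the 10 percent default are assumptions.

```python
import hashlib


def assign_split(example_key: str, validation_percent: int = 10) -> str:
    """Deterministically place an example in 'train' or 'validation'.

    `example_key` is any stable identifier, such as the example's dedupe hash.
    """
    digest = hashlib.sha256(example_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "validation" if bucket < validation_percent else "train"
```

Unlike a random shuffle, this keeps validation examples from drifting into the training set when the dataset is re-exported with new material.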

12. Roles and Permissions

The GUI requires strict role-based access control.

| Role | Can Do |
|---|---|
| Viewer | Read dashboards and public-safe summaries |
| Analyst | Review signals, enrich cases, draft reports |
| Sealer Officer | Seal evidence and manage recipient keys |
| Router Officer | Approve destinations and routing decisions |
| Legal Reviewer | Approve sensitive or cross-border submissions |
| Admin | Manage users, integrations, policies, and configuration |
| Dataset Curator | Approve training examples and exports |
| Auditor | Read ledger and export compliance logs |

Two-Person Control

Critical actions should require two-person approval:

  • send sealed evidence
  • submit to law enforcement
  • publish a public advisory
  • destroy plaintext
  • destroy local unwrapped evidence keys
  • export a training dataset
  • modify routing policy
  • modify recipient keys
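
A two-person rule is easy to get subtly wrong if the same user can approve twice. The sketch below counts distinct approver IDs; the action identifiers and the `is_authorized` name are hypothetical labels for the eight actions listed above.

```python
TWO_PERSON_ACTIONS = frozenset({
    "send_sealed_evidence",
    "submit_to_law_enforcement",
    "publish_public_advisory",
    "destroy_plaintext",
    "destroy_local_unwrapped_keys",
    "export_training_dataset",
    "modify_routing_policy",
    "modify_recipient_keys",
})


def is_authorized(action: str, approver_ids: list[str]) -> bool:
    """Require two distinct approvers for critical actions, one otherwise."""
    required = 2 if action in TWO_PERSON_ACTIONS else 1
    # A set removes repeated approvals by the same person.
    return len(set(approver_ids)) >= required
```

A fuller version would also check that each approver holds a role permitted to approve that action.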

13. Case State Machine

Every case should follow a clear state machine.

Normal States

NEW_SIGNAL
→ PARSED
→ VERIFIED
→ MAPPED
→ CLASSIFIED
→ REVIEW_REQUIRED
→ EVIDENCE_PACKAGED
→ SEALED
→ ROUTE_PROPOSED
→ APPROVED_FOR_SUBMISSION
→ SUBMITTED
→ ACKNOWLEDGED
→ ACTIONED
→ ARCHIVED

Error / Block States

BLOCKED_BY_TLP
BLOCKED_BY_POLICY
NEEDS_LEGAL_REVIEW
DESTINATION_RATE_LIMITED
SUBMISSION_FAILED
INSUFFICIENT_CONFIDENCE
DUPLICATE_CASE

The state machine should be visible in the Case Detail View.
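
The states above can be enforced with a small transition check. The normal flow and block states are taken from the lists above; the rules for entering and leaving a block state are assumptions, since the document does not define them. Here any case that is not archived may be blocked, and a blocked case may only return to REVIEW_REQUIRED or be archived.

```python
NORMAL_FLOW = [
    "NEW_SIGNAL", "PARSED", "VERIFIED", "MAPPED", "CLASSIFIED",
    "REVIEW_REQUIRED", "EVIDENCE_PACKAGED", "SEALED", "ROUTE_PROPOSED",
    "APPROVED_FOR_SUBMISSION", "SUBMITTED", "ACKNOWLEDGED", "ACTIONED",
    "ARCHIVED",
]
BLOCK_STATES = {
    "BLOCKED_BY_TLP", "BLOCKED_BY_POLICY", "NEEDS_LEGAL_REVIEW",
    "DESTINATION_RATE_LIMITED", "SUBMISSION_FAILED",
    "INSUFFICIENT_CONFIDENCE", "DUPLICATE_CASE",
}


def can_transition(current: str, target: str) -> bool:
    """Return True if a case may move from `current` to `target`."""
    if current in BLOCK_STATES:
        # Assumption: blocked cases go back to human review or are closed.
        return target in {"REVIEW_REQUIRED", "ARCHIVED"}
    if target in BLOCK_STATES:
        return current != "ARCHIVED"
    i = NORMAL_FLOW.index(current)  # raises ValueError for unknown states
    return i + 1 < len(NORMAL_FLOW) and NORMAL_FLOW[i + 1] == target
```

Rejecting skipped steps (for example NEW_SIGNAL directly to SEALED) is what keeps a case from being submitted without passing review.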


14. UI / UX Principles

Make Risk Visible

Every screen should answer:

  • What is the severity?
  • What is the confidence?
  • Who is affected?
  • What data is sensitive?
  • Who can decrypt it?
  • What will be sent?
  • Where will it be sent?
  • What policy allows or blocks this?

Make Automation Interruptible

Analysts must be able to:

  • pause a worker
  • block a route
  • downgrade confidence
  • require legal review
  • mark as duplicate
  • prevent publication
  • reopen a case

Make Evidence Status Obvious

Use labels such as:

Raw evidence: locked
Sealed package: ready
Local plaintext: destroyed
Local key: destroyed
Recipient: CERT-Bund can decrypt
Public extract: available

Avoid Dangerous UX Patterns

Avoid:

  • one-click “send all” actions
  • hidden payloads
  • unclear TLP labels
  • buried warnings
  • irreversible actions without confirmation
  • publishing controls mixed with private reporting controls
  • exposing raw evidence by default

15. Minimal MVP GUI

Do not build everything at once.

The first useful MVP should include:

  1. Mission Control
  2. Case Queue
  3. Case Detail
  4. Evidence Sealing View
  5. Routing Review
  6. Courier Receipts
  7. Worker Health
  8. IntelMiner Dataset Queue

MVP Routes

/dashboard
/cases
/cases/:id
/cases/:id/evidence
/cases/:id/routing
/receipts
/workers
/trainline/candidates

This MVP is enough to operate safely while keeping the scope manageable.


| Layer | Recommendation |
|---|---|
| Frontend | React / Next.js |
| UI Components | shadcn/ui + Tailwind |
| Charts | Recharts |
| Workflow Graph | React Flow |
| Tables | TanStack Table |
| Backend API | FastAPI |
| Worker Orchestration | Celery, Temporal, or Prefect |
| Database | PostgreSQL |
| Search | OpenSearch or Meilisearch |
| Graph Intelligence | OpenCTI / Neo4j optional |
| Object Storage | S3-compatible encrypted storage |
| Audit Log | Append-only PostgreSQL table or immutability layer |
| Auth | OIDC / Keycloak |
| Realtime Updates | WebSockets or Server-Sent Events |

React Flow is especially useful for the Worker Mesh screen.
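
For the audit log, one common way to make an append-only record tamper-evident is a hash chain, where each entry commits to the one before it. The document names only an append-only table or an immutability layer, so the hash chain below is a suggested technique, and `append_entry`, `verify_chain`, and the entry layout are hypothetical. The sketch uses an in-memory list in place of a database table.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, event: dict) -> str:
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(ledger: list[dict], event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"prev": prev, "event": event, "hash": _digest(prev, event)}
    ledger.append(entry)
    return entry


def verify_chain(ledger: list[dict]) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

A chain like this detects modification but does not prevent it, so it would complement database-level append-only permissions rather than replace them.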


17. Visual Identity

The design should feel:

  • calm
  • operational
  • serious
  • high-trust
  • defensive
  • readable under pressure

Avoid:

  • cyberpunk styling
  • hacker neon
  • gamification
  • aggressive animation
  • cluttered dashboards

Recommended style:

Dark mode by default
High-contrast severity labels
Muted blue/gray base
Red only for critical
Amber for warnings
Green for completed/safe
Clear TLP badges
Large readable tables
Minimal animations

Recommended UI language:

Evidence protected.
Route blocked by policy.
Human approval required.
Recipient can decrypt.
Local key destroyed.
Submission acknowledged.

18. Final Operating Model

The GUI should support this operational chain:

Detect
→ Validate
→ Classify
→ Seal Evidence
→ Review Routes
→ Submit Reports
→ Track Receipts
→ Archive Safely
→ Publish Sanitized Intelligence
→ Build Reviewed Training Data

The core cockpit should keep humans in control of five things:

1. Cases — what is happening?
2. Evidence — what is protected?
3. Routing — where will it go?
4. Workers — what produced this result?
5. Ledger — what can we prove happened?

The training workspace adds the sixth:

6. Trainline — what can become safe training data?

19. Summary

The Blue48 GUI should be an operations cockpit, not a passive dashboard.

It must provide:

  • live case visibility
  • worker observability
  • authority-sealed evidence control
  • human routing approval
  • TLP and policy enforcement
  • receipt and outcome tracking
  • immutable audit visibility
  • safe IntelMiner training-data review

The first MVP should focus on daily operational safety and decision control before advanced analytics or public-reporting features are added.


API-Eligible Cyber Threat Reporting & Escalation Platforms

Source: waypoints.md

Project purpose: Build a white-hat defensive reporting workflow that can push credible pre-incident or incident intelligence to the right receivers through APIs or structured machine-to-machine channels.

Scope: This document focuses on platforms that support API-based reporting, submission, alert ingestion, or structured intelligence sharing. It excludes direct interaction with criminal forums and excludes sources that only provide manual web forms unless they are still operationally important as a fallback.

Last reviewed: 2026-05-13


1.1 Normal credible threat against a named organization

  1. Victim security contact / VDP / security.txt
  2. National CERT / CSIRT
  3. Sector ISAC / ISAO
  4. Law enforcement cyber unit
  5. Infrastructure provider abuse API
  6. Threat-intelligence sharing platform
  7. Sanitized public advisory

1.2 Imminent harm or critical infrastructure

  1. National CERT / CSIRT
  2. Victim security team
  3. Law enforcement cyber unit
  4. Sector regulator / ISAC
  5. Infrastructure provider abuse API
  6. Trusted CTI community
  7. Public advisory only after mitigation or authority clearance

1.3 Malicious infrastructure, phishing, malware, or botnet indicators

  1. Platform-specific reporting API
    Examples: Cloudflare Abuse Reports API, Spamhaus Submission API, AbuseIPDB, URLhaus, MalwareBazaar, PhishTank, urlscan.io, Google Web Risk, VirusTotal.
  2. CERT / CSIRT
  3. Affected victim
  4. MISP / OpenCTI / trusted CTI sharing
  5. Public sanitized report
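
The three orderings in 1.1 to 1.3 can serve as a rule-based baseline for route planning. In this sketch the destination identifiers, the `plan_route` function, and the set of incident types treated as infrastructure reports are assumptions; the ordering within each list follows the text above.

```python
ROUTE_ORDERS = {
    "named_organization": [
        "victim_security_contact", "national_cert", "sector_isac",
        "law_enforcement_cyber_unit", "provider_abuse_api",
        "cti_sharing_platform", "public_advisory",
    ],
    "imminent_or_critical": [
        "national_cert", "victim_security_team", "law_enforcement_cyber_unit",
        "sector_regulator_or_isac", "provider_abuse_api",
        "trusted_cti_community", "public_advisory_after_clearance",
    ],
    "malicious_infrastructure": [
        "platform_reporting_api", "national_cert", "affected_victim",
        "cti_sharing_platform", "public_advisory",
    ],
}
INFRASTRUCTURE_TYPES = {"phishing", "malware", "botnet"}  # assumed mapping


def plan_route(incident_type: str, critical_infrastructure: bool = False,
               imminent_harm: bool = False) -> list[str]:
    """Pick the ordered destination list for a signal."""
    if critical_infrastructure or imminent_harm:
        key = "imminent_or_critical"  # takes precedence over incident type
    elif incident_type in INFRASTRUCTURE_TYPES:
        key = "malicious_infrastructure"
    else:
        key = "named_organization"
    return list(ROUTE_ORDERS[key])  # copy so callers cannot mutate the table
```

The returned list is only a proposal. Each destination still has to pass TLP, policy, and human approval before anything is sent.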

2. Tier-1 API Reporting Platforms

These are the strongest fits for automated defensive reporting because they accept machine-readable submissions or support structured threat sharing.

| Priority | Platform | Best for | API / submission capability | Use in workflow |
|---|---|---|---|---|
| 1 | CISA Automated Indicator Sharing (AIS) | Sharing cyber threat indicators with U.S. government and AIS participants | STIX/TAXII bidirectional indicator sharing | High-confidence indicators, especially campaigns, exploited infrastructure, malware IOCs |
| 2 | MISP | Community and private threat-intelligence sharing | REST API, PyMISP, event and attribute creation | Share vetted IOCs, TTPs, victim-agnostic campaign intelligence |
| 3 | OpenCTI | Internal or consortium CTI knowledge base | GraphQL API and connectors | Normalize, enrich, and route intelligence before external disclosure |
| 4 | Cloudflare Abuse Reports API | Abuse hosted behind or involving Cloudflare | API supports submitting abuse reports, viewing report details, and listing reports | Phishing, malware, abusive hosting, malicious domains using Cloudflare services |
| 5 | Spamhaus Submission Portal API | Malicious IPs, domains, URLs, suspicious email content | REST API for suspicious IP/domain/URL/email reports | Reputation/blocklist contribution and takedown-support evidence |
| 6 | AbuseIPDB | Malicious IP reputation | API for reporting and checking abusive IP addresses | Scanner, brute-force, spam, probing, attack-source IP reporting |
| 7 | URLhaus / abuse.ch | Malware distribution URLs | Community API for downloading and submitting malware URLs | Active malware URL reporting and malware-distribution tracking |
| 8 | MalwareBazaar / abuse.ch | Malware sample exchange | Community API for sample upload/download and bulk queries | Malware sample submission and hash enrichment |
| 9 | PhishTank | Phishing URL verification | API for phishing URL status checks; community submission workflow | Phishing verification and enrichment |
| 10 | urlscan.io | URL detonation, phishing/malware page evidence | Submission API to scan URLs and retrieve results | Safe screenshot/evidence generation, IOC enrichment |
| 11 | Google Web Risk Submission API | Unsafe URL submission to Google Safe Browsing ecosystem | Submission API for suspected unsafe URLs; access requires sales/customer-engineer approval | High-scale malicious URL reporting |
| 12 | VirusTotal API | File, URL, domain, IP enrichment and submission | API for file upload, URL scan, reports, and comments | Enrichment and submission to multi-vendor analysis ecosystem |
| 13 | Netcraft Report API | Phishing, malware, suspicious URLs, emails, files | API for automated threat reporting | Brand abuse, phishing, takedown-support reporting |

3. Platform Notes

3.1 CISA Automated Indicator Sharing (AIS)

Type: Government-backed indicator sharing
Best for: High-confidence cyber threat indicators and defensive measures
API style: STIX/TAXII
Good submissions: IPs, domains, URLs, hashes, malware indicators, campaign indicators
Avoid: Victim-identifying details unless necessary and authorized

Operational fit: Use for campaign-level and infrastructure-level reporting, especially when the intelligence may protect multiple organizations.

Source: https://www.cisa.gov/how-automated-indicator-sharing-ais-works


3.2 MISP

Type: Open-source threat-intelligence sharing platform
Best for: Structured CTI sharing inside trusted communities
API style: REST API; PyMISP client
Good submissions: Events, attributes, galaxies, taxonomies, TLP-tagged indicators, sightings
Avoid: Raw stolen data, credentials, or victim-sensitive artifacts without permission

Operational fit: Use as the main trusted-community sharing layer.

Sources:


3.3 OpenCTI

Type: Threat-intelligence platform / knowledge graph
Best for: Internal CTI normalization, enrichment, and case-to-intel routing
API style: GraphQL API
Good submissions: STIX-like entities, observables, reports, relationships, indicators, malware, threat actors
Avoid: Treating OpenCTI itself as the final external reporting destination unless connected to a sharing community

Operational fit: Use as your central intelligence brain before pushing to MISP, CERTs, providers, or reports.

Sources:


3.4 Cloudflare Abuse Reports API

Type: Infrastructure provider abuse reporting
Best for: Phishing, malware, abuse involving Cloudflare-protected assets
API style: REST API
Good submissions: URLs, domains, abuse category, evidence, contact details
Avoid: Large stolen datasets; provide proof and context instead

Operational fit: Use whenever malicious infrastructure resolves through or is protected by Cloudflare.

Sources:


3.5 Spamhaus Submission Portal API

Type: Reputation and abuse-intelligence reporting
Best for: Malicious IPs, domains, URLs, suspicious email content
API style: REST API
Good submissions: IPs, domains, URLs, suspicious raw email/source evidence
Avoid: Unverified mass submissions; maintain high-confidence standards

Operational fit: Use for reliable contribution to reputation systems and anti-abuse communities.

Sources:


3.6 AbuseIPDB

Type: IP reputation and abuse reporting
Best for: Attack-source IP reporting
API style: REST API
Good submissions: Brute force, scanning, spam, exploitation attempts, abusive traffic categories
Avoid: Reporting shared NAT/VPN/cloud IPs without strong evidence

Operational fit: Use as an automated destination for source-IP abuse reports, especially from honeypots, firewalls, and SIEM detections.

Sources:


3.7 URLhaus / abuse.ch

Type: Malware URL exchange
Best for: Active malware distribution URLs
API style: Community API with Auth-Key
Good submissions: URLs directly serving malware payloads
Avoid: Generic phishing pages that do not distribute malware

Operational fit: Use when you can verify a URL is actively distributing malware.

Source: https://urlhaus.abuse.ch/api/


3.8 MalwareBazaar / abuse.ch

Type: Malware sample exchange
Best for: Malware samples, hashes, family tracking
API style: Community API with Auth-Key
Good submissions: Malware samples and related metadata
Avoid: Benign files, sensitive internal documents, or samples that cannot be legally shared

Operational fit: Use after malware handling review, with strict legal and operational controls.

Source: https://bazaar.abuse.ch/api/


3.9 PhishTank

Type: Community phishing clearing house
Best for: Phishing URL verification and community validation
API style: HTTP POST lookup API
Good submissions: Suspected phishing URLs
Avoid: URLs containing victim credentials, tokens, or private data in query strings

Operational fit: Use for phishing intelligence enrichment and community verification.
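
The "Avoid" note above suggests sanitizing URLs before they leave the platform. A minimal sketch using only the Python standard library is shown below; `strip_sensitive_parts` is a hypothetical helper, and dropping the whole query string is a conservative assumption that may change which page a URL resolves to.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_sensitive_parts(url: str) -> str:
    """Remove userinfo, query string, and fragment from a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname excludes any user:password@ prefix
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))
```

When the query string is needed to reproduce the phishing page, the full URL belongs in the sealed evidence package rather than in a community submission.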

Sources:


3.10 urlscan.io

Type: URL scanning and investigation platform
Best for: URL detonation, phishing evidence, page screenshots, redirects, IP/domain enrichment
API style: Submission API and search API
Good submissions: Suspicious URLs, phishing pages, malicious landing pages
Avoid: Private internal URLs or sensitive tokens; set scan visibility carefully

Operational fit: Use before provider reporting to create structured, shareable evidence.

Sources:


3.11 Google Web Risk Submission API

Type: Unsafe URL submission into Google's protection ecosystem
Best for: High-scale phishing/malware URL submissions
API style: Submission API; restricted access
Good submissions: Suspected unsafe URLs that should be evaluated for Safe Browsing protection
Avoid: Assuming access is automatic; Google says access requires contacting sales or a customer engineer

Operational fit: Use when your group has enough volume and quality control to justify access.

Source: https://docs.cloud.google.com/web-risk/docs/submission-api


3.12 VirusTotal API

Type: Multi-vendor malware and URL analysis ecosystem
Best for: URL/file submission, enrichment, analysis reports, community comments
API style: REST API
Good submissions: Suspicious files, URLs, domains, IPs, hashes
Avoid: Uploading confidential files, customer data, private source code, or stolen materials

Operational fit: Use for enrichment and multi-vendor visibility. Use private scanning options if available and appropriate.


3.13 Netcraft Report API

Type: Phishing, malware, suspicious URL/email/file reporting
Best for: Phishing and takedown-support reporting
API style: Report API
Good submissions: Malicious URLs, suspicious emails, files, phishing evidence
Avoid: Low-confidence or privacy-sensitive submissions

Operational fit: Use for high-confidence phishing and brand-abuse reporting, especially where takedown support matters.


4. Internal Case / Incident Routing Platforms

These platforms are not external public reporting destinations, but they are useful for receiving your detections through APIs and routing them into a proper case workflow.

| Platform | Best for | API capability | Workflow role |
|---|---|---|---|
| TheHive | SOC alert-to-case management | TheHive 5 API supports alert creation | Convert signals into triaged investigations |
| DFIR-IRIS | Collaborative incident response | Alerts API and general API key support | Internal IR case management |
| ServiceNow SIR | Enterprise security incident response | REST API to write to Security Incident Import table | Enterprise escalation and tracking |
| Jira Service Management Incidents | Incident workflow automation | Incidents REST API | Lightweight or engineering-driven incident coordination |


5. Platforms Useful for Monitoring but Not Primary API Reporting Destinations

| Platform | API status | Recommendation |
|---|---|---|
| Ransomware.live | API available for ransomware victim/group intelligence | Use for monitoring and enrichment, not as the main reporting destination |
| Shadowserver | RESTful API for report data access; no STIX/TAXII currently | Use for inbound network exposure/threat reports and enrichment |
| Have I Been Pwned | API for breach account lookups | Use for exposure checks, not submitting new breach reports unless separately arranged |
| OpenPhish | No public lookup API; offers feed/module model | Use feed or email/manual reporting fallback |
| Microsoft Defender submissions | Portal and Microsoft Graph threat-submission resources for some Defender scenarios | Use when operating within a Microsoft tenant or Defender workflow |


6. Practical Routing Matrix

| Evidence type | First API destination | Second destination | Internal system |
|---|---|---|---|
| Malicious IP scanning/brute force | AbuseIPDB | Spamhaus if relevant | TheHive / DFIR-IRIS |
| Malware distribution URL | URLhaus | Google Web Risk / VirusTotal / urlscan.io | MISP / OpenCTI |
| Malware sample | MalwareBazaar | VirusTotal | TheHive / OpenCTI |
| Phishing URL | PhishTank / Netcraft / urlscan.io | Google Web Risk / Cloudflare if hosted/proxied there | MISP / TheHive |
| Cloudflare-proxied abuse | Cloudflare Abuse Reports API | Netcraft / PhishTank if phishing | Internal case platform |
| Suspicious email infrastructure | Spamhaus Submission API | AbuseIPDB for IPs | MISP / OpenCTI |
| Campaign-level indicators | CISA AIS / MISP | CERT/CSIRT | OpenCTI |
| Ransomware victim claim | Victim + CERT first | MISP only sanitized indicators | OpenCTI / TheHive |
| Leaked credentials/API keys | Victim first | CERT if severe | Internal IR case only |
| Critical infrastructure threat | CERT/CSIRT first | Victim + law enforcement | Internal restricted case |
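
The matrix can be held as plain data so the routing engine performs a lookup instead of branching. The following is a minimal sketch covering three of the rows; the evidence-type slugs (`malicious_ip`, `malware_url`, `leaked_credentials`), the `Route` tuple, and the `route` function are hypothetical names chosen for illustration, not part of the platform.

```python
from typing import NamedTuple


class Route(NamedTuple):
    first: tuple[str, ...]
    second: tuple[str, ...]
    internal: tuple[str, ...]


# Hypothetical slugs for three matrix rows; destination names are copied from the table.
ROUTING_MATRIX: dict[str, Route] = {
    "malicious_ip": Route(("AbuseIPDB",), ("Spamhaus",), ("TheHive", "DFIR-IRIS")),
    "malware_url": Route(
        ("URLhaus",),
        ("Google Web Risk", "VirusTotal", "urlscan.io"),
        ("MISP", "OpenCTI"),
    ),
    "leaked_credentials": Route(("Victim",), ("CERT",), ("Internal IR case",)),
}


def route(evidence_type: str) -> Route:
    """Return the destinations for an evidence type, failing closed on unknown types."""
    try:
        return ROUTING_MATRIX[evidence_type]
    except KeyError:
        raise ValueError(f"no route defined for evidence type {evidence_type!r}") from None
```

Raising on an unknown evidence type is a deliberate choice here: an unmapped case should stop and wait for a human rather than fall through to a default destination.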

7. Minimum Viable API Stack

For a new white-hat group, start with:

  1. MISP — trusted sharing and structured IOC exchange.
  2. OpenCTI — central intelligence normalization and knowledge graph.
  3. TheHive or DFIR-IRIS — case management and triage.
  4. AbuseIPDB — automated IP abuse reporting.
  5. URLhaus — malware URL submission.
  6. MalwareBazaar — malware sample submission, only with legal controls.
  7. urlscan.io — URL evidence generation.
  8. Cloudflare Abuse Reports API — infrastructure abuse reports.
  9. Spamhaus Submission Portal API — IP/domain/URL/email reputation reporting.
  10. CISA AIS or national CERT sharing route — campaign-level indicator sharing.

8. Data Handling Rules

Never submit publicly

  • Raw credentials
  • API keys or session cookies
  • Stolen databases
  • Internal screenshots that identify victims without consent
  • Exploit instructions
  • Live access details
  • Private source code
  • Sensitive personal data

Safe to submit when verified

  • IP addresses
  • Domains
  • URLs, if they do not contain tokens or PII
  • File hashes
  • Malware samples, only where legally allowed
  • Timestamps
  • Actor handles
  • Campaign labels
  • CVEs
  • MITRE ATT&CK techniques
  • Sanitized screenshots
  • Provider-neutral technical context

9. Normalized Object

Use this normalized object internally before transforming to each API schema.

{
  "case_id": "WG-2026-000001",
  "tlp": "AMBER",
  "severity": "A|B|C|D|E",
  "confidence": "low|medium|high",
  "threat_type": "phishing|malware|ransomware|credential_exposure|iab|botnet|vulnerability_exploitation",
  "victim": {
    "organization": "",
    "domain": "",
    "country": "",
    "sector": ""
  },
  "source": {
    "category": "forum|leak_site|telegram|honeypot|sensor|osint|tip",
    "first_seen": "",
    "last_seen": "",
    "collection_method": "lawful_osint_or_partner_feed"
  },
  "observables": {
    "ips": [],
    "domains": [],
    "urls": [],
    "hashes": [],
    "emails": [],
    "wallets": [],
    "cves": []
  },
  "evidence": {
    "summary": "",
    "sanitized_screenshots": [],
    "raw_evidence_location": "internal_restricted_storage"
  },
  "recommended_actions": [],
  "routing": {
    "primary_destinations": [],
    "secondary_destinations": [],
    "public_disclosure_allowed": false
  }
}

10. Final Recommendation

The most practical API-driven architecture is:

Sensors / CTI sources → OpenCTI → TheHive or DFIR-IRIS → routing engine → MISP + provider abuse APIs + CERT/AIS channels → sanitized public reporting

This keeps the group legally safer, avoids amplifying criminal material, and creates a repeatable path from early warning to real defensive action.


Review — API-Eligible Cyber Threat Reporting & Escalation Platforms (Draft v1)

Source: waypoints_firstpass.md

Reviewer: Claude (Opus 4.7, 1M context)
Review date: 2026-05-13
Document reviewed: waypoints.md (first draft)
Verdict: Strong bones. Tone-perfect for white-hat defensive work: machine-to-machine, no vigilante framing. Publishable as an internal whitepaper after the critical fixes below.


1. What's Already Solid

Don't change these — they're load-bearing and correct.

  • Section 1.1 vs 1.2 split (normal vs imminent harm) — exactly the right hinge for routing decisions.
  • Section 8 (never-submit list) — covers GDPR / exploitation amplification / credential leakage failure modes well.
  • Section 9 normalized object — the right abstraction. Transform-to-target instead of N bespoke pipelines.
  • Section 10 architecture sentence — the whole project on one line: Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public.

2. Critical Fixes (do these before this leaves draft)

2.1 Geography mismatch — CISA AIS at #1 is US-only

For European-focused work, MISP via CIRCL.lu (Luxembourg) or the ENISA CSIRTs Network is the workhorse. CISA AIS does not cover EU institutions.

Action: Swap priorities #1 ↔ #2 (MISP first, AIS second). Add a row for CERT-EU specifically for European institutions.

2.2 National CERTs are referenced generically but never named

The doc says "National CERT/CSIRT" everywhere but never resolves it to an actionable receiver.

Action: Add a small table after Section 1:

| Country | Receiver | Channel |
|---|---|---|
| DE | BSI / CERT-Bund | reports@cert-bund.de, MISP community |
| FR | ANSSI / CERT-FR | TAXII feed |
| UK | NCSC-UK | structured email + early-warning service |
| NL | NCSC-NL | MISP |
| ES | CCN-CERT, INCIBE-CERT | MISP |
| EU | CERT-EU, Europol EC3 | TLP-tagged MISP |

The routing engine should pick the right one based on victim country.

Note on Europol EC3: they handle criminal cases, not first-call technical sharing. Route through your national CERT first; EC3 receives via national channels for cross-border coordination.

2.3 Domain registrar abuse is missing from Section 1.3

Cloudflare is covered, but registrars (Namecheap, Tucows, GoDaddy, EURid for .eu, DENIC for .de) are often the faster takedown path.

Action: Add to the malicious-infrastructure flow: registrar abuse contact from WHOIS → registrar abuse API/email → registry as escalation.

2.4 Severity scale A|B|C|D|E is unusual and undefined

Either define it inline or replace with the standard low|medium|high|critical (CVSS-style) or NIS2 severity categories for EU consistency. Receivers will normalize anyway — but defining it lets the routing engine make automatic decisions.

2.5 Normalized object missing an actor block

You have victim but no actor. Add:

"actor": {
  "name": "Adira",
  "aliases": [],
  "campaign": "",
  "confidence": "A1|A2|...|F6"
}

This field connects the doc to the project mission and lets the routing matrix differentiate actor-specific sightings from generic abuse reports.

(A1-F6 is the Admiralty Code, the de facto CTI standard: a source-reliability letter A-F paired with an information-credibility digit 1-6. If that's too much, fall back to low|medium|high.)

2.6 PII at submission time is a GDPR landmine

Section 9 has observables.emails: []. Submitting victim email addresses to AbuseIPDB or VirusTotal is a personal-data transfer under GDPR.

Action: Add a pre-submission sanitizer step that:

  • Hashes / redacts emails to local-part-hash@domain when destination is public
  • Strips PII from URLs (tokens, query params containing identifiers)
  • Keeps raw originals only in evidence.raw_evidence_location (internal-only storage)

This belongs in the doc before the normalized-object section, not as an afterthought.
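
The three sanitizer steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: the function names, the 12-character digest truncation, and the `salt` parameter are all hypothetical, and the sketch drops the entire query string rather than trying to guess which parameters are identifying.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def hash_email_local_part(email: str, salt: str = "") -> str:
    """Replace the local part with a truncated SHA-256 digest, keeping the domain."""
    if "@" not in email:
        raise ValueError("not an email address")
    local, _, domain = email.rpartition("@")
    digest = hashlib.sha256((salt + local.lower()).encode("utf-8")).hexdigest()[:12]
    return f"{digest}@{domain.lower()}"


def strip_url_secrets(url: str) -> str:
    """Drop userinfo, query string, and fragment, where tokens and identifiers usually sit."""
    parts = urlsplit(url)
    netloc = parts.netloc.rpartition("@")[2]  # discard any user:password@ prefix
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))


def sanitize_observables(observables: dict, destination_is_public: bool) -> dict:
    """Return a sanitized copy; the caller's dict (the raw original) is left untouched."""
    cleaned = {key: list(values) for key, values in observables.items()}
    if destination_is_public:
        cleaned["emails"] = [hash_email_local_part(e) for e in cleaned.get("emails", [])]
        cleaned["urls"] = [strip_url_secrets(u) for u in cleaned.get("urls", [])]
    return cleaned
```

An unsalted hash of a short local part is easy to reverse by guessing, so a real deployment would want a secret per-project salt. Dropping the whole query string can also make a phishing URL unreproducible for the receiver; an allow-list of known-safe parameters is the alternative.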


3. High-Value Additions

3.1 TLP enforcement at the routing layer

Nothing in the current schema prevents TLP:RED data being routed to a TLP:CLEAR destination.

Action: Add a routing precondition: submission.tlp <= destination.max_tlp_allowed.

  • CISA AIS rejects TLP:RED
  • Cloudflare doesn't care
  • Spamhaus has its own rules
  • MISP communities each have their own ceiling

Encode the ceiling per destination in the routing matrix.
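
The precondition `submission.tlp <= destination.max_tlp_allowed` needs an explicit ordering, because TLP labels do not sort alphabetically. A minimal sketch using the TLP 2.0 labels follows; the destination keys and their ceilings in `DESTINATION_MAX_TLP` are invented for illustration and would have to come from each destination's actual sharing terms.

```python
# TLP 2.0 labels from least to most restrictive.
TLP_ORDER = ["CLEAR", "GREEN", "AMBER", "AMBER+STRICT", "RED"]
TLP_RANK = {label: rank for rank, label in enumerate(TLP_ORDER)}

# Hypothetical ceilings for illustration only.
DESTINATION_MAX_TLP = {
    "abuseipdb": "CLEAR",
    "misp_community": "AMBER",
    "national_cert": "RED",
}


def may_route(submission_tlp: str, destination: str) -> bool:
    """True only if the submission's TLP does not exceed the destination's ceiling."""
    ceiling = DESTINATION_MAX_TLP.get(destination)
    submission_rank = TLP_RANK.get(submission_tlp)
    if ceiling is None or submission_rank is None:
        return False  # unknown destination or label: fail closed
    return submission_rank <= TLP_RANK[ceiling]
```

Unknown labels and unknown destinations both return `False`, so a typo in either place blocks the submission instead of leaking it.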

3.2 STIX 2.1 as the serialization

Right now the doc implies internal object → bespoke transform per API. Cheaper and more standard:

internal object → STIX 2.1 bundle → minor adapter per destination

MISP, OpenCTI, CISA AIS, and most CTI tools are STIX-native. One serializer beats thirteen, and you get free interop with anything that already speaks STIX.
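
As a rough illustration of the single-serializer idea, the sketch below turns the `observables.urls` list of a normalized object into a STIX 2.1 bundle of Indicator objects using only the standard library. The function names are hypothetical, and TLP marking references, other observable types, and any destination-specific profile restrictions are left out.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(moment: datetime) -> str:
    """Format a datetime as a UTC timestamp with millisecond precision."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def urls_to_stix_bundle(case: dict) -> dict:
    """Build a STIX 2.1 bundle with one Indicator per URL in the normalized object."""
    now = stix_timestamp(datetime.now(timezone.utc))
    indicators = []
    for url in case["observables"]["urls"]:
        # Backslashes and single quotes must be escaped inside a STIX pattern string.
        escaped = url.replace("\\", "\\\\").replace("'", "\\'")
        indicators.append({
            "type": "indicator",
            "spec_version": "2.1",
            "id": f"indicator--{uuid.uuid4()}",
            "created": now,
            "modified": now,
            "name": f"{case['case_id']} URL",
            "pattern": f"[url:value = '{escaped}']",
            "pattern_type": "stix",
            "valid_from": now,
        })
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": indicators}


if __name__ == "__main__":
    example = {"case_id": "WG-2026-000001", "observables": {"urls": ["https://evil.test/a"]}}
    print(json.dumps(urls_to_stix_bundle(example), indent=2))
```

In practice a maintained STIX library would likely replace the hand-built dictionaries; the point of the sketch is only that each destination adapter then starts from the same bundle.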

3.3 Rate-limit budgets

Many of these APIs have strict limits:

  • AbuseIPDB free tier: 1000 reports/day
  • VirusTotal public API: 4 req/min
  • Spamhaus: per-submitter quotas
  • Cloudflare: per-account rate limits

Without a token-bucket per destination, high-confidence submissions get silently dropped during bursts.

Action: Add a destination_quota field to the routing matrix and an enforcement layer.
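
One way to enforce a `destination_quota` is a token bucket per destination. The sketch below is a single-threaded illustration; the class, the bucket keys, and the idea of deriving refill rates from the limits listed above are assumptions, not a description of existing code.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled continuously at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# Hypothetical per-destination budgets derived from the limits listed above.
BUCKETS = {
    "virustotal_public": TokenBucket(capacity=4, refill_per_second=4 / 60),
    "abuseipdb_free": TokenBucket(capacity=1000, refill_per_second=1000 / 86400),
}
```

When `try_acquire` returns `False`, the submission should be queued and retried rather than discarded, which is what prevents the silent drops described above. A token bucket smooths bursts but can exceed a hard daily cap across a day boundary, so a fixed-window counter is the safer companion for strict per-day quotas.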

3.4 Feedback loop is missing

When you submit to URLhaus, you can poll for status. When you submit to MISP, you get sightings. When you submit to Cloudflare, you get a case number. These should flow back into your OpenCTI graph as evidence-of-effectiveness.

Without this, you're operating open-loop — you don't know which destinations actually act on your reports.

Action: Add a Section 11 "Receipt and Effectiveness Tracking" that defines:

  • Per-destination receipt schema (case ID, ack timestamp, outcome status)
  • Polling cadence per destination
  • A success metric per destination type (takedowns confirmed, sightings count, classification adopted)

3.5 NoMoreRansom (NMR)

Ransomware.live is listed under monitoring, but if a decryptor research effort produces anything, NMR is the destination.

Action: Add to the routing matrix:

| Evidence type | First API destination | Second destination | Internal system |
|---|---|---|---|
| Ransomware decryptor evidence | NoMoreRansom (private channel) | Victim CERT chain | OpenCTI internal only |

NMR coordinates so victims can decrypt before the adversary sees the fix — never publish a working decryptor publicly first.


4. Nice-to-Have

4.1 Submitter identity & signing

  • Register a stable submitter handle with MISP / MalwareBazaar / AbuseIPDB — not a personal account.
  • Sign internal objects with a project PGP key before they leave the system.
  • CIRCL and other major MISP communities weight trust by submitter history.

4.2 Audit log requirement

Every external submission writes an immutable row:

(timestamp, destination, payload_hash, submitter_identity, tlp, response_id, outcome)

Legal cover, debugging, and the feedback loop in 3.4 all need this.
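
A hash chain is one common way to make such rows tamper-evident. The sketch below uses the seven fields from the tuple above and adds two of its own, `prev_hash` and `row_hash`, which are assumptions for illustration. The in-memory list stands in for storage; real immutability would additionally need append-only storage or external anchoring.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64


def _digest(row_without_hash: dict) -> str:
    """Hash a row deterministically by serializing with sorted keys."""
    return hashlib.sha256(json.dumps(row_without_hash, sort_keys=True).encode("utf-8")).hexdigest()


def append_audit_row(log: list, *, destination: str, payload: bytes, submitter_identity: str,
                     tlp: str, response_id: str, outcome: str) -> dict:
    """Append one row whose hash covers its own fields and the previous row's hash."""
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "destination": destination,
        "payload_hash": hashlib.sha256(payload).hexdigest(),
        "submitter_identity": submitter_identity,
        "tlp": tlp,
        "response_id": response_id,
        "outcome": outcome,
        "prev_hash": log[-1]["row_hash"] if log else GENESIS_HASH,
    }
    row["row_hash"] = _digest(row)
    log.append(row)
    return row


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered row breaks the chain."""
    previous = GENESIS_HASH
    for row in log:
        body = {key: value for key, value in row.items() if key != "row_hash"}
        if row["prev_hash"] != previous or _digest(body) != row["row_hash"]:
            return False
        previous = row["row_hash"]
    return True
```

Only the payload hash is stored, not the payload, so the ledger itself never holds the sensitive material it attests to.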

4.3 NIS2 callout for critical-infra reporting

EU NIS2 requires regulated entities to send an early warning within 24 hours of becoming aware of a significant incident, followed by an incident notification within 72 hours. If detections involve essential/important entity sectors, the routing engine should flag the NIS2 obligation regardless of receiver choice.

4.4 Section ordering

Sections 8 (data handling) and 9 (normalized object) are foundations, not appendices. Move them up to Sections 3–4. Currently a reader hits the platform list before knowing what not to send.

4.5 Confidence convention

low|medium|high is fine, but production CTI commonly uses the Admiralty Code (A1, B2, etc., describing source reliability × information credibility) or estimative language. Mention the convention even if you don't fully adopt it.


5. Implementation Notes (Blue48 Hookup)

This doc is the spec for two components in the agent stack:

  1. report_writer agent outputs Section 9's normalized object as its canonical format.
  2. A routing engine (extension of report_writer, or a 7th agent) consumes that object, applies the matrix in Section 6, and fans out via API adapters.

Agents stop at "produce the normalized object." Human review reads it, decides "yes, ship this to MISP and Cloudflare," and clicks. The routing engine then runs the API calls, captures receipts, and feeds them back to OpenCTI.

5.1 Suggested initial adapters (Block G priority)

  1. MISP (PyMISP)
  2. AbuseIPDB
  3. URLhaus
  4. Cloudflare Abuse Reports
  5. urlscan.io

These five cover ~80% of common evidence types in the routing matrix.

5.2 Secrets handling

Every adapter needs API credentials. They must:

  • Live in .env (already excluded from image via .dockerignore)
  • Be passed at container runtime via env_file, never baked into the image
  • Be rotatable on a schedule (the audit log in 4.2 helps prove non-overlap)

6. Summary

| Category | Count | Notes |
|---|---|---|
| Critical | 6 | Geography, CERT mapping, registrar abuse, severity scale, actor block, PII sanitizer |
| High-value | 5 | TLP enforcement, STIX 2.1, rate limits, feedback loop, NoMoreRansom |
| Nice-to-have | 5 | Signing, audit log, NIS2, ordering, Admiralty Code |

After the critical fixes, this is a publishable internal whitepaper and a clear spec for the routing engine. Good draft.


Detailed Review v2 — API-Eligible Cyber Threat Reporting & Escalation Platforms

Source: waypoints_scalpel.md

Reviewer: Claude (Opus 4.7, 1M context)
Review date: 2026-05-13
Document reviewed: waypoints.md (first draft)
Companion to: waypoints_firstpass.md (v1 executive summary)
Scope of this v2: section-by-section findings, cross-cutting gaps, missing categories, revised schema, implementation priorities for blue48.


0. Method

I re-read the draft three times against the following lenses:

  1. Factual / API accuracy — does each platform actually do what's claimed?
  2. Operational correctness — would the routing actually work in practice, or break on first contact with reality?
  3. Legal / compliance — GDPR, NIS2, MLAT, jurisdiction, chain of custody
  4. Threat-model coverage — does this serve the actual project goal (campaign disruption, not individual attribution)?
  5. OPSEC of the reporter — what does the adversary learn from each submission?

Findings below carry confidence tags: [verified], [likely current], [verify before relying on].


1. Section-by-Section Findings

1.1.1 In Scenario 1.1 (normal credible threat), going to the victim first is correct in 90% of cases — but flag the exception.

Insider-attack scenarios reverse this: notifying a victim org whose own admin/employee is the threat actor warns the attacker. For credential-leak cases involving privileged accounts, route CERT-first and let CERT decide whether to notify the victim org's leadership or its security contact. Add a 1.1.bis for "victim contact may itself be compromised."

1.1.2 Scenario 1.2 (imminent harm) is missing a specific decision point.

If the imminent harm is to critical infrastructure (energy, water, healthcare, finance), in EU jurisdictions the NIS2 Directive requires regulated entities to file an early warning within 24 hours of becoming aware of a significant incident. Your routing engine should detect "victim sector ∈ NIS2 essential/important entity list" and either:

  • Route the report so the victim can fulfill their NIS2 obligation, OR
  • (If victim is unreachable) report directly via the relevant national CERT's NIS2 channel, which exists separately from generic CSIRT contact paths

1.1.3 Scenario 1.3 missing receiver categories:

  • Hosting providers (not just CDNs). Cloudflare is a CDN; the actual origin server is somewhere else (Hetzner, OVH, AWS, DigitalOcean, etc.). A Cloudflare-only report leaves the origin running. Add hosting provider abuse as a parallel step, not after CDN.
  • Domain registrars via WHOIS-extracted abuse contact, plus registry escalation for ccTLDs (DENIC for .de, AFNIC for .fr, EURid for .eu, Nominet for .uk)
  • Certificate authorities for compromised cert revocation (Let's Encrypt revoke API for ACME-issued certs; commercial CA abuse contacts for the rest)
  • DNS providers independent of registrar (Cloudflare DNS, Quad9, Google Public DNS abuse contacts — for blocking, not takedown)

1.1.4 The implicit ordering bias.

The draft optimizes for legal-defensibility (talk to the receiver who can act) but doesn't optimize for operational speed-to-mitigation. For phishing kits with active credential harvesting, the fastest mitigation is often: parallel-fan-out to (CDN, hosting, registrar, browser-block-list providers) simultaneously, then notify CERT as record-keeping. The doc reads as serial when in practice it should be parallel.
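
The parallel-then-record pattern described above maps naturally onto `asyncio.gather`. In this sketch the adapters are injected as async callables; `fan_out`, `mitigate_then_record`, and the `mitigation_outcomes` key are hypothetical names, and no real provider API is called.

```python
import asyncio
from typing import Awaitable, Callable

Adapter = Callable[[dict], Awaitable[str]]


async def fan_out(report: dict, adapters: dict[str, Adapter]) -> dict[str, str]:
    """Submit to every adapter concurrently; one failure does not cancel the others."""
    names = list(adapters)
    results = await asyncio.gather(
        *(adapters[name](report) for name in names),
        return_exceptions=True,
    )
    return {
        name: (f"failed: {result}" if isinstance(result, BaseException) else result)
        for name, result in zip(names, results)
    }


async def mitigate_then_record(report: dict, takedown_adapters: dict[str, Adapter],
                               cert_adapter: Adapter) -> dict[str, str]:
    """Fan out to takedown-capable receivers first, then notify the CERT with the outcomes."""
    outcomes = await fan_out(report, takedown_adapters)
    outcomes["cert"] = await cert_adapter({**report, "mitigation_outcomes": dict(outcomes)})
    return outcomes
```

`return_exceptions=True` is the important detail: a bounced registrar mailbox is recorded as a failed outcome while the CDN and hosting submissions still go through.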


1.2 — Section 2: Tier-1 API Reporting Platforms

1.2.1 Missing platforms that belong in Tier 1:

| Platform | Why Tier-1 | API style |
|---|---|---|
| abuse.ch ThreatFox | IOC graph, sibling to URLhaus/MalwareBazaar, accepts indicator submissions with kill-chain context | REST + Auth-Key |
| abuse.ch YARAify | YARA rule sharing + scanning. Direct fit since detection_author emits YARA | REST + Auth-Key |
| AlienVault OTX (now LevelBlue Labs OTX) | One of the largest free CTI communities. Pulses for sharing, pull API for consumption. Major omission from current draft. | REST + DirectConnect API |
| CIRCL Hashlookup | Fast hash reputation lookup, free, EU-hosted | REST |
| Shadowserver | Free network exposure / vulnerability scanning reports. Subscribe by ASN/CIDR/contact. The draft has it under "monitoring" but Shadowserver also accepts submissions and runs important takedown campaigns. | REST API |

1.2.2 Reorder by jurisdictional fit:

The current #1 (CISA AIS) is US-government-tied. For Europe-focused work the right Tier-1 priorities are roughly:

  1. MISP (CIRCL communities, plus ENISA CSIRTs Network communities)
  2. OpenCTI (your own knowledge graph)
  3. AlienVault OTX (broad reach, low friction)
  4. CISA AIS (only if US-victim cases or US-relevant indicators)
  5. Cloudflare / hosting abuse APIs
  6. Spamhaus
  7. URLhaus / MalwareBazaar / ThreatFox
  8. AbuseIPDB
  9. urlscan.io
  10. Netcraft

1.2.3 Per-row corrections in the existing table:

  • CISA AIS — "STIX/TAXII bidirectional" — be specific: STIX 2.1 over TAXII 2.1, with the AIS Profile (a restricted subset of STIX). Submitting non-AIS-Profile STIX gets rejected. [verified]
  • Cloudflare Abuse Reports API: also note that high-volume submitters can apply to become a Trusted Reporter, which gets faster SLAs. [likely current]
  • VirusTotal API — public submissions are visible to all VT Premium customers (incl. potentially the adversary). The draft doesn't flag this — it's a critical OPSEC point. Use VT Private Scanning for sensitive samples. [verified]
  • PhishTank — community-vetted. As of late 2024 / early 2025 there were reports of reduced moderation activity. [verify before relying on]. Netcraft is the more reliable phishing-takedown channel today.
  • Google Web Risk — access truly is gated by Google customer engineering review; not a 5-minute API key signup. Apply early. [verified]

1.3 — Section 3: Per-Platform Notes

3.1 CISA AIS: Add: requires sponsorship from a federal agency or a signed AIS Sharing Agreement, plus the connector software (typically TAXII client). Onboarding measured in weeks, not days. The draft makes it sound like a sign-up form.

3.2 MISP: Missing:

  • ZeroMQ for real-time push (worth using if you want sub-second propagation to your own consumers)
  • Distinction between events (point-in-time intelligence) and feeds (continuous streams; better for IOC bulk delivery)
  • "Create a community" vs "Join a community" tradeoff — joining CIRCL's communities is the lowest-friction entry; creating your own is high-effort and pointless until you have multiple sharing partners
  • TLP-marking enforcement is not automatic at the MISP level — your client must respect TLP before publishing onward

3.3 OpenCTI: Missing:

  • The connector framework: ~80+ pre-built connectors (MITRE ATT&CK, MISP, CrowdStrike, Recorded Future, etc.) — most of your enrichment needs are already solved
  • The Workbench feature for analyst review before publishing
  • Filigran (the company behind OpenCTI) hosts a managed cloud version if you don't want to operate it yourself

3.4 Cloudflare Abuse Reports API: Missing:

  • API token requires Account.Abuse Reports permission — won't work with read-only tokens
  • Rate limits documented separately from the abuse API itself
  • For Cloudflare-hosted Workers (their serverless), abuse reports go to a different channel
  • Trusted Reporter program (mentioned above) — apply once you have submission history

3.5 Spamhaus: Missing the lists distinction:

  • DBL = Domain Block List (domains)
  • SBL = Spamhaus Block List (IPs)
  • XBL = Exploits Block List (exploit-sourced IPs)
  • ZRD = Zero Reputation Domains (newly registered)
  • Each list has different submission criteria. Wrong-list submissions get rejected. Your routing engine needs a list-selector.

3.6 AbuseIPDB: Missing:

  • The 23-category taxonomy (SSH brute force, port scan, web app attack, phishing, etc.) — your evidence type must map to an AbuseIPDB category code or the submission is low-utility
  • Free tier: 1000 reports/day, 100 IP checks/min. Paid tiers scale
  • Single-reporter submissions have low weight; reputation requires multiple corroborating submitters. Send to AbuseIPDB after sending to other corroborators
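
The category-mapping requirement can be sketched as a small payload builder. The evidence-tag names and the function are hypothetical. The numeric IDs are the ones recalled from AbuseIPDB's published category list and should be verified against the current documentation before use. The field names `ip`, `categories`, and `comment` follow the v2 report endpoint as understood here.

```python
import ipaddress

# Assumed mapping from internal evidence tags (hypothetical) to AbuseIPDB category IDs.
# Verify the IDs against AbuseIPDB's current category list before relying on them.
ABUSEIPDB_CATEGORIES = {
    "phishing": 7,
    "port_scan": 14,
    "brute_force": 18,
    "web_app_attack": 21,
    "ssh": 22,
}


def build_abuseipdb_report(ip: str, evidence_tags: list[str], comment: str) -> dict:
    """Build the form fields for a report, refusing submissions with no mapped category."""
    normalized_ip = str(ipaddress.ip_address(ip))  # raises ValueError on malformed input
    categories = sorted({ABUSEIPDB_CATEGORIES[t] for t in evidence_tags if t in ABUSEIPDB_CATEGORIES})
    if not categories:
        raise ValueError("no evidence tag maps to an AbuseIPDB category; do not submit")
    return {
        "ip": normalized_ip,
        "categories": ",".join(str(c) for c in categories),
        "comment": comment,
    }
```

Refusing to build a payload when nothing maps is intentional: an uncategorized report is the low-utility submission the bullet above warns about.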

3.7 URLhaus: Missing:

  • Submission auth-key required (free, sign up)
  • Manual review for high-confidence flags
  • 2024+ stricter format requirements
  • Linkage to MalwareBazaar — submit the URL to URLhaus, the sample to MalwareBazaar, link by hash

3.8 MalwareBazaar: Missing:

  • File size limits (~250MB last I checked)
  • Office macro / Windows installer formats need specific tags
  • Tag taxonomy is community-driven; non-canonical tags reduce utility
  • The "Avoid" line about legal-share is correct but vague. Specifically: do not upload samples obtained under NDA, samples from incidents where the victim hasn't consented, or samples that may contain victim PII (e.g., crafted payloads with the victim's name)

3.9 PhishTank: As noted above, declining. Verify status; consider deprioritizing.

3.10 urlscan.io: Missing:

  • Visibility settings: public, unlisted, private (private = paid)
  • Public scans are searchable by everyone — including the adversary monitoring for their kits being analyzed
  • The Search API is invaluable for retrohunts: "show me every scan in the last 30 days that loaded resource X"
  • Bulk submission via UUID-tagged customagent field for tracking your submission cohort
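
Because public scans are searchable by anyone, visibility is best chosen by code rather than left to a default. The sketch below builds the JSON body for a scan submission with the `url` and `visibility` fields. The TLP-to-visibility mapping and the function name are assumptions made for illustration, and sending the request (with an API key header) is left out.

```python
# Assumed policy: only TLP:CLEAR cases may produce publicly searchable scans.
TLP_TO_VISIBILITY = {
    "CLEAR": "public",
    "GREEN": "unlisted",
    "AMBER": "private",
    "AMBER+STRICT": "private",
}


def build_urlscan_submission(url: str, tlp: str) -> dict:
    """Return the request body for a scan, or refuse if the case should not leave the system."""
    if tlp == "RED":
        raise ValueError("TLP:RED cases are not submitted to third-party scanners")
    # Unknown labels fall back to the most restrictive visibility.
    visibility = TLP_TO_VISIBILITY.get(tlp, "private")
    return {"url": url, "visibility": visibility}
```

Since private scans are a paid feature per the note above, a group on the free tier would need to treat AMBER cases as not scannable rather than silently downgrading them to unlisted.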

3.11 Google Web Risk: Missing:

  • GCP project + Web Risk API enabled prerequisite
  • Submissions evaluated by Google Safe Browsing pipeline; latency hours-to-days
  • Successful submissions show up in Chrome / Firefox / Safari Safe Browsing warnings — massive amplification. Use only for high-confidence URLs

3.12 VirusTotal: Missing:

  • Public API: 4 lookups/min, 500/day, 15.5k/month
  • Premium API: rate limits negotiated
  • File submission privacy: anyone with VT Intelligence can see your sample. Critical OPSEC point not in draft.
  • VT Private Scanning for sensitive samples
  • VT Hunting (YARA livehunt) for ongoing detection

3.13 Netcraft: Missing:

  • Strong takedown-execution record — Netcraft actually does the takedown work, not just reporting
  • Free tier exists for low-volume reporters
  • Strongest at brand-protection / phishing
  • They prefer evidence package format: source URL + screenshot + redirect chain + landing page HTML

1.4 — Section 4: Internal Case / Incident Routing Platforms

1.4.1 Missing platforms:

| Platform | Best for | Why missing matters |
|---|---|---|
| Wazuh | Open-source SIEM with TheHive integration | Many SOCs use it; integrates cleanly with this stack |
| Microsoft Sentinel | Cloud SIEM with Logic Apps automation | Major enterprise platform; leaving it out makes the doc feel non-enterprise |
| Splunk SOAR (formerly Phantom) | Commercial SOAR | Major in enterprise SOCs |
| Cortex XSOAR | Commercial SOAR (Palo Alto) | Same |
| Shuffle | Open-source SOAR | Free alternative to XSOAR/Phantom |
| Tracecat | Newer open-source SOAR | Younger but actively developed |
| n8n | General workflow automation | Not security-specific but widely used as a glue layer |

1.4.2 TheHive 5 vs 4: Be explicit — TheHive 4 reached EOL, TheHive 5 is current. Code examples should target TheHive 5 API.


1.5 — Section 5: Monitoring (Not Primary Reporting)

1.5.1 Missing high-value monitoring sources:

| Source | What it gives you | API |
|---|---|---|
| AlienVault OTX | Largest free pulse community, IOC subscriptions | REST DirectConnect |
| CIRCL Passive DNS / Passive SSL | Historical DNS / cert lookups; EU-hosted | REST |
| PhishStats | Phishing URL stream | REST + RSS |
| DNSDumpster / SecurityTrails / BinaryEdge | Recon/asset-discovery DBs | REST (mostly paid for bulk) |
| GreyNoise | Benign-scanner classification; reduces false positives in IP reporting by tagging known internet-noise sources | REST |
| Spamhaus DNSBL queries | Free DNSBL lookups | DNS protocol |
| Maltrail | Open-source malicious-traffic detection feeds | Static feed download |
| CT log monitors (crt.sh, Censys CT) | New-cert issuance for your monitored domains; catches phishing-domain registrations | REST |

1.5.2 GreyNoise specifically deserves a callout.

Reporting an IP that GreyNoise classifies as benign-scanner (Shodan, Censys, security researchers) gets you blacklisted from AbuseIPDB and embarrasses you with CERTs. Always GreyNoise-check before submitting an IP report. This is a one-line API call that prevents a class of bad submissions.
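
The pre-submission check can be expressed as a pure gate over whatever the lookup returned. The field names `classification` and `riot` are assumptions based on how GreyNoise's community lookup response is understood here, and `should_report_ip` is a hypothetical name; the HTTP call itself is deliberately left out.

```python
from typing import Optional


def should_report_ip(greynoise_record: Optional[dict]) -> bool:
    """Decide whether an IP report may proceed, given a GreyNoise lookup result.

    `greynoise_record` is the parsed JSON of a lookup, or None if the lookup failed.
    """
    if greynoise_record is None:
        return False  # lookup failed: hold for human review instead of guessing
    if greynoise_record.get("riot"):
        return False  # flagged as a common business service (assumed field)
    return greynoise_record.get("classification") != "benign"
```

An IP that GreyNoise has never seen carries no benign classification, so it passes the gate; only a positive benign signal or a failed lookup blocks the report.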

1.5.3 Shadowserver placement.

Currently in Section 5 (monitoring only) but Shadowserver also runs active sinkholing and takedown campaigns with global reach. They accept tip-offs and IOC contributions. Move them up to Tier 1 receivers, or at least call out the bidirectional relationship.


1.6 — Section 6: Practical Routing Matrix

1.6.1 Missing rows:

| Evidence type | First | Second | Internal |
|---|---|---|---|
| Compromised TLS certificate | CT log monitor sighting → CA revocation request | Cloudflare/host if cert is in use | OpenCTI / TheHive |
| Mobile app malware | Google Play / Apple App Review submission | VirusTotal sample upload | OpenCTI |
| Cryptocurrency wallet (laundering) | Chainalysis / TRM (commercial) or on-chain analysis | OFAC SDN if sanctioned | Internal restricted case |
| Open-source supply-chain attack | Registry security (security@npmjs.com, security@python.org) | GitHub Security Lab | OpenCTI / TheHive |
| Compromised GitHub repo / leaked secret | GitHub Security Advisory + vendor-specific revoke API (e.g., AWS IAM) | Victim org | Internal restricted |
| Tor hidden service hosting malware | Document only (no takedown for .onion); push IOCs to MISP | n/a | OpenCTI |
| Sanctions-evasion crypto | OFAC SDN reporting (US) / EU FSF reporting | National FIU | Internal restricted |
| CSAM (legally separate) | NCMEC CyberTipline (US) / IWF (UK) / INHOPE (international) | National police | Stop processing immediately, preserve under legal hold |
| Phishing-resistant kit / 2FA bypass | Browser vendor reports (Chrome / Firefox / Safari Trust & Safety) | Affected service | OpenCTI |

1.6.2 Cloudflare-proxied abuse needs a follow-up step.

Current row says: First → Cloudflare API; Second → Netcraft / PhishTank. Missing: Third → origin host abuse contact. A cache-bypassing HEAD request will not reveal the origin behind Cloudflare's proxy; more workable sources are the hosting-provider details Cloudflare can share in response to an abuse report, certificate transparency cross-reference, or passive DNS history. Without this step, takedown leaves the origin alive and the attacker just provisions a new CDN front-end.

1.6.3 The "Leaked credentials/API keys" row is dangerously thin.

"Victim first → CERT if severe → Internal IR case" — missing the revocation step, which is more time-critical than reporting. If you find a leaked AWS access key, the first action is aws iam delete-access-key via the affected account (with permission) or trigger AWS's automatic key-revocation by submitting to GitHub Secret Scanning. If you find leaked OAuth tokens for GitHub/Slack/etc., the relevant vendor has an automated revocation pathway. Add the revocation step before victim notification.


1.7 — Section 7: Minimum Viable API Stack

The current MVP list (10 items) is too heavy for "minimum viable." A genuine MVP for a new white-hat group is closer to:

  1. OpenCTI — your knowledge graph (or, if too heavy, just MISP for both)
  2. MISP via CIRCL community — free, EU-hosted, broad reach
  3. AlienVault OTX — free, broadest reach for indicator sharing
  4. AbuseIPDB — free tier, easy
  5. URLhaus + MalwareBazaar + ThreatFox (the abuse.ch trio — same auth-key, three destinations)
  6. urlscan.io — free tier, evidence generation
  7. National CERT direct email + GPG — non-API, but mandatory

That's 7 things, of which 5 are pure free signups. Tackle Cloudflare, Netcraft, Spamhaus, and Google Web Risk after you have throughput in those 7.

The current MVP includes TheHive — that's case management, not external reporting. Move it out of "API stack" since it's internal infrastructure.


1.8 — Section 8: Data Handling Rules

1.8.1 "Never submit publicly" — additions:

  • Insider-threat allegations without verification
  • Attribution claims about specific named individuals (the hard line we settled on earlier)
  • Government / classified material
  • PHI (US HIPAA scope)
  • PCI scope financial data
  • Children's data (COPPA US; GDPR Article 8 EU)
  • Biometric data
  • Trade secrets / source code
  • Material from unauthorized intrusion (even if you got to it via OSINT, "I downloaded their leaked DB" makes you a recipient of stolen goods in some jurisdictions)

1.8.2 "Safe to submit" — additions:

  • YARA rules (especially to YARAify)
  • Sigma rules (to SigmaHQ via PR)
  • Mutex names, named-pipe signatures (good Sysmon detections)
  • Persistence registry keys
  • Scheduled task names
  • TLS fingerprints (JA3, JA4)
  • HTTP user-agent strings observed in C2
  • ASN block ranges associated with adversary infrastructure
  • STIX/TAXII patterns
  • ATT&CK technique IDs (always)

1.8.3 Missing entire section: "Sanitize before submitting"

  • Strip URL query parameters that may contain victim tokens / session IDs
  • Hash email local-parts when target destination is public (a72b91…@example.com)
  • Redact internal hostnames from samples
  • Strip x-forwarded-for / source IP from log excerpts that name your honeypot
  • Replace victim-org names with role descriptors (<european_bank>) unless the submission is to a destination where the victim has consented or the receiver is trusted (CERT)
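
The sanitization rules above can be sketched in a few small functions. This is a minimal illustration using only the standard library, not the project's actual core/sanitize.py; the function names, the 12-character digest length, and the salt parameter are assumptions made for the example.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop query string and fragment, which may carry victim tokens or session IDs."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def hash_local_part(email: str, salt: str = "", length: int = 12) -> str:
    """Replace the local-part with a truncated SHA-256 digest; keep the domain readable."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    digest = hashlib.sha256((salt + local.lower()).encode("utf-8")).hexdigest()
    return f"{digest[:length]}@{domain}"


def mask_victim(text: str, org_name: str, descriptor: str, consent: bool) -> str:
    """Swap the victim organization name for a role descriptor unless naming is consented."""
    if consent:
        return text
    return text.replace(org_name, f"<{descriptor}>")
```

A per-project secret salt is worth passing to hash_local_part: an unsalted hash of a short local-part such as "admin" is easy to guess by brute force. The consent flag is meant to be fed from victim.consent_to_name_publicly in the normalized object.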

1.9.1 Schema gaps (additions in bold):

{
  "case_id": "WG-2026-000001",
  "schema_version": "1.0",
  "tlp": "AMBER",                            // use TLP 2.0 values: CLEAR/GREEN/AMBER/AMBER+STRICT/RED
  "tlp_marking_definition_ref": "marking-definition--...",  // STIX-compatible
  "severity": "low|medium|high|critical",   // replace A-E with standard
  "confidence": "low|medium|high",          // or Admiralty A1-F6
  "language": "en",                         // i18n
  "first_observed": "2026-05-13T10:00:00Z", // top-level
  "last_observed":  "2026-05-13T11:30:00Z",
  "valid_from":     "2026-05-13T10:00:00Z", // STIX-style validity window
  "valid_until":    "2026-08-13T10:00:00Z",
  "threat_type": "phishing|malware|ransomware|credential_exposure|iab|botnet|vulnerability_exploitation",

  "victim": {
    "organization": "",
    "domain": "",
    "country": "",
    "sector": "",
    "nis2_category": "essential|important|n/a",   // for EU NIS2 routing
    "consent_to_name_publicly": false             // sanitization gate
  },

  "actor": {
    "name": "Adira",
    "aliases": [],
    "campaign": "",
    "confidence": "A1|A2|...|F6"
  },

  "kill_chain": ["recon|weapon|deliver|exploit|install|c2|action"],
  "attack_techniques": ["T1566.001", "T1059.003"],

  "source": {
    "category": "forum|leak_site|telegram|honeypot|sensor|osint|tip",
    "first_seen": "",
    "last_seen": "",
    "collection_method": "lawful_osint_or_partner_feed",
    "burn_sensitivity": "low|medium|high"        // affects sanitization aggressiveness
  },

  "observables": {
    "ips": [],
    "domains": [],
    "urls": [],
    "hashes": [],
    "emails": [],
    "wallets": [],
    "cves": [],
    "yara_rules": [],
    "sigma_rules": [],
    "mutexes": [],
    "named_pipes": [],
    "scheduled_tasks": [],
    "registry_keys": [],
    "user_agents": [],
    "tls_fingerprints": [],                     // JA3/JA4
    "certificates": [],                         // CT log entries / SHA256 of cert
    "asn_blocks": [],
    "process_names": []
  },

  "pattern_relationships": [
    {"source": "domain:example.com", "type": "resolves_to", "target": "ipv4:1.2.3.4", "first_seen": "..."}
  ],

  "evidence": {
    "summary": "",
    "sanitized_screenshots": [],
    "raw_evidence_location": "internal_restricted_storage",
    "detonation_results": [],                   // sandbox report references
    "memory_artifacts": []                      // forensic, internal only
  },

  "timeline": [
    {"ts": "...", "event": "..."}
  ],

  "indicators_of_compromise": [],               // observables flagged as actively malicious

  "recommended_actions": [],

  "routing": {
    "primary_destinations": [],
    "secondary_destinations": [],
    "public_disclosure_allowed": false,
    "embargo_until": null,                      // timed disclosure
    "coordinated_with": []                      // who else has been told (CERT case IDs etc)
  },

  "audit": {
    "submitted_to": [],                         // append-only history of submissions
    "feedback_received": [],                    // ack IDs, takedown confirmations
    "submitter_identity": "wg-handle@misp",     // which submitter handle was used
    "signed_with": "PGP fingerprint",
    "object_sha256": ""                         // tamper-detect on the object itself
  }
}

1.9.2 Other schema concerns:

  • case_id format WG-2026-000001 is fine, but reserve a 2-char org prefix to avoid collision if you ever federate with another working group
  • tlp should use TLP 2.0 spec values (CLEAR, GREEN, AMBER, AMBER+STRICT, RED). TLP 1.0 used WHITE where 2.0 uses CLEAR and had no AMBER+STRICT label
  • Severity / confidence mismatch in v1: severity used A-E, confidence used words. Standardize.
  • Add a per-object hash so the routing engine can detect tampering between produce-time and submit-time
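
One way to realize the per-object hash is to hash a canonical JSON serialization of the object with the audit.object_sha256 field itself excluded. The sketch below assumes sorted keys and compact separators as the canonical form; that choice, and the helper names, are illustrative rather than anything the draft specifies.

```python
import copy
import hashlib
import json


def object_sha256(obj: dict) -> str:
    """SHA-256 over canonical JSON, ignoring any existing audit.object_sha256 value."""
    body = copy.deepcopy(obj)
    if isinstance(body.get("audit"), dict):
        body["audit"].pop("object_sha256", None)
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(obj: dict) -> dict:
    """Return a copy of the object with audit.object_sha256 filled in (produce-time)."""
    sealed = copy.deepcopy(obj)
    audit = sealed.setdefault("audit", {})
    audit["object_sha256"] = object_sha256(sealed)
    return sealed


def is_untampered(obj: dict) -> bool:
    """Recompute the hash at submit-time and compare with the stored value."""
    stored = obj.get("audit", {}).get("object_sha256")
    return stored is not None and stored == object_sha256(obj)
```

The routing engine would call seal() when the object is produced and is_untampered() immediately before an adapter fires. A plain hash only detects accidental or unsophisticated changes; the signature in audit.signed_with is what ties the object to a submitter.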

1.10 — Section 10: Final Recommendation

1.10.1 The architecture sentence is missing the feedback edge.

Current: Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public reporting

Better: Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public reporting → receipts and outcomes back to OpenCTI → effectiveness scoring → re-prioritization

Without the feedback edge, you can't tell which destinations are worth maintaining.

1.10.2 Missing entirely: closing checklist for "we're ready to submit."

A final checklist before any external submission fires:

[ ] TLP enforced (object.tlp <= destination.max_tlp)
[ ] Sanitization pass complete (PII stripped per destination policy)
[ ] GreyNoise check (if observables include IPs)
[ ] Quota available (rate-limit budget not exceeded)
[ ] Submitter identity registered with destination
[ ] Object signed
[ ] Audit row written
[ ] Human approver clicked yes (for non-automated tier)

This belongs as Section 11 or as the closing block of Section 10.
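
The first checklist item (object.tlp <= destination.max_tlp) only works if the TLP labels are given an explicit order. A minimal sketch, assuming the five TLP 2.0 labels are ranked from least to most restrictive and that unknown labels (including the legacy WHITE) are rejected rather than guessed:

```python
# Assumed ordering for comparison purposes: least to most restrictive.
TLP_ORDER = ["CLEAR", "GREEN", "AMBER", "AMBER+STRICT", "RED"]
TLP_RANK = {label: rank for rank, label in enumerate(TLP_ORDER)}


def tlp_allows(object_tlp: str, destination_max_tlp: str) -> bool:
    """True when the object's TLP does not exceed the destination's ceiling."""
    try:
        return TLP_RANK[object_tlp.upper()] <= TLP_RANK[destination_max_tlp.upper()]
    except KeyError as exc:
        raise ValueError(f"unknown TLP label: {exc.args[0]}") from None
```

Raising on an unknown label keeps the gate fail-closed: a typo or a TLP 1.0 value lands in human review instead of being treated as the lowest tier.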


2. Cross-Cutting Gaps (Not Tied to Any Section)

2.1 — OPSEC for the Reporters Themselves

Not in the doc at all. If your group is reporting Adira to authorities, Adira may notice: they can read open MISP communities and URLhaus (public), and as a paying VirusTotal Premium customer they could see samples submitted there.

Required additions:

  • Submission identity registry: which handle is used on which platform, who has access, rotation schedule
  • Account-creation OPSEC: don't use personal accounts on submission platforms; create a project handle, use a project email, register with project-owned phone/2FA
  • Network OPSEC for collection: if you're scraping leak sites or monitoring the adversary's infrastructure, route through a VPN or research-purpose proxy — never the same network as your submission identity
  • PGP for CERT comms: every national CERT publishes a PGP key. Every email submission to a CERT should be signed and encrypted. Untouched in the draft.

2.2 — Burnt Source Protection

If you have a private collection source (honeypot, infiltrated channel, tipped-off insider), publishing IOCs from it can burn the source. Specifically:

  • A unique honeypot fingerprint (banner, response timing, listening port) lets the adversary identify which sample came from your honeypot
  • Publishing a sample with a unique build artifact (your sandbox's hostname in a DNS query, a timestamp matching your detonation window) reveals your detonation infrastructure
  • Reporting a forum URL while it's still live tips off the forum operator that it's being watched

The doc needs a burn-sensitivity tier on each observable, and a sanitization step that aggressively scrubs source-identifying artifacts before any external submission.

2.3 — Adversary Observability of Your Submissions

Tier each receiver by who can see your submission:

| Receiver | Adversary visibility |
|---|---|
| MISP private community | trusted community only |
| MISP public community / OTX public pulse | anyone with an account |
| URLhaus | public; adversary can monitor |
| MalwareBazaar | public; adversary can detect their sample was uploaded |
| VirusTotal public submission | every VT Premium customer (incl. potentially adversary) |
| VT Private Scanning | only your team (paid) |
| AbuseIPDB | public reputation visible |
| Cloudflare Abuse Reports | only Cloudflare and the reported asset owner |
| CERT direct (GPG-encrypted) | only the CERT |

The routing engine should display this visibility for each destination during human review.

2.4 — Chain of Custody

If any of this material may end up in a criminal proceeding, chain of custody matters. Specifically:

  • The raw evidence must be preserved unmodified, with hashes recorded at acquisition time
  • Any transformation (sanitization, normalization) must be traceable back to the unmodified original: the routing engine logs the input hash, the transform applied, and the output hash
  • The submitter identity for each external submission is logged
  • Witnesses (multi-party access logs) are preferred for high-value evidence

The current evidence.raw_evidence_location field is a placeholder; it needs structure: storage path, hash, acquisition timestamp, acquirer identity.
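
A structured evidence record along these lines could look as follows. This is a sketch under stated assumptions: the class and field names are hypothetical, and the only requirements taken from the text are the storage path, hash, acquisition timestamp, acquirer identity, and a per-transform log of input hash, transform name, and output hash.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class TransformRecord:
    transform: str       # name of the sanitization or normalization step
    input_sha256: str
    output_sha256: str
    actor: str           # who or what applied the transform
    at: str              # UTC timestamp, ISO 8601


@dataclass
class EvidenceItem:
    storage_path: str
    sha256: str          # hash of the raw bytes, recorded at acquisition time
    acquired_at: str
    acquirer: str
    transforms: list[TransformRecord] = field(default_factory=list)


def acquire(raw: bytes, storage_path: str, acquirer: str) -> EvidenceItem:
    """Record the raw evidence hash before anything touches the data."""
    return EvidenceItem(storage_path, sha256_hex(raw), utc_now(), acquirer)


def apply_transform(item: EvidenceItem, data: bytes, name: str,
                    fn: Callable[[bytes], bytes], actor: str) -> bytes:
    """Run a transform on a copy of the data and log input and output hashes."""
    out = fn(data)
    item.transforms.append(
        TransformRecord(name, sha256_hex(data), sha256_hex(out), actor, utc_now())
    )
    return out
```

The raw bytes are never modified; each transform produces a new artifact whose lineage can be walked back through matching output and input hashes to the acquisition hash.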

2.5 — Amplification Risk

Publishing IOCs publicly amplifies awareness — which is good for defenders but bad if:

  • The IOC includes a compromised legitimate site (you damage the site owner's reputation)
  • The IOC is for a piece of infrastructure that's about to be used in a sting operation by LE
  • The IOC reveals an investigation technique still under embargo

A publish-readiness review belongs in Section 1 of the doc, not in the closing checklist.

2.6 — Failure Modes / Retries

What happens when:

  • URLhaus rejects a submission (malformed, low-confidence flag, duplicate)?
  • MISP is down for maintenance?
  • Cloudflare returns 503?
  • Your submitter identity gets rate-limited?
  • An API token is revoked mid-batch?

The doc has no resilience layer. Recommend:

  • Idempotent submission with client-generated IDs (so retries don't double-submit)
  • Per-destination retry policy (exponential backoff with jitter)
  • Dead-letter queue for permanent failures — surface in human-review UI
  • Per-submitter quota tracking, with auto-failover to backup submitter if available
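
The retry policy can be sketched as exponential backoff with full jitter (each delay drawn uniformly between zero and the capped exponential bound). The exception classes and the send callable are hypothetical stand-ins for whatever a real adapter raises and calls; which HTTP statuses count as transient is left to each adapter.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientFailure(Exception):
    """Retryable: timeouts, 503s, rate limits."""


class PermanentFailure(Exception):
    """Not retryable: goes to the dead-letter queue."""


def submit_with_retry(send: Callable[[], T], attempts: int = 5, base: float = 1.0,
                      cap: float = 60.0, sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Call send() until it succeeds or attempts are exhausted."""
    rng = rng or random.Random()
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return send()
        except TransientFailure as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Full jitter: uniform delay in [0, min(cap, base * 2**attempt)].
                sleep(rng.uniform(0.0, min(cap, base * 2 ** attempt)))
    raise PermanentFailure(f"gave up after {attempts} attempts") from last_error
```

Passing sleep and rng as parameters keeps the function testable without real waiting. Retries are only safe when the submission itself is idempotent, which is why the client-generated ID in the first bullet comes before the retry policy.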

2.7 — Versioning and Maintenance

The doc has no version number, no changelog, no maintainer field, no review cadence. For a living spec like this:

---
schema_version: "1.0"
last_reviewed: 2026-05-13
next_review_due: 2026-08-13
maintainer: <project lead>
changelog:
  - 2026-05-13: initial draft
---

API surfaces of these platforms change (Cloudflare deprecations, VT pricing changes, abuse.ch tag taxonomy updates). A quarterly re-validation cadence is sane.

2.8 — Multi-Language Submissions

Many national CERTs prefer or require local language for narrative fields (BSI German, ANSSI French, CCN-CERT Spanish). The submission object's language field (added above) plus a translation step in the routing engine handles this. Currently absent.


3. Missing Categories Entirely

3.1 — Hosting Provider Abuse Channels (most lack true REST APIs)

| Provider | Channel | API? |
|---|---|---|
| AWS | abuse@amazonaws.com + form | No public REST; AWS responds to email |
| Google Cloud | https://support.google.com/cloud/answer/2417620 | Form-only |
| Azure | https://msrc.microsoft.com/report/abuse | Form + email |
| DigitalOcean | abuse@digitalocean.com | Email + status REST |
| Hetzner | abuse@hetzner.com + form | Form |
| OVH | abuse@ovh.net + Anti-abuse REST API | Yes |
| Linode (Akamai) | abuse@linode.com | Email |
| Vultr | abuse@vultr.com | Email |

Treat email-based providers as a different submission class (template + GPG-signed email, with parsed-receipt detection). Worth a Section 11 in the doc.

3.2 — Cryptocurrency / Sanctions

  • Chainalysis Reactor — commercial, gold standard for on-chain investigations
  • TRM Labs — commercial alternative
  • CipherTrace (Mastercard) — commercial
  • OFAC SDN reporting — for US-sanctioned wallets
  • EU Financial Sanctions Files (FSF) — for EU sanctions
  • National FIUs — Financial Intelligence Units, country-specific
  • Free / open: GraphSense (open-source on-chain analytics), Etherscan (manual)

3.3 — Mobile / App Store

  • Google Play Protect submissions (for Android malware)
  • Apple App Review report (for malicious iOS apps)
  • APKMirror reports (for repackaged apps)
  • F-Droid security contacts (for compromised FOSS apps)

3.4 — Open-Source Supply Chain

3.5 — Certificate Authorities

  • Let's Encrypt: ACME revocation API for ACME-issued certs
  • Sectigo / DigiCert / GlobalSign / Entrust: abuse contacts in CA/Browser Forum compliance docs
  • CT log monitors for detection (crt.sh, Censys CT, Google CT)

3.6 — Tor / Dark Web

Limited takedown leverage for .onion services, but worth documenting:

  • Document via Tor Project's abuse handling page (limited leverage)
  • Contribute IOCs to DarkOwl, Recorded Future, Flashpoint (commercial dark-web monitoring) if you have access
  • Push to MISP with tor tag for community awareness

3.7 — CSAM (Legally Separate Pathway)

If CSAM is encountered during collection, stop processing immediately. CSAM has separate legal handling rules:

  • NCMEC CyberTipline (US)
  • IWF (Internet Watch Foundation) (UK)
  • INHOPE (international hotline network)
  • Possessing CSAM is illegal even for research; do not attempt to verify, document, or share. Report and delete from your systems under documented legal hold.

This deserves a Section 12 with a hard stop: "if encountered, halt and report via the channels below; do not include in any other submission flow."


4. Missing Platforms Worth Adding (Quick List)

Free / Open

  • AlienVault OTX (huge omission)
  • ThreatFox
  • YARAify
  • CIRCL Hashlookup
  • CIRCL Passive DNS / Passive SSL
  • Maltrail feeds
  • crt.sh / Censys CT
  • GreyNoise community tier
  • Spamhaus DNSBL queries
  • PhishStats

Commercial / Paid (worth listing for completeness)

  • Recorded Future
  • Mandiant Advantage (now Google Threat Intelligence)
  • CrowdStrike Falcon Intelligence
  • Sekoia.io
  • Flashpoint
  • DomainTools (passive DNS / WHOIS history)
  • RiskIQ (now Microsoft Defender Threat Intelligence)
  • Anomali ThreatStream

Intelligence Communities (membership-based)

  • FIRST.org (CSIRT global community)
  • Trusted Introducer (European CSIRT trust framework)
  • M3AAWG (Messaging, Malware, Mobile Anti-Abuse Working Group)
  • APWG (Anti-Phishing Working Group)
  • Cyber Threat Alliance (commercial CTI sharing)
  • ENISA CSIRTs Network

5. Implementation Priorities for Blue48

In our agent stack, this doc translates to concrete work:

5.1 — Block G additions (when we get there)

  1. report_writer agent outputs the v2 normalized object (Section 1.9.1 above) as canonical format
  2. New routing_engine component (extension of report_writer, or a 7th agent) — consumes the object, applies routing matrix, fans out via API adapters
  3. Adapter priority order for blue48 v1.0:
    1. MISP (PyMISP)
    2. AlienVault OTX (REST)
    3. AbuseIPDB (REST + category mapping)
    4. URLhaus + MalwareBazaar + ThreatFox (shared abuse.ch auth-key)
    5. urlscan.io (REST, with private-by-default visibility)
    6. Cloudflare Abuse Reports
    7. GPG-signed email to BSI / CERT-Bund (since the user is in DE)

5.2 — Schema work

  • config/submission_schema.json — JSON Schema for the v2 normalized object
  • config/routing_matrix.yaml — declarative rules: evidence type → destinations, with TLP ceilings and quotas
  • core/sanitize.py — pre-submission scrubbing per destination policy
  • core/audit.py — append-only log of every submission, signed
  • core/tlp.py — TLP 2.0 enforcement
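
To make the routing matrix concrete, here is a small in-memory stand-in for what config/routing_matrix.yaml might declare: evidence type mapped to destinations, each with a TLP ceiling and a quota. The destination names, ceilings, and quota numbers are placeholder values chosen for illustration, not recommendations, and the Rule structure is an assumption.

```python
from dataclasses import dataclass

TLP_RANK = {"CLEAR": 0, "GREEN": 1, "AMBER": 2, "AMBER+STRICT": 3, "RED": 4}


@dataclass
class Rule:
    destination: str
    max_tlp: str          # highest TLP label this destination may receive
    daily_quota: int
    used_today: int = 0


# Hypothetical matrix; real values belong in config/routing_matrix.yaml.
EXAMPLE_MATRIX: dict[str, list[Rule]] = {
    "url": [Rule("urlhaus", "CLEAR", 200), Rule("misp_private", "AMBER", 1000)],
    "ip": [Rule("abuseipdb", "CLEAR", 1000), Rule("misp_private", "AMBER", 1000)],
}


def route(evidence_type: str, object_tlp: str,
          matrix: dict[str, list[Rule]]) -> tuple[list[str], list[tuple[str, str]]]:
    """Return (selected destinations, skipped destinations with reasons)."""
    selected: list[str] = []
    skipped: list[tuple[str, str]] = []
    for rule in matrix.get(evidence_type, []):
        if TLP_RANK[object_tlp] > TLP_RANK[rule.max_tlp]:
            skipped.append((rule.destination, "tlp_exceeds_ceiling"))
        elif rule.used_today >= rule.daily_quota:
            skipped.append((rule.destination, "quota_exhausted"))
        else:
            rule.used_today += 1
            selected.append(rule.destination)
    return selected, skipped
```

Returning the skipped destinations with a reason, instead of dropping them, gives the human-review UI something to display and matches the "never silently skip" rule for the gates.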

5.3 — Pre-submission gates (before any adapter fires)

1. Schema valid?
2. TLP <= destination ceiling?
3. Sanitization complete?
4. GreyNoise check passes (for IPs)?
5. Quota available?
6. Submitter identity registered with destination?
7. Object signed?
8. Audit row written?
9. Human approver yes (for non-auto tier)?

If any fail → drop into human-review queue with the reason. Never silently skip.
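
The gate sequence can be sketched as an ordered list of named checks that all run, with every failure collected as a reason for the human-review queue. The three example gates and their field lookups are simplified assumptions; the real checks (schema validation, GreyNoise lookup, quota, approval) would replace them.

```python
from typing import Callable, NamedTuple

TLP_RANK = {"CLEAR": 0, "GREEN": 1, "AMBER": 2, "AMBER+STRICT": 3, "RED": 4}


class GateResult(NamedTuple):
    passed: bool
    failures: list[str]   # reasons to show in the human-review queue


Gate = tuple[str, Callable[[dict, dict], bool]]

# Illustrative subset of the nine gates; each takes (object, destination).
EXAMPLE_GATES: list[Gate] = [
    ("schema_valid", lambda obj, dest: bool(obj.get("case_id"))),
    ("tlp_within_ceiling", lambda obj, dest: TLP_RANK[obj["tlp"]] <= TLP_RANK[dest["max_tlp"]]),
    ("object_signed", lambda obj, dest: bool(obj.get("audit", {}).get("signed_with"))),
]


def run_gates(obj: dict, destination: dict, gates: list[Gate]) -> GateResult:
    """Run every gate; a gate that raises counts as a failure, never as a skip."""
    failures: list[str] = []
    for name, check in gates:
        try:
            ok = check(obj, destination)
        except Exception as exc:
            ok = False
            name = f"{name} (error: {exc})"
        if not ok:
            failures.append(name)
    return GateResult(not failures, failures)
```

Running all gates instead of stopping at the first failure means the reviewer sees every problem at once. An adapter should fire only when passed is true.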

5.4 — Failure / retry layer

  • Per-destination idempotency keys (client-generated)
  • Exponential backoff with jitter
  • Dead-letter queue for permanent failures, surfaced in data/dlq/
  • Per-submitter quota tracking with auto-failover
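
A client-generated idempotency key can be derived deterministically from the destination, the case ID, and the payload, so a retried submission maps to the same key. The sketch below keeps receipts and dead letters in memory for illustration; the class name and its methods are hypothetical, and a real implementation would persist both (the text suggests data/dlq/ for dead letters).

```python
import hashlib
import json
from typing import Any, Callable


def idempotency_key(destination: str, case_id: str, payload: dict) -> str:
    """Same destination + case + payload always yields the same key."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    material = f"{destination}\n{case_id}\n{canonical}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class SubmissionLedger:
    """In-memory stand-in for a persistent receipt store and dead-letter queue."""

    def __init__(self) -> None:
        self._receipts: dict[str, Any] = {}
        self.dead_letters: list[dict] = []

    def submit_once(self, key: str, send: Callable[[], Any]) -> Any:
        """Call send() only if this key has no stored receipt yet."""
        if key in self._receipts:
            return self._receipts[key]
        receipt = send()
        self._receipts[key] = receipt
        return receipt

    def dead_letter(self, key: str, destination: str, reason: str) -> None:
        """Park a permanently failed submission for human review."""
        self.dead_letters.append({"key": key, "destination": destination, "reason": reason})
```

Because the destination is part of the key, the same object can still be sent to several receivers, while a second attempt at the same receiver returns the stored receipt instead of submitting again.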

6. Summary of v2 Findings

| Category | Count | Action |
|---|---|---|
| Section-by-section corrections | 38 | Fold into the draft |
| New cross-cutting sections needed | 8 | Add as Sections 11–18 |
| Missing platform categories | 7 | Each warrants a sub-section |
| Missing free/open platforms (Tier 1) | 5 | Add to Section 2 |
| Schema field gaps | 17 | Adopt v2 schema above |
| Pre-submission gates not defined | 9 | Add as closing checklist |

After folding these in, the document becomes a publishable internal whitepaper and a complete spec for the blue48 routing engine. The first draft was a confident outline; the v2 turns it into a working manual.