init: scaffold psyc — defensive CTI routing & evidence-sealing platform

Stage-1 vertical slice: Pydantic Case model, SQLAlchemy Core persistence,
URLhaus Scoutline fetcher, FastAPI/Jinja cockpit (cases list + detail),
flat Typer CLI, Result[T, E] type module, structlog config.
Architecture in docs/dossier.md; 12-fold style guide in docs/style.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
m17hr1l
2026-05-14 12:43:47 +02:00
commit e04c6c96d8
30 changed files with 8271 additions and 0 deletions

759
docs/archive/cockpit.md Normal file

@@ -0,0 +1,759 @@
# Blue48 Operations Cockpit — GUI / UI-UX Concept
**Document type:** Project record / technical concept

**Scope:** GUI, operator workflow, worker observability, evidence handling, routing review, and IntelMiner dataset operations

**Status:** Draft v1

---
## 1. Purpose
The Blue48 Operations Cockpit is the human-facing command center for the worker mesh.
The GUI must let operators see, review, approve, seal, route, audit, and publish cyber-intelligence cases without losing control of sensitive evidence or outbound submissions.
The core principle is:
> The system may automate collection, enrichment, packaging, and routing, but humans must clearly see the chain of reasoning, evidence status, risk level, and outbound submissions before anything sensitive leaves the platform.

The GUI should not be a decorative dashboard first. It should be an operational cockpit.
---
## 2. Core Control Surfaces
The product should be designed around six main control surfaces:
| Control Surface | Primary Question Answered |
|---|---|
| **Cases** | What is happening? |
| **Evidence** | What is protected? |
| **Routing** | Where will it go? |
| **Workers** | What produced this result? |
| **Ledger** | What can we prove happened? |
| **Trainline** | What can become safe training data? |

These six areas should drive navigation, permissions, and MVP scope.
---
## 3. Main Navigation
Recommended sidebar navigation:
```text
OPERATIONS
- Mission Control
- Case Queue
- Worker Mesh
- Routing Review
- Receipts
EVIDENCE
- Evidence Vault
- Sealed Packages
- Retention
INTELLIGENCE
- Reports
- MISP / STIX Events
- Public Advisories
TRAINING
- IntelMiner
- Training Candidates
- Dataset Builder
SYSTEM
- Integrations
- Policy Engine
- Ledger
- Admin
```
Minimal route structure:
```text
/dashboard
/cases
/cases/:id
/cases/:id/evidence
/cases/:id/routing
/receipts
/workers
/trainline/candidates
```
---
## 4. Mission Control
Mission Control is the landing dashboard.
Its purpose is to show what is happening right now.
### Key Widgets
| Widget | Shows |
|---|---|
| **Active Signals** | New unreviewed leads from Scoutline |
| **Critical Queue** | Imminent harm / critical infrastructure cases |
| **Pending Human Review** | Cases waiting for analyst approval |
| **Sealed Evidence Packages** | Evidence encrypted and ready for authority handoff |
| **Outbound Reports** | Reports waiting to be sent |
| **Receipts / Acknowledgements** | CERT, MISP, abuse API, and provider responses |
| **Worker Health** | Workers running, degraded, failed, paused, or stopped |
| **Rate Limits** | API quota usage per destination |
| **Legal / TLP Warnings** | Items blocked by policy guard |
### Suggested Layout
```text
┌──────────────────────────────────────────────────────────────┐
│ Blue48 Operations Cockpit                                    │
├──────────────┬──────────────┬──────────────┬─────────────────┤
│ Critical     │ Pending      │ Sealed       │ Submitted       │
│ Cases        │ Review       │ Packages     │ Reports         │
├──────────────┴──────────────┴──────────────┴─────────────────┤
│ Live Worker Mesh Timeline                                    │
├──────────────────────────────┬───────────────────────────────┤
│ Priority Case Queue          │ Destination / API Health      │
├──────────────────────────────┴───────────────────────────────┤
│ Recent Receipts and Outcomes                                 │
└──────────────────────────────────────────────────────────────┘
```
---
## 5. Case Queue
The Case Queue is the main daily-use screen.
Each row represents one signal or incident candidate.
### Recommended Columns
| Column | Meaning |
|---|---|
| Case ID | Unique internal case identifier |
| Class | A/B/C/D/E or Critical/High/Medium/Low |
| TLP | RED / AMBER / GREEN / CLEAR |
| Confidence | Low / Medium / High or Admiralty Code |
| Victim | Organization, domain, or unknown |
| Country | Used for CERT routing |
| Sector | Healthcare, finance, energy, government, etc. |
| Incident Type | Access sale, ransomware, phishing, credential leak, botnet, exploit, data leak |
| Actor | Known group / suspected actor / unknown |
| Current Worker | Worker currently responsible for the case |
| Next Action | Review, seal, route, submit, wait, archive |
| Deadline | SLA based on severity |
| Owner | Assigned analyst |

Example row:
```text
[CRITICAL] [TLP:AMBER] DE energy provider | access sale | high confidence | Sealer ready | Review required
```
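The recommended columns map naturally onto a typed record. The sketch below is a minimal illustration only, assuming a plain `dataclass` (the Stage-1 slice uses a Pydantic `Case` model whose real fields may differ); `QueueRow`, `Severity`, and `render_row` are hypothetical names. It reproduces the compact example-row format above.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


@dataclass(frozen=True)
class QueueRow:
    """One Case Queue row (hypothetical subset of the recommended columns)."""
    case_id: str
    severity: Severity
    tlp: str             # RED / AMBER / GREEN / CLEAR
    country: str         # used for CERT routing
    victim: str
    incident_type: str
    confidence: str      # low / medium / high
    current_worker: str
    next_action: str


def render_row(row: QueueRow) -> str:
    """Render a row in the compact text format of the example row."""
    return (
        f"[{row.severity.value}] [TLP:{row.tlp}] {row.country} {row.victim} | "
        f"{row.incident_type} | {row.confidence} confidence | "
        f"{row.current_worker} ready | {row.next_action} required"
    )
```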
### Filters
The queue should support filters for:
- severity
- class
- TLP
- country
- sector
- actor
- source type
- confidence
- pending approval
- failed submission
- critical infrastructure only
- worker state
---
## 6. Case Detail View
The Case Detail View is where analysts work on a single case.
Recommended tabs:
```text
Overview | Evidence | Timeline | Worker Output | Routing | Reports | Receipts | Audit
```
### Overview Tab
The Overview tab should show:
- case summary
- severity
- class
- confidence
- affected entity
- actor
- jurisdiction
- recommended route
- current state
- required approval

Example:
```text
Case: B48-2026-000184
Type: Initial Access Sale
Severity: Critical
TLP: AMBER
Confidence: High
Victim Country: Germany
Sector: Energy
Recommended Route:
1. CERT-Bund
2. Victim Security Team
3. Sector ISAC
4. MISP Trusted Community
```
---
## 7. Evidence View
The Evidence View is where the **Sealer** concept appears in the GUI.
Raw sensitive evidence should not be rendered by default.
The UI should show evidence status instead of exposing raw contents.
### Evidence Status Labels
| Status | Meaning |
|---|---|
| **Unsealed** | Evidence exists internally but has not been authority-sealed |
| **Sealed** | Evidence has been encrypted for selected authorized recipients |
| **Plaintext Destroyed** | Local plaintext copy has been removed |
| **Local Key Destroyed** | Local unwrapped evidence key has been removed |
| **Recipient Decryptable** | Selected authority or victim can decrypt the package |
| **Public-Safe Extract Available** | Redacted/minimized metadata is available for public or semi-public destinations |
### Evidence Display Model
```text
Evidence Package
├── Metadata preview: visible
├── Sensitive content: locked by default
├── Hashes: visible
├── Recipient keys: visible
├── Local decryption access: unavailable after key destruction
└── Chain of custody: visible
```
### Evidence Actions
Recommended actions:
- **Seal Evidence**
- **Add Recipient Key**
- **Verify Package Hash**
- **Destroy Local Plaintext**
- **Destroy Local Unwrapped Key**
- **Generate Public-Safe Extract**
- **Request Human Approval**

The UI should make the trust state obvious:
```text
Raw evidence: locked
Sealed package: ready
Local plaintext: destroyed
Local key: destroyed
Recipient: CERT-Bund can decrypt
Public extract: available
```
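One way to keep this trust-state panel honest is to derive every label from recorded package state rather than from free text. This is a hedged sketch: the `EvidenceState` fields, the "not sealed"/"present"/"not generated" wordings, and the rule that nothing may be destroyed before sealing are assumptions modelled on the status labels above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvidenceState:
    """Recorded state of one evidence package (hypothetical fields)."""
    sealed: bool = False
    plaintext_destroyed: bool = False
    local_key_destroyed: bool = False
    recipients: list[str] = field(default_factory=list)  # holders of a wrapped key
    public_extract: bool = False


def trust_state_lines(s: EvidenceState) -> list[str]:
    """Derive the trust-state labels from recorded state, never from free text."""
    # Assumed invariant: plaintext and local keys are only destroyed after sealing.
    if (s.plaintext_destroyed or s.local_key_destroyed) and not s.sealed:
        raise ValueError("destruction recorded before sealing")
    lines = ["Raw evidence: locked"]  # raw contents are never rendered here
    lines.append("Sealed package: " + ("ready" if s.sealed else "not sealed"))
    lines.append("Local plaintext: " + ("destroyed" if s.plaintext_destroyed else "present"))
    lines.append("Local key: " + ("destroyed" if s.local_key_destroyed else "present"))
    lines += [f"Recipient: {r} can decrypt" for r in s.recipients]
    lines.append("Public extract: " + ("available" if s.public_extract else "not generated"))
    return lines
```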
---
## 8. Worker Mesh View
The Worker Mesh View is the observability screen for the processing pipeline.
It should show the worker topology and the health of each worker.
### Worker Lines
```text
Scoutline
SourcePlanner → Crawler → Fetcher → Parser → Deduper → SourceRanker
Proofline
Signalizer → Correlator → IOCChecker → ClaimChecker → ConfidenceScorer
Mapline
EntityResolver → GeoResolver → SectorMapper → ActorMapper → CVEResolver
Sealine
EvidencePackager → Sealer → KeyBurner → RetentionGuard
Routeline
RoutePlanner → PayloadBuilder → Courier → ReceiptCollector
Trainline
IntelMiner → LicenseChecker → Chunker → Labeler → QualityGate → DatasetWriter
```
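Because each worker line is an ordered pipeline, a single mapping from line name to stage list is enough to answer "which worker runs next?" and to emit edges for a graph widget. A minimal sketch; the stage names are copied from the diagram above, while `WORKER_LINES`, `next_worker`, and `edges` are hypothetical helpers.

```python
from __future__ import annotations

WORKER_LINES: dict[str, list[str]] = {
    "Scoutline": ["SourcePlanner", "Crawler", "Fetcher", "Parser", "Deduper", "SourceRanker"],
    "Proofline": ["Signalizer", "Correlator", "IOCChecker", "ClaimChecker", "ConfidenceScorer"],
    "Mapline": ["EntityResolver", "GeoResolver", "SectorMapper", "ActorMapper", "CVEResolver"],
    "Sealine": ["EvidencePackager", "Sealer", "KeyBurner", "RetentionGuard"],
    "Routeline": ["RoutePlanner", "PayloadBuilder", "Courier", "ReceiptCollector"],
    "Trainline": ["IntelMiner", "LicenseChecker", "Chunker", "Labeler", "QualityGate", "DatasetWriter"],
}


def next_worker(line: str, worker: str) -> str | None:
    """Return the worker after `worker` in `line`, or None at the end of the line."""
    stages = WORKER_LINES[line]
    i = stages.index(worker)  # raises ValueError for an unknown worker
    return stages[i + 1] if i + 1 < len(stages) else None


def edges() -> list[tuple[str, str]]:
    """Flatten every line into (source, target) pairs for a topology view."""
    return [
        (a, b)
        for stages in WORKER_LINES.values()
        for a, b in zip(stages, stages[1:])
    ]
```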
### Worker Tile Fields
Each worker tile should show:
| Field | Meaning |
|---|---|
| Status | Healthy, degraded, failed, paused, or stopped |
| Queue Depth | Number of waiting jobs |
| Last Run | Most recent execution timestamp |
| Error Count | Recent failures |
| Average Processing Time | Performance indicator |
| Model / API Used | Which model, API, or rule engine was used |
| Cost Estimate | Optional model/API cost estimate |
| Last Output Sample | Small safe preview of output |
| Controls | Retry, pause, resume, open logs |

The goal is to prevent the system from becoming a black box.
---
## 9. Routing Review Screen
The Routing Review Screen is where humans approve outbound reports.
It should show recommended destinations, payload types, policy decisions, and blocks.
Example:
```text
Recommended destinations:
✓ CERT-Bund — sealed evidence package
✓ Victim Security Team — sealed evidence package
✓ MISP Trusted Community — TLP:AMBER STIX indicators
✓ Cloudflare Abuse — minimized abuse report
✕ VirusTotal — blocked: contains sensitive sample / TLP too high
```
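The allow/block marks above follow from comparing what would actually be sent against each destination's **Max TLP Allowed**. A minimal sketch, assuming TLP levels are ordered CLEAR < GREEN < AMBER < RED and that a route is blocked whenever the payload's TLP exceeds the destination ceiling; `ProposedRoute` and `review_line` are hypothetical, and a real policy engine would also weigh payload type and legal status.

```python
from dataclasses import dataclass

# Assumed ordering: a higher rank means more restricted sharing.
TLP_RANK = {"CLEAR": 0, "GREEN": 1, "AMBER": 2, "RED": 3}


@dataclass(frozen=True)
class ProposedRoute:
    destination: str
    payload: str
    payload_tlp: str  # TLP of the payload that would actually be sent
    max_tlp: str      # highest TLP this destination may receive


def review_line(route: ProposedRoute) -> str:
    """Render one Routing Review line; block when the payload exceeds the ceiling."""
    if TLP_RANK[route.payload_tlp] > TLP_RANK[route.max_tlp]:
        return f"✕ {route.destination}: blocked, TLP too high"
    return f"✓ {route.destination}: {route.payload}"
```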
### Destination Fields
| Field | Purpose |
|---|---|
| Destination | CERT, MISP, provider, abuse API, registrar, law enforcement, victim |
| Payload Type | Sealed package, STIX bundle, minimized abuse report, advisory draft |
| Max TLP Allowed | Prevents over-sharing |
| Required Auth | API key, PGP, portal, structured email, OIDC |
| Rate-Limit Budget | Whether submission can happen now |
| Policy Status | Allowed, blocked, or needs approval |
| Legal Status | Safe, review required, or blocked |
| Expected Receipt | Case ID, acknowledgement, or status URL |
### Actions
Recommended actions:
- **Approve Selected Routes**
- **Block Route**
- **Require Legal Review**
- **Send to Sealer**
- **Send to Redactor**
- **Submit Now**

The interface should never provide one broad dangerous action such as **Send Everything**.
---
## 10. Report Builder
The Report Builder creates destination-specific outputs.
### Report Templates
| Template | Used For |
|---|---|
| **Victim Notification** | Direct affected organization |
| **CERT Notification** | National CERT / CSIRT |
| **Law Enforcement Referral** | Criminal activity |
| **Provider Abuse Report** | Hosting, CDN, registrar, cloud, email provider |
| **MISP Event** | CTI sharing |
| **Public Advisory** | Sanitized public report |
| **Training Example** | LoRA dataset candidate |
### Recommended Layout
```text
Left pane: structured case data
Right pane: generated report preview
```
### Warning Flags
The builder should warn when a draft:
- contains PII
- contains raw credentials
- contains TLP:RED material
- contains victim name
- contains exploit detail
- contains unsealed evidence
- exceeds destination TLP allowance
- targets a public or semi-public platform with sensitive content
---
## 11. IntelMiner / Trainline UI
The IntelMiner and Trainline UI should be separate from active operations.
This prevents analysts from confusing live cases with training candidates.
### Dataset Sources Screen
Shows:
| Field | Meaning |
|---|---|
| Source Name | Human-readable source name |
| URL / API | Collection endpoint |
| License Status | Approved, restricted, unknown, rejected |
| Allowed for Training | Yes / no / review required |
| Last Collected | Most recent collection timestamp |
| Document Count | Number of collected documents |
| Failure Rate | Recent collection reliability |
### Training Candidate Queue
Each candidate should show:
| Field | Meaning |
|---|---|
| Task | IOC extraction, routing, classification, report writing, actor normalization |
| Source | Advisory, blog, report, synthetic, internal |
| License | Approved, restricted, unknown |
| Quality Score | Estimated usefulness |
| Safety Flag | Safe, needs review, reject |
| Reviewer Status | Pending, approved, rejected, edited |
### Example Review Screen
The reviewer should see:
```text
Instruction
Input
Expected Output
Metadata
Source License
Safety Flags
```
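The **Export to JSONL** action should only ever emit candidates that passed review. A minimal sketch, assuming one JSON object per line with `instruction`, `input`, `output`, and `metadata` keys (a common instruction-tuning layout, not necessarily the final IntelMiner schema) and assuming the export gate is: reviewer approved, license approved, and safety flag `safe`. Treating an `edited` candidate as needing re-approval is also an assumption.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A Training Candidate Queue entry (hypothetical field names)."""
    instruction: str
    input_text: str
    expected_output: str
    license: str          # approved / restricted / unknown
    safety_flag: str      # safe / needs review / reject
    reviewer_status: str  # pending / approved / rejected / edited
    metadata: dict = field(default_factory=dict)


def exportable(c: Candidate) -> bool:
    """Gate: reviewer-approved, license-approved, and flagged safe."""
    return (
        c.reviewer_status == "approved"  # "edited" must be re-approved first
        and c.license == "approved"
        and c.safety_flag == "safe"
    )


def to_jsonl(candidates: list) -> str:
    """Serialize exportable candidates as JSON Lines; everything else is skipped."""
    lines = [
        json.dumps(
            {
                "instruction": c.instruction,
                "input": c.input_text,
                "output": c.expected_output,
                "metadata": c.metadata,
            },
            ensure_ascii=False,
        )
        for c in candidates
        if exportable(c)
    ]
    return "".join(line + "\n" for line in lines)
```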
Actions:
- **Approve**
- **Reject**
- **Edit**
- **Send Back to Labeler**
- **Mark as Unsafe**
- **Export to JSONL**
### Dataset Builder
The Dataset Builder should show:
- examples by task
- token counts
- train/validation split
- duplicates
- class imbalance
- rejected examples
- export version

Example dataset versions:
```text
dataset-router-v0.1
dataset-ioc-extractor-v0.3
dataset-report-writer-v0.2
```
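Two Dataset Builder checks, duplicates and the train/validation split, can be made deterministic so that re-exporting the same dataset version yields identical files. A minimal sketch, assuming exact duplicates are found by hashing lower-cased, whitespace-collapsed text and that validation membership comes from a stable hash bucket; near-duplicate detection (for example with embeddings) is out of scope, and the function names are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 of lower-cased, whitespace-collapsed text (exact-duplicate key)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(examples: list) -> list:
    """Keep the first occurrence of each fingerprint, preserving order."""
    seen = set()
    kept = []
    for ex in examples:
        fp = fingerprint(ex)
        if fp not in seen:
            seen.add(fp)
            kept.append(ex)
    return kept


def train_val_split(examples: list, val_percent: int = 10) -> tuple:
    """Assign each example to train or validation by a stable hash bucket (0-99)."""
    train, val = [], []
    for ex in examples:
        bucket = int(fingerprint(ex)[:8], 16) % 100
        (val if bucket < val_percent else train).append(ex)
    return train, val
```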
---
## 12. Roles and Permissions
The GUI requires strict role-based access control.
| Role | Can Do |
|---|---|
| **Viewer** | Read dashboards and public-safe summaries |
| **Analyst** | Review signals, enrich cases, draft reports |
| **Sealer Officer** | Seal evidence and manage recipient keys |
| **Router Officer** | Approve destinations and routing decisions |
| **Legal Reviewer** | Approve sensitive or cross-border submissions |
| **Admin** | Manage users, integrations, policies, and configuration |
| **Dataset Curator** | Approve training examples and exports |
| **Auditor** | Read ledger and export compliance logs |
### Two-Person Control
Critical actions should require two-person approval:
- send sealed evidence
- submit to law enforcement
- publish a public advisory
- destroy plaintext
- destroy local unwrapped evidence keys
- export a training dataset
- modify routing policy
- modify recipient keys
---
## 13. Case State Machine
Every case should follow a clear state machine.
### Normal States
```text
NEW_SIGNAL
→ PARSED
→ VERIFIED
→ MAPPED
→ CLASSIFIED
→ REVIEW_REQUIRED
→ EVIDENCE_PACKAGED
→ SEALED
→ ROUTE_PROPOSED
→ APPROVED_FOR_SUBMISSION
→ SUBMITTED
→ ACKNOWLEDGED
→ ACTIONED
→ ARCHIVED
```
### Error / Block States
```text
BLOCKED_BY_TLP
BLOCKED_BY_POLICY
NEEDS_LEGAL_REVIEW
DESTINATION_RATE_LIMITED
SUBMISSION_FAILED
INSUFFICIENT_CONFIDENCE
DUPLICATE_CASE
```
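A sketch of how the normal path and block states could be enforced, under assumptions not stated above: a case advances only to the next normal state, may enter any block state from a non-terminal normal state, and returns to the state it was blocked in once the block is resolved. `CaseStateMachine` is hypothetical; `DUPLICATE_CASE` is treated like the other block states here, although a real design might send it straight to `ARCHIVED`.

```python
from __future__ import annotations

NORMAL = [
    "NEW_SIGNAL", "PARSED", "VERIFIED", "MAPPED", "CLASSIFIED",
    "REVIEW_REQUIRED", "EVIDENCE_PACKAGED", "SEALED", "ROUTE_PROPOSED",
    "APPROVED_FOR_SUBMISSION", "SUBMITTED", "ACKNOWLEDGED", "ACTIONED", "ARCHIVED",
]
BLOCKS = {
    "BLOCKED_BY_TLP", "BLOCKED_BY_POLICY", "NEEDS_LEGAL_REVIEW",
    "DESTINATION_RATE_LIMITED", "SUBMISSION_FAILED",
    "INSUFFICIENT_CONFIDENCE", "DUPLICATE_CASE",
}


class CaseStateMachine:
    def __init__(self) -> None:
        self.state = "NEW_SIGNAL"
        self.resume_to: str | None = None  # normal state to return to after a block
        self.history = [self.state]         # trail for the Case Detail View

    def advance(self) -> str:
        """Move to the next normal state; refused while blocked or archived."""
        if self.state in BLOCKS:
            raise RuntimeError(f"case is blocked: {self.state}")
        i = NORMAL.index(self.state)
        if i == len(NORMAL) - 1:
            raise RuntimeError("case is archived")
        return self._set(NORMAL[i + 1])

    def block(self, reason: str) -> str:
        """Enter a block state, remembering where the case was."""
        if reason not in BLOCKS:
            raise ValueError(f"unknown block state: {reason}")
        if self.state in BLOCKS or self.state == "ARCHIVED":
            raise RuntimeError(f"cannot block from {self.state}")
        self.resume_to = self.state
        return self._set(reason)

    def resolve(self) -> str:
        """Clear a block (e.g. after human review) and return to the prior state."""
        if self.state not in BLOCKS or self.resume_to is None:
            raise RuntimeError("case is not blocked")
        target, self.resume_to = self.resume_to, None
        return self._set(target)

    def _set(self, new_state: str) -> str:
        self.state = new_state
        self.history.append(new_state)
        return new_state
```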
The state machine should be visible in the Case Detail View.

---
## 14. UI / UX Principles
### Make Risk Visible
Every screen should answer:
- What is the severity?
- What is the confidence?
- Who is affected?
- What data is sensitive?
- Who can decrypt it?
- What will be sent?
- Where will it be sent?
- What policy allows or blocks this?
### Make Automation Interruptible
Analysts must be able to:
- pause a worker
- block a route
- downgrade confidence
- require legal review
- mark as duplicate
- prevent publication
- reopen a case
### Make Evidence Status Obvious
Use labels such as:
```text
Raw evidence: locked
Sealed package: ready
Local plaintext: destroyed
Local key: destroyed
Recipient: CERT-Bund can decrypt
Public extract: available
```
### Avoid Dangerous UX Patterns
Avoid:
- one-click “send all” actions
- hidden payloads
- unclear TLP labels
- buried warnings
- irreversible actions without confirmation
- publishing controls mixed with private reporting controls
- exposing raw evidence by default
---
## 15. Minimal MVP GUI
Do not build everything first.
The first useful MVP should include:
1. Mission Control
2. Case Queue
3. Case Detail
4. Evidence Sealing View
5. Routing Review
6. Courier Receipts
7. Worker Health
8. IntelMiner Dataset Queue
### MVP Routes
```text
/dashboard
/cases
/cases/:id
/cases/:id/evidence
/cases/:id/routing
/receipts
/workers
/trainline/candidates
```
This MVP is enough to operate safely while keeping the scope manageable.

---
## 16. Recommended Technical Stack
| Layer | Recommendation |
|---|---|
| Frontend | React / Next.js |
| UI Components | shadcn/ui + Tailwind |
| Charts | Recharts |
| Workflow Graph | React Flow |
| Tables | TanStack Table |
| Backend API | FastAPI |
| Worker Orchestration | Celery, Temporal, or Prefect |
| Database | PostgreSQL |
| Search | OpenSearch or Meilisearch |
| Graph Intelligence | OpenCTI / Neo4j optional |
| Object Storage | S3-compatible encrypted storage |
| Audit Log | Append-only PostgreSQL table or immutability layer |
| Auth | OIDC / Keycloak |
| Realtime Updates | WebSockets or Server-Sent Events |

React Flow is especially useful for the Worker Mesh screen.
---
## 17. Visual Identity
The design should feel:
- calm
- operational
- serious
- high-trust
- defensive
- readable under pressure

Avoid:
- cyberpunk styling
- hacker neon
- gamification
- aggressive animation
- cluttered dashboards

Recommended style:
```text
Dark mode by default
High-contrast severity labels
Muted blue/gray base
Red only for critical
Amber for warnings
Green for completed/safe
Clear TLP badges
Large readable tables
Minimal animations
```
Recommended UI language:
```text
Evidence protected.
Route blocked by policy.
Human approval required.
Recipient can decrypt.
Local key destroyed.
Submission acknowledged.
```
---
## 18. Final Operating Model
The GUI should support this operational chain:
```text
Detect
→ Validate
→ Classify
→ Seal Evidence
→ Review Routes
→ Submit Reports
→ Track Receipts
→ Archive Safely
→ Publish Sanitized Intelligence
→ Build Reviewed Training Data
```
The core cockpit should keep humans in control of five things:
```text
1. Cases — what is happening?
2. Evidence — what is protected?
3. Routing — where will it go?
4. Workers — what produced this result?
5. Ledger — what can we prove happened?
```
The training workspace adds the sixth:
```text
6. Trainline — what can become safe training data?
```
---
## 19. Summary
The Blue48 GUI should be an operations cockpit, not a passive dashboard.
It must provide:
- live case visibility
- worker observability
- authority-sealed evidence control
- human routing approval
- TLP and policy enforcement
- receipt and outcome tracking
- immutable audit visibility
- safe IntelMiner training-data review

The first MVP should focus on daily operational safety and decision control before advanced analytics or public-reporting features are added.

72
docs/archive/codex.md Normal file

@@ -0,0 +1,72 @@
# Codex — Blue48 / Adira Hunt Records
**Document type:** Master index

**Status:** Draft v2

---
## Records
| File | Stage / Role | Purpose |
|---|---|---|
| `waypoints.md` | Reference | Inventory of API-eligible reporting & escalation destinations. |
| `waypoints_firstpass.md` | Review | First-pass review notes and suggestions on `waypoints.md`. |
| `waypoints_scalpel.md` | Review | Section-by-section deep review, cross-cutting gaps, revised schema. |
| `routeline.md` | Architecture | Reporting & API escalation pipeline with authority-sealed evidence handling. |
| `hivemap.md` | Architecture | Worker mesh breakdown — worker lines, responsibilities, human-review boundaries. |
| `intelminer.md` | Pipeline | Lawful data collection, LoRA-ready JSONL format, dataset governance. |
| `cockpit.md` | Interface | GUI / UI-UX concept for operating the worker mesh and review workflows. |
| `codex.md` | Meta | This index. |
---
## Suggested Reading Order
1. `routeline.md`
2. `hivemap.md`
3. `intelminer.md`
4. `cockpit.md`
5. `waypoints.md`
6. `waypoints_firstpass.md`
7. `waypoints_scalpel.md`
---
## Architecture Sentence
```text
Sensors
→ Scoutline
→ Proofline
→ Mapline
→ Classifyline
→ Sealine
→ Routeline
→ Ledgerline
→ Publishline
→ Trainline
→ Blue48 Operations Cockpit
```
---
## Core Principle
> Validate the signal, protect the evidence, route only what each destination is authorized to receive, and prove every external action through an immutable ledger.
---
## Rename Map (2026-05-13)
For anyone holding older references:
| Old | New |
|---|---|
| `api_eligible_cyber_threat_reporting_platforms.md` | `waypoints.md` |
| `api_eligible_cyber_threat_reporting_platforms_review.md` | `waypoints_firstpass.md` |
| `api_eligible_cyber_threat_reporting_platforms_review_v2_detailed.md` | `waypoints_scalpel.md` |
| `blue48_reporting_api_architecture_v2.md` | `routeline.md` |
| `blue48_worker_mesh_architecture.md` | `hivemap.md` |
| `blue48_intelminer_lora_pipeline.md` | `intelminer.md` |
| `blue48_operations_cockpit_ui_ux.md` | `cockpit.md` |
| `blue48_project_records_index.md` | `codex.md` |

451
docs/archive/hivemap.md Normal file

@@ -0,0 +1,451 @@
# Blue48 Worker Mesh Architecture
**Document type:** Project record / technical architecture

**Scope:** Worker names, responsibilities, interfaces, data flow, human review boundaries

**Status:** Draft v1

---
## 1. Purpose
Blue48 should not rely on one large, expensive, opaque model to perform all cyber-intelligence operations. The platform should be built as a mesh of small, specialized workers.
Each worker performs one narrow function, writes structured output, and passes a normalized case object to the next stage. Heavy models are reserved for judgment-heavy tasks such as confidence scoring, routing explanations, public report drafting, and training-example generation.
Core principle:
> Small workers produce traceable outputs. Humans approve sensitive decisions. The Ledger proves what happened.
---
## 2. High-Level Flow
```text
Scoutline
→ Proofline
→ Mapline
→ Classifyline
→ Sealine
→ Routeline
→ Ledgerline
→ Publishline
→ Trainline
```
Operator version:
```text
Detect → Validate → Map → Classify → Seal Evidence → Route → Submit → Track → Archive → Learn
```
---
## 3. Worker Lines
| Line | Purpose |
|---|---|
| **Scoutline** | Finds, fetches, parses, and deduplicates lawful intelligence sources. |
| **Proofline** | Validates claims, checks indicators, measures freshness, and scores confidence. |
| **Mapline** | Resolves victims, actors, sectors, jurisdictions, CERT routes, and affected products. |
| **Classifyline** | Assigns severity, TLP, incident type, and operational class. |
| **Sealine** | Packages evidence, encrypts it for authorized recipients, and destroys local plaintext/key material when policy allows. |
| **Routeline** | Selects destinations, builds payloads, enforces destination policy, and submits reports. |
| **Ledgerline** | Records immutable audit events, receipts, outcomes, and follow-up status. |
| **Publishline** | Produces sanitized public intelligence only after mitigation and approval. |
| **Trainline** | Converts lawful, reviewed intelligence into LoRA-ready training data. |
---
## 4. Core Worker Set
The first conceptual worker set is:
```text
Scout → Verifier → Mapper → Classifier → Sealer → Router → Courier → Ledger
```
Support workers:
```text
Watcher → Archivist → Publisher
```
Operational sentence:
```text
Scout detects.
Verifier confirms.
Mapper identifies.
Classifier prioritizes.
Sealer protects.
Router decides.
Courier submits.
Ledger proves.
Watcher follows up.
Archivist forgets safely.
Publisher informs.
```
---
## 5. Granular Worker Breakdown
### 5.1 Scoutline
| Worker | Job | Model requirement |
|---|---|---|
| **SourcePlanner** | Maintains the approved source list, collection schedules, and source eligibility. | None / rules |
| **Crawler** | Discovers new pages, feeds, advisories, reports, APIs, and datasets. | None |
| **Fetcher** | Downloads pages, PDFs, JSON, RSS, STIX/TAXII, MISP events, and API responses. | None |
| **Parser** | Extracts title, date, author, body, tables, indicators, and metadata. | Rules / small model |
| **Deduper** | Detects duplicate reports, reposted IOCs, syndicated articles, and repeated claims. | Embeddings / rules |
| **SourceRanker** | Scores the source based on trust, history, origin, and license status. | Rules / small model |
| **Signalizer** | Converts parsed content into candidate intelligence signals. | Small/medium model |

Output:
```json
{
  "signal_id": "uuid",
  "source_type": "advisory | cti_report | abuse_feed | ransomware_monitor | public_blog | misp_event",
  "summary": "short defensive summary",
  "observed_at": "2026-05-13T00:00:00Z",
  "raw_evidence_location": "internal-only-reference"
}
```
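Because every Scoutline worker hands a normalized object to the next stage, it helps to validate the signal at that boundary. A standard-library sketch, assuming the `source_type` values in the JSON above are the complete set and that `observed_at` must carry a timezone; `Signal.from_dict` is hypothetical, and the project's Pydantic models could express the same checks declaratively.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime

SOURCE_TYPES = {
    "advisory", "cti_report", "abuse_feed",
    "ransomware_monitor", "public_blog", "misp_event",
}


@dataclass(frozen=True)
class Signal:
    signal_id: str
    source_type: str
    summary: str
    observed_at: datetime
    raw_evidence_location: str  # internal reference only, never raw content

    @classmethod
    def from_dict(cls, d: dict) -> "Signal":
        """Validate a Scoutline signal dict and reject anything malformed."""
        uuid.UUID(d["signal_id"])  # raises ValueError if not a UUID
        if d["source_type"] not in SOURCE_TYPES:
            raise ValueError(f"unknown source_type: {d['source_type']}")
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        observed = datetime.fromisoformat(d["observed_at"].replace("Z", "+00:00"))
        if observed.tzinfo is None:
            raise ValueError("observed_at must be timezone-aware")
        return cls(d["signal_id"], d["source_type"], d["summary"].strip(),
                   observed, d["raw_evidence_location"])
```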
---
### 5.2 Proofline
| Worker | Job |
|---|---|
| **Correlator** | Checks whether the same signal appears across multiple independent sources. |
| **IOCChecker** | Validates domains, IPs, hashes, URLs, wallet addresses, emails, and CVEs. |
| **FreshnessChecker** | Determines whether the signal is current, stale, repeated, or resurfaced. |
| **ClaimChecker** | Labels language as confirmed, claimed, observed, rumored, or speculative. |
| **ConfidenceScorer** | Produces final confidence and optional Admiralty Code values. |

Output:
```json
{
  "confidence": "low | medium | high",
  "source_reliability": "A | B | C | D | E | F | unknown",
  "information_credibility": "1 | 2 | 3 | 4 | 5 | 6 | unknown",
  "claim_status": "confirmed | claimed | observed | rumored | speculative",
  "freshness": "new | recent | stale | resurfaced"
}
```
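The `source_reliability` and `information_credibility` fields combine into the familiar two-character Admiralty grade, such as `B2`. A small sketch, assuming `unknown` on either axis means no grade is emitted; the letters and digits are those listed in the schema above, and `admiralty_grade` is a hypothetical helper.

```python
from __future__ import annotations

RELIABILITY = {"A", "B", "C", "D", "E", "F"}  # A = completely reliable ... F = cannot be judged
CREDIBILITY = {"1", "2", "3", "4", "5", "6"}  # 1 = confirmed ... 6 = cannot be judged


def admiralty_grade(source_reliability: str, information_credibility: str) -> str | None:
    """Combine the two axes into a grade like "B2"; None if either axis is unknown."""
    if "unknown" in (source_reliability, information_credibility):
        return None
    if source_reliability not in RELIABILITY:
        raise ValueError(f"bad source_reliability: {source_reliability}")
    if information_credibility not in CREDIBILITY:
        raise ValueError(f"bad information_credibility: {information_credibility}")
    return source_reliability + information_credibility
```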
---
### 5.3 Mapline
| Worker | Job |
|---|---|
| **EntityResolver** | Maps organization names, domains, subsidiaries, brands, and aliases. |
| **GeoResolver** | Maps victim country, jurisdiction, national CERT, and cross-border implications. |
| **SectorMapper** | Maps victim sector and critical-infrastructure status. |
| **ActorMapper** | Maps actor names, aliases, ransomware brands, campaigns, and confidence. |
| **CVEResolver** | Maps vulnerabilities to CVEs, affected products, KEV status, and exploit relevance. |

Output:
```json
{
  "victim": {
    "name": "",
    "domain": "",
    "country": "",
    "sector": "",
    "critical_infrastructure": false
  },
  "actor": {
    "name": "",
    "aliases": [],
    "campaign": "",
    "confidence": "low | medium | high"
  },
  "jurisdiction": {
    "primary_cert": "",
    "law_enforcement_route": "",
    "sector_isac": ""
  }
}
```
---
### 5.4 Classifyline
| Worker | Job |
|---|---|
| **Classifier** | Assigns incident type, severity, internal class, and response SLA. |
| **TLPGuard** | Ensures TLP data cannot be routed to destinations that cannot receive it. |
| **DestinationPolicyGuard** | Blocks inappropriate, illegal, excessive, or sensitive submissions. |

Internal class mapping:
| Internal class | Meaning | External severity |
|---|---|---|
| **A** | Imminent harm or attack likely underway | Critical |
| **B** | Credible planned attack | High |
| **C** | Confirmed exposure | High / Medium |
| **D** | Campaign intelligence | Medium / High |
| **E** | Weak signal or watchlist item | Low / Monitor |

Output:
```json
{
  "class": "A | B | C | D | E",
  "severity": "low | medium | high | critical",
  "tlp": "RED | AMBER | GREEN | CLEAR",
  "incident_type": "ransomware | credential_leak | access_sale | phishing | malware | exploit | botnet | data_leak",
  "policy_blocks": []
}
```
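The internal-class table constrains which external severities each class may carry, so a Classifier output can be checked mechanically before TLPGuard and DestinationPolicyGuard run. A minimal sketch, assuming "Low / Monitor" for class E maps onto the schema's `low` value and that a violation should become a `policy_blocks` entry rather than an exception; `check_classification` is hypothetical.

```python
ALLOWED_SEVERITY = {
    "A": {"critical"},
    "B": {"high"},
    "C": {"high", "medium"},
    "D": {"medium", "high"},
    "E": {"low"},
}
TLP_VALUES = {"RED", "AMBER", "GREEN", "CLEAR"}


def check_classification(result: dict) -> dict:
    """Return a copy of a Classifyline result with any violations added to policy_blocks."""
    out = {**result, "policy_blocks": list(result.get("policy_blocks", []))}
    cls, sev = result["class"], result["severity"]
    if sev not in ALLOWED_SEVERITY.get(cls, set()):
        out["policy_blocks"].append(f"severity {sev!r} not allowed for class {cls!r}")
    if result["tlp"] not in TLP_VALUES:
        out["policy_blocks"].append(f"unknown TLP {result['tlp']!r}")
    return out
```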
---
### 5.5 Sealine
Sealine replaces the old primary concept of “sanitization.” The objective is not to destroy useful evidence, but to protect it.
| Worker | Job |
|---|---|
| **EvidencePackager** | Collects sensitive evidence, hashes it, and packages it with metadata. |
| **Sealer** | Encrypts evidence for authorized recipients using public-key or hybrid encryption. |
| **KeyBurner** | Destroys local unwrapped evidence keys after successful sealing. |
| **RetentionGuard** | Enforces retention, deletion, plaintext destruction, and crypto-erasure policy. |

Sealine principle:
> Preserve the truth. Seal the sensitive evidence. Route only what each recipient is authorized to receive.
Output:
```json
{
  "sealed_evidence": {
    "package_id": "uuid",
    "encryption": "age | PGP | CMS | hybrid",
    "recipient_keys": [
      {
        "recipient": "CERT-Bund",
        "key_id": "authority-key-id",
        "wrapped_key": "encrypted-evidence-key"
      }
    ],
    "payload_hash": "sha256",
    "plaintext_destroyed": true,
    "local_unwrapped_key_destroyed": true
  }
}
```
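The `payload_hash` field and the cockpit's **Verify Package Hash** action can be illustrated with the standard library alone. This sketch is deliberately partial: it hashes bytes that are assumed to be already sealed and records which recipients hold wrapped keys, but it performs no encryption, which would come from age, OpenPGP, or CMS tooling. Hashing the sealed ciphertext rather than the plaintext is an assumption, chosen so a recipient can verify integrity before decrypting; `package_manifest` and `verify_package` are hypothetical.

```python
import hashlib


def package_manifest(package_id: str, sealed_bytes: bytes, recipients: list) -> dict:
    """Build the metadata side of a sealed package (no plaintext, no keys)."""
    if not recipients:
        raise ValueError("a sealed package needs at least one recipient")
    return {
        "package_id": package_id,
        "payload_hash": "sha256:" + hashlib.sha256(sealed_bytes).hexdigest(),
        "recipients": sorted(recipients),
    }


def verify_package(manifest: dict, sealed_bytes: bytes) -> bool:
    """Recompute the SHA-256 of the sealed bytes and compare with the manifest."""
    actual = "sha256:" + hashlib.sha256(sealed_bytes).hexdigest()
    return actual == manifest["payload_hash"]
```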
---
### 5.6 Routeline
| Worker | Job |
|---|---|
| **RoutePlanner** | Chooses destination order based on victim, country, sector, severity, TLP, and evidence type. |
| **PayloadBuilder** | Builds destination-specific payloads: sealed package, STIX bundle, MISP event, abuse report, or public-safe extract. |
| **Redactor** | Minimizes public/semi-public outputs only. Redactor does not replace Sealer. |
| **Courier** | Submits through API, portal, structured email, or secure upload. |
| **RateLimiter** | Enforces destination quotas, retries, and backoff. |
| **ReceiptCollector** | Captures case IDs, acknowledgements, API responses, and status URLs. |

Example route object:
```json
{
  "routes": [
    {
      "destination": "CERT-Bund",
      "type": "authority",
      "payload": "sealed_evidence_package",
      "priority": 1,
      "max_tlp_allowed": "RED"
    },
    {
      "destination": "MISP trusted community",
      "type": "cti_sharing",
      "payload": "stix_indicators",
      "priority": 2,
      "max_tlp_allowed": "AMBER"
    },
    {
      "destination": "Cloudflare Abuse API",
      "type": "provider_abuse",
      "payload": "minimized_abuse_report",
      "priority": 3,
      "max_tlp_allowed": "CLEAR"
    }
  ]
}
```
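The RateLimiter's quota, retry, and backoff duties are commonly implemented as one token bucket per destination plus capped exponential backoff. A minimal sketch under that assumption; the capacity, refill rate, base delay, and cap are placeholders rather than real destination quotas, and the clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Per-destination quota: up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Spend one token if available; False means the submission must wait."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Capped exponential backoff in seconds for retry number `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))
```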
---
### 5.7 Ledgerline
| Worker | Job |
|---|---|
| **Ledger** | Creates immutable audit records for all external submissions and destructive actions. |
| **Watcher** | Polls outcomes: takedown status, MISP sightings, CERT acknowledgement, provider response. |
| **Archivist** | Handles retention, sealed package lifecycle, legal holds, and crypto-erasure confirmation. |

Ledger record:
```json
{
  "timestamp": "2026-05-13T00:00:00Z",
  "case_id": "B48-2026-000001",
  "destination": "CERT-Bund",
  "payload_hash": "sha256",
  "submitter_identity": "blue48-official-handle",
  "tlp": "AMBER",
  "response_id": "external-case-id",
  "outcome": "submitted | acknowledged | rejected | actioned"
}
```
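One common way to make an append-only ledger tamper-evident is to chain records by hash, so that editing or reordering any earlier record breaks every later link. This is a sketch of that technique, not the project's chosen immutability layer; the `prev_hash` and `record_hash` fields are additions to the record shape above, and detecting truncation of the newest entries would additionally require anchoring the latest hash somewhere external.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: dict, prev_hash: str) -> str:
    """Hash the canonical JSON of a record together with the previous hash."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append(ledger: list, record: dict) -> dict:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = ledger[-1]["record_hash"] if ledger else GENESIS
    entry = {"record": record, "prev_hash": prev_hash,
             "record_hash": _digest(record, prev_hash)}
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute the chain; an edited, reordered, or removed middle entry fails."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["record_hash"] != _digest(entry["record"], prev_hash):
            return False
        prev_hash = entry["record_hash"]
    return True
```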
---
### 5.8 Publishline
| Worker | Job |
|---|---|
| **Publisher** | Produces public-safe intelligence reports after mitigation and approval. |

Publisher may include:
- sector trend
- actor trend
- CVEs
- TTPs
- defensive recommendations
- sanitized IOCs
- non-sensitive timelines

Publisher must not include:
- raw credentials
- stolen data
- victim secrets
- live access details
- exact criminal-source links
- unmitigated exploit paths
---
## 6. Which Workers Need Models?
| Worker | Model need |
|---|---|
| SourcePlanner | None / rules |
| Crawler / Fetcher | None |
| Parser | Rules / small model |
| Deduper | Embeddings / rules |
| Signalizer | Small or medium model |
| ClaimChecker | Small or medium model |
| ConfidenceScorer | Medium model |
| EntityResolver | Rules + embeddings |
| ActorMapper | Small or medium model |
| Classifier | Small or medium model |
| RoutePlanner | Rules first, model second |
| PayloadBuilder | Small model |
| Publisher | Medium or large model |
| ExampleBuilder | Medium model |
| QualityGate | Medium model + rules |

Heavy models should be reserved for:
```text
ConfidenceScorer
Classifier
Publisher
ExampleBuilder
QualityGate
```
---
## 7. Human Review Boundaries
Human approval is required before:
- sending sealed evidence to any external destination
- contacting law enforcement or CERTs with sensitive evidence
- publishing a public advisory
- destroying plaintext evidence
- destroying local unwrapped evidence keys
- exporting a training dataset
- modifying routing policy
- modifying recipient keys

Two-person control should be required for:
- sending TLP:RED or highly sensitive packages
- deleting evidence
- changing authority recipient keys
- publishing named-victim reports
- exporting training data based on internal cases
---
## 8. MVP Worker Build Order
Initial worker implementation priority:
1. SourcePlanner
2. Fetcher
3. Parser
4. Deduper
5. Signalizer
6. IOCChecker
7. EntityResolver
8. GeoResolver
9. Classifier
10. EvidencePackager
11. Sealer
12. RoutePlanner
13. Courier
14. Ledger
15. ReceiptCollector
16. IntelMiner

Minimum operational chain:
```text
Fetcher → Parser → Signalizer → IOCChecker → EntityResolver → Classifier → Sealer → RoutePlanner → Courier → Ledger
```
---
## 9. Technical Notes
Recommended implementation style:
| Component | Recommendation |
|---|---|
| Worker runtime | Python services, Celery, Temporal, Prefect, or lightweight queue workers |
| Message format | JSON normalized case object |
| Interop format | STIX 2.1 where useful |
| Storage | PostgreSQL + object storage |
| Search | OpenSearch or Meilisearch |
| CTI graph | OpenCTI or MISP integration |
| Audit | append-only ledger table |
| Secrets | `.env`, secret manager, runtime injection only |
| UI | Blue48 Operations Cockpit |
---
## 10. Summary
Blue48 should operate as a worker mesh, not a monolithic AI agent.
The system should use small deterministic workers where possible, small models where useful, and larger models only for judgment-heavy steps. Sensitive evidence is handled by Sealine, not casually rendered or distributed. Routing and public reporting are controlled by policy guards, human review, and immutable audit logging.

docs/archive/intelminer.md
# Blue48 IntelMiner and LoRA Training Data Pipeline
**Document type:** Project record / technical concept
**Scope:** Lawful intelligence collection, training-data preparation, LoRA dataset format, quality gates, safety boundaries
**Status:** Draft v1
---
## 1. Purpose
IntelMiner is the Blue48 worker responsible for collecting lawful defensive cyber-intelligence and converting it into reviewed, license-safe, LoRA-ready training examples.
IntelMiner does not train models to hack. It prepares training data for defensive tasks such as indicator extraction, routing, severity classification, evidence handling, and safe report writing.
Core mission:
> IntelMiner collects lawful defensive cyber-intelligence from approved online sources and transforms it into reviewed, license-safe, LoRA-ready JSONL examples for specialized defensive models.
---
## 2. What IntelMiner Should Learn From
Allowed source categories:
- national CERT advisories
- CISA, ENISA, NCSC, CERT-EU, BSI, ANSSI, and similar public advisories
- CVE, NVD, and exploited-vulnerability catalogs
- public vendor threat reports
- public malware-analysis reports
- public ransomware trend reports from lawful monitors
- MISP events where the license and sharing group permit reuse
- abuse.ch datasets where permitted
- public IOCs and defensive detection content
- public incident writeups
- internally written reports approved for training
- synthetic examples written by analysts
Restricted or excluded source categories:
- raw stolen data
- raw credentials
- private victim communications
- criminal-forum content obtained without authorization
- confidential CTI provider content without training rights
- TLP:RED material
- material with unknown or incompatible license
- content that teaches exploitation, persistence, credential abuse, ransomware operation, or evasion
---
## 3. IntelMiner Worker Chain
```text
SourcePlanner
→ Collector
→ LicenseChecker
→ ContentParser
→ Chunker
→ Labeler
→ ExampleBuilder
→ QualityGate
→ ReviewerQueue
→ DatasetWriter
```
---
## 4. Worker Responsibilities
| Worker | Responsibility |
|---|---|
| **SourcePlanner** | Defines approved sources, update schedules, license expectations, and collection priority. |
| **Collector** | Pulls data from APIs, RSS, advisories, STIX/TAXII, MISP, GitHub, PDFs, and public reports. |
| **LicenseChecker** | Determines whether the material may be used for training. Blocks unknown or restricted content. |
| **ContentParser** | Extracts text, IOCs, dates, actors, CVEs, TTPs, victim sectors, and source metadata. |
| **Chunker** | Splits long content into training-sized units while preserving context. |
| **Labeler** | Assigns task labels such as IOC extraction, routing, classification, report writing, and evidence handling. |
| **ExampleBuilder** | Converts chunks into instruction/input/output training examples. |
| **QualityGate** | Removes unsafe, duplicated, mislabeled, low-confidence, or license-problematic examples. |
| **ReviewerQueue** | Sends candidates to human reviewers. Nothing enters the final dataset without approval. |
| **DatasetWriter** | Exports approved examples as versioned JSONL datasets. |
---
## 5. Training Tasks
The LoRA adapters should learn defensive operations only.
| Task | Purpose |
|---|---|
| **ioc_extraction** | Extract domains, IPs, URLs, hashes, emails, wallets, CVEs, and file names. |
| **ttp_mapping** | Map report language to MITRE ATT&CK-style techniques. |
| **severity_classification** | Classify a signal as weak signal, credible threat, confirmed exposure, campaign intelligence, or imminent harm. |
| **routing_decision** | Decide which reporting destinations are appropriate and in what order. |
| **evidence_handling** | Decide whether evidence must be sealed, minimized, excluded, or internally retained. |
| **actor_normalization** | Normalize actor names, aliases, ransomware brands, and campaigns. |
| **source_reliability** | Estimate source reliability and information credibility. |
| **report_drafting** | Draft structured victim, CERT, provider, MISP, or public reports. |
| **public_publishing** | Produce sanitized public intelligence after mitigation. |
Do not train examples for:
- exploitation steps
- credential abuse
- phishing construction
- malware deployment
- ransomware operations
- evasion
- stealth
- persistence
- unauthorized forum access
- instructions for obtaining stolen data
---
## 6. Recommended LoRA Strategy
Do not start by training one large mixed LoRA. Start with small task-specific adapters.
Recommended adapter order:
| Priority | Adapter | Reason |
|---:|---|---|
| 1 | **lora-router** | Central to the project and easier to evaluate objectively. |
| 2 | **lora-ioc-extractor** | High utility, clear labels, measurable precision and recall. |
| 3 | **lora-evidence-handler** | Helps enforce safe handling decisions. |
| 4 | **lora-report-writer** | Drafts structured notifications after reviewed facts exist. |
| 5 | **lora-actor-normalizer** | Improves actor and campaign mapping. |
| 6 | **lora-public-publisher** | Produces public-safe summaries after mitigation. |
Training should begin only after enough reviewed examples exist:
- 1,000+ reviewed examples for a single narrow task, or
- 3,000 to 10,000 mixed examples across several tasks.
Until then, use rules, retrieval, embeddings, and human-reviewed prompts.
---
## 7. JSONL Training Format
Each JSONL line should contain one training example.
Standard structure:
```json
{
"task": "routing_decision",
"instruction": "Given a defensive cyber-intelligence signal, choose the correct reporting destinations and order.",
"input": {},
"output": {},
"metadata": {
"source_type": "public_advisory | vendor_report | synthetic | internal_approved",
"tlp": "CLEAR | GREEN | AMBER",
"license": "approved",
"reviewed": true,
"policy_version": "v1",
"dataset_version": "dataset-router-v0.1"
}
}
```
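Before a line is written, the structure above can be checked mechanically. This is a minimal standard-library sketch. The rule set is an assumption about how strictly the DatasetWriter would enforce the format: the allowed TLP values come from the `tlp` field above, and the sketch also requires `license == "approved"` and `reviewed is True`.

```python
import json

REQUIRED_KEYS = {"task", "instruction", "input", "output", "metadata"}
ALLOWED_TLP = {"CLEAR", "GREEN", "AMBER"}  # RED never enters a training file


def validate_example(example: dict) -> list[str]:
    """Return a list of problems; an empty list means the example may be written."""
    problems = []
    missing = REQUIRED_KEYS - example.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    meta = example.get("metadata", {})
    if meta.get("tlp") not in ALLOWED_TLP:
        problems.append(f"tlp not allowed: {meta.get('tlp')!r}")
    if meta.get("license") != "approved":
        problems.append("license not approved")
    if meta.get("reviewed") is not True:
        problems.append("not human-reviewed")
    return problems


def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples one per line, refusing the whole batch if any is invalid."""
    lines = []
    for ex in examples:
        problems = validate_example(ex)
        if problems:
            raise ValueError(f"{ex.get('task')}: {problems}")
        lines.append(json.dumps(ex, ensure_ascii=False, sort_keys=True))
    return "".join(line + "\n" for line in lines)
```

Under this strict check, any example whose metadata omits `tlp` or `license`, such as the shorter routing and evidence-handling examples in this document, would be refused until those fields are filled in.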
---
## 8. Example: IOC Extraction
```json
{
"task": "ioc_extraction",
"instruction": "Extract defensive indicators from the cyber threat report. Return JSON only.",
"input": "A phishing campaign used login-example[.]com and delivered payload hash 44d88612fea8a8f36de82e1278abb02f. The actor referenced CVE-2024-12345.",
"output": {
"domains": ["login-example.com"],
"hashes": ["44d88612fea8a8f36de82e1278abb02f"],
"cves": ["CVE-2024-12345"],
"ips": [],
"urls": []
},
"metadata": {
"source_type": "synthetic_or_public_report",
"tlp": "CLEAR",
"license": "approved",
"reviewed": true
}
}
```
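Until trained adapters exist, a rule-based extractor can produce baseline outputs for examples like this one and give `lora-ioc-extractor` something to be scored against. This is a minimal sketch with deliberately simple regular expressions. It does not handle IPv6, wallets, e-mail addresses, other defang styles, or file names that happen to look like domains.

```python
import re

CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
# SHA-256, SHA-1, or MD5 as a whole hex token.
HASH_RE = re.compile(r"\b(?:[a-f0-9]{64}|[a-f0-9]{40}|[a-f0-9]{32})\b", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE)


def refang(text: str) -> str:
    """Undo common defanging: example[.]com -> example.com, hxxp -> http."""
    return (text.replace("[.]", ".").replace("(.)", ".")
                .replace("hxxps://", "https://").replace("hxxp://", "http://"))


def _unique(items) -> list[str]:
    # Deduplicate while keeping first-seen order.
    return list(dict.fromkeys(items))


def extract_iocs(text: str) -> dict[str, list[str]]:
    clean = refang(text)
    return {
        "domains": _unique(d.lower() for d in DOMAIN_RE.findall(clean)),
        "hashes": _unique(h.lower() for h in HASH_RE.findall(clean)),
        "cves": _unique(c.upper() for c in CVE_RE.findall(clean)),
        "ips": _unique(IPV4_RE.findall(clean)),
        "urls": _unique(URL_RE.findall(clean)),
    }
```

Run on the example input, this returns the same `domains`, `hashes`, `cves`, `ips`, and `urls` as the expected output above. Domains that also appear inside URLs are listed under both keys, which a reviewer may or may not want.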
---
## 9. Example: Routing Decision
```json
{
"task": "routing_decision",
"instruction": "Given a defensive cyber-intelligence signal, choose the correct reporting destinations and order.",
"input": {
"incident_type": "access_sale",
"victim_country": "DE",
"sector": "energy",
"critical_infrastructure": true,
"confidence": "high",
"tlp": "AMBER"
},
"output": {
"severity": "critical",
"routes": [
"CERT-Bund",
"victim_security_team",
"sector_isac",
"law_enforcement_cyber_unit",
"misp_trusted_community"
],
"evidence_handling": "authority_sealed_package"
},
"metadata": {
"reviewed": true,
"policy_version": "v1"
}
}
```
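A rule-based router can generate candidate labels like this one for reviewers to correct. This is a minimal sketch under several assumptions. The CERT lookup is a hypothetical subset of the country map. The set of "criminal" incident types that adds law enforcement is also an assumption. Finally, `minimized_payload` and the non-critical severity are placeholder values that are not defined elsewhere in this document.

```python
# Hypothetical subset of the CERT/CSIRT routing map.
NATIONAL_CERT = {"DE": "CERT-Bund", "FR": "CERT-FR", "NL": "NCSC-NL", "GB": "NCSC-UK"}
# Assumed incident types that indicate criminal evidence exists.
CRIMINAL_TYPES = {"access_sale", "ransomware", "credential_leak", "data_leak"}


def route(signal: dict) -> dict:
    cert = NATIONAL_CERT.get(signal.get("victim_country", ""), "national_cert")
    ci = bool(signal.get("critical_infrastructure")) and signal.get("confidence") == "high"
    # Critical infrastructure: national CERT first; otherwise the victim is told first.
    routes = [cert, "victim_security_team"] if ci else ["victim_security_team", cert]
    routes.append("sector_isac")
    if signal.get("incident_type") in CRIMINAL_TYPES:
        routes.append("law_enforcement_cyber_unit")
    routes.append("misp_trusted_community")
    return {
        "severity": "critical" if ci else "high",  # placeholder; a real classifier decides
        "routes": routes,
        "evidence_handling": (
            "authority_sealed_package" if signal.get("tlp") in {"AMBER", "RED"} else "minimized_payload"
        ),
    }
```

For the input above, `route` reproduces the expected output exactly. That makes it useful as a consistency check on hand-written labels, not as a replacement for reviewer judgment.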
---
## 10. Example: Evidence Handling
```json
{
"task": "evidence_handling",
"instruction": "Decide how evidence should be handled before external submission.",
"input": {
"evidence_type": "stolen_credentials",
"destination": "public_abuse_api",
"contains_pii": true,
"tlp": "RED"
},
"output": {
"submit_raw": false,
"handling": "do_not_send_raw_to_public_api",
"allowed_payload": "metadata_only",
"sealed_package_required": true,
"authorized_recipients": ["victim_security_team", "national_cert"]
},
"metadata": {
"reviewed": true
}
}
```
---
## 11. Dataset Metadata
Every example should include metadata.
| Field | Purpose |
|---|---|
| `task` | Training task category. |
| `source_type` | Origin category of the example. |
| `source_id` | Internal reference to source document. |
| `license` | Approved, restricted, unknown, or rejected. |
| `tlp` | CLEAR, GREEN, AMBER, or RED. |
| `reviewed` | Human approval status. |
| `reviewer_id` | Internal reviewer identity or role ID. |
| `policy_version` | Version of handling policy used. |
| `dataset_version` | Versioned dataset name. |
| `safety_flags` | Unsafe content or sensitive material flags. |
| `dedupe_hash` | Used to prevent duplicate examples. |
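A `dedupe_hash` is most useful when it ignores metadata and key order. This minimal sketch hashes only the training-relevant fields after canonical JSON serialization. Which fields count as "training-relevant" is an assumption.

```python
import hashlib
import json


def dedupe_hash(example: dict) -> str:
    """SHA-256 over the training-relevant fields, serialized canonically."""
    # Metadata (reviewer, versions) is excluded so re-reviewed copies still collide.
    core = {k: example.get(k) for k in ("task", "instruction", "input", "output")}
    canonical = json.dumps(core, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

This catches exact duplicates only. Near-duplicates, such as reworded or re-whitespaced inputs, would need text normalization or a similarity technique such as MinHash.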
---
## 12. QualityGate Rules
QualityGate must reject examples that contain:
- raw credentials
- raw stolen data
- private victim information
- live access details
- exploit chains
- malware deployment steps
- phishing instructions
- evasion or persistence guidance
- incompatible license
- unknown provenance
- duplicated content
- unreviewed TLP:RED or confidential content
QualityGate should flag for human review when:
- source license is ambiguous
- actor attribution is uncertain
- victim identity is named
- sample contains personal data
- output teaches operationally sensitive details
- example conflicts with policy
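Part of QualityGate can be automated with pattern checks that either reject outright or route an example to a human. This is a minimal sketch with illustrative patterns only. It maps the `unknown` and `rejected` license values to rejection and `restricted` to human review, which is an assumption. The remaining rules above (exploit chains, attribution, policy conflicts) still need reviewers.

```python
import json
import re

# Illustrative patterns only; real coverage must be far broader and still needs review.
REJECT_PATTERNS = {
    "raw_credentials": re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[:=]\s*\S+"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
FLAG_PATTERNS = {
    # Any e-mail address is flagged; a reviewer decides whether it is an IOC or personal data.
    "email_address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def quality_gate(example: dict) -> tuple[str, list[str]]:
    """Return ("reject" | "needs_review" | "pass", reasons)."""
    meta = example.get("metadata", {})
    body = json.dumps([example.get("input"), example.get("output")], ensure_ascii=False)
    rejects = [name for name, rx in REJECT_PATTERNS.items() if rx.search(body)]
    if meta.get("license") in {"unknown", "rejected"}:
        rejects.append("license_not_usable")
    if meta.get("tlp") == "RED":
        rejects.append("tlp_red")
    if rejects:
        return "reject", rejects
    flags = [name for name, rx in FLAG_PATTERNS.items() if rx.search(body)]
    if meta.get("license") == "restricted":
        flags.append("license_ambiguous")
    return ("needs_review" if flags else "pass"), flags
```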
---
## 13. Dataset Builder UI Requirements
IntelMiner should be visible in the Blue48 Operations Cockpit.
Screens:
| Screen | Purpose |
|---|---|
| **Dataset Sources** | Manage approved sources, license status, and collection schedules. |
| **Training Candidate Queue** | Review generated examples before approval. |
| **Example Review** | Edit, approve, reject, or mark examples unsafe. |
| **Dataset Builder** | Export versioned JSONL datasets with train/validation split. |
| **Dataset Audit** | Track source, reviewer, license, and policy version. |
Candidate fields:
| Field | Meaning |
|---|---|
| Task | IOC extraction, routing, classification, etc. |
| Source | advisory, blog, report, synthetic, internal. |
| License | approved, restricted, unknown, rejected. |
| Quality score | Estimated usefulness. |
| Safety flag | safe, needs review, reject. |
| Reviewer status | pending, approved, rejected. |
---
## 14. Dataset Versioning
Datasets should be versioned clearly:
```text
dataset-router-v0.1
dataset-ioc-extractor-v0.3
dataset-evidence-handler-v0.2
dataset-report-writer-v0.2
```
Each export should include:
- dataset name
- version
- date
- number of examples
- task distribution
- source distribution
- license distribution
- reviewer count
- rejected example count
- train/validation split
- policy version
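Most of the export metadata above can be computed from the approved examples themselves. This minimal sketch assumes that each example carries the metadata fields from section 11 (`dedupe_hash`, `source_type`, `license`, `reviewer_id`). The hash-based train/validation split is one design choice among several. It was picked so that an example stays in the same split across re-exports.

```python
from collections import Counter
from datetime import date


def split_of(dedupe_hash: str, validation_ratio: float = 0.1) -> str:
    """Deterministic split: an example stays in the same split across re-exports."""
    bucket = int(dedupe_hash[:8], 16) / 0xFFFFFFFF
    return "validation" if bucket < validation_ratio else "train"


def build_manifest(name: str, version: str, examples: list[dict], rejected_count: int,
                   policy_version: str, validation_ratio: float = 0.1) -> dict:
    metas = [ex["metadata"] for ex in examples]
    splits = Counter(split_of(m["dedupe_hash"], validation_ratio) for m in metas)
    return {
        "dataset": name,
        "version": version,
        "date": date.today().isoformat(),
        "examples": len(examples),
        "task_distribution": dict(Counter(ex["task"] for ex in examples)),
        "source_distribution": dict(Counter(m["source_type"] for m in metas)),
        "license_distribution": dict(Counter(m["license"] for m in metas)),
        "reviewer_count": len({m["reviewer_id"] for m in metas}),
        "rejected_examples": rejected_count,
        "split": {"train": splits["train"], "validation": splits["validation"]},
        "policy_version": policy_version,
    }
```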
---
## 15. Human Review Requirements
Human approval is required before examples become training data.
Reviewers should check:
- factual correctness
- source license
- safety boundaries
- absence of raw sensitive data
- correct label
- useful expected output
- no attacker-enabling content
Two-person review is recommended for:
- internal case-derived examples
- sensitive incident examples
- actor attribution examples
- routing examples involving law enforcement or critical infrastructure
- examples derived from TLP:AMBER material
TLP:RED material should not be used for LoRA training unless an explicit legal, operational, and governance policy exists.
---
## 16. Summary
IntelMiner is the bridge between Blue48 operations and future specialized defensive models.
It should collect only lawful and approved data, check license and safety constraints, build structured examples, require human review, and export versioned JSONL datasets. The first LoRA should likely be `lora-router`, followed by `lora-ioc-extractor` and `lora-evidence-handler`.

docs/archive/routeline.md
# Blue48 Reporting and API Escalation Architecture v2
**Document type:** Project record / operational architecture
**Scope:** API-eligible reporting platforms, routing order, evidence handling, CERT mapping, abuse routing, receipts, and audit controls
**Status:** Draft v2
---
## 1. Purpose
This document defines how Blue48 routes defensive cyber-intelligence to the correct recipients using structured APIs, trusted communities, CERT/CSIRT channels, abuse-reporting endpoints, and authority-sealed evidence packages.
The platform is designed for lawful white-hat operations. It should not amplify stolen data, expose victims prematurely, or interact with criminal actors.
Core principle:
> Validate the signal, protect the evidence, route only what each destination is authorized to receive, and prove every external action through an immutable ledger.
---
## 2. Recommended Reporting Order
### Normal cases
```text
Victim Security Team
→ National CERT / CSIRT
→ Sector ISAC / trusted community
→ Law enforcement cyber unit when criminal evidence exists
→ Provider / registrar / abuse APIs
→ Public sanitized report after mitigation
```
### Imminent harm or critical infrastructure
```text
National CERT / CSIRT
→ Victim Security Team
→ Law enforcement cyber unit
→ Sector ISAC / regulator
→ Provider / registrar / abuse APIs
→ Trusted CTI community
→ Public sanitized report after mitigation or clearance
```
### Malicious infrastructure
```text
Hosting provider / CDN / cloud abuse desk
→ Registrar abuse contact
→ Registry escalation
→ National CERT / CSIRT
→ Law enforcement when warranted
→ Trusted CTI community
```
### Mass exploitation
```text
Affected vendor
→ National CERT / CSIRT
→ Affected sectors / ISACs
→ MISP / trusted CTI community
→ Public advisory after coordinated mitigation
```
---
## 3. Authority-Sealed Evidence Handling
Blue48 does not treat full evidence protection as “sanitization.” Sensitive evidence should be preserved, encrypted, and routed only to authorized recipients.
Use the term:
> **Authority-Sealed Evidence Handling**
Purpose:
- preserve high-value evidence
- prevent uncontrolled internal access
- prevent accidental redistribution
- allow victims or authorities to decrypt when authorized
- destroy local plaintext and unwrapped keys after successful sealing
---
## 4. Evidence Protection Models
### Model A: Authority public-key encryption
Authorities or victims provide public encryption keys.
```text
Evidence collected
→ sensitive evidence packaged
→ package encrypted with authority/victim public key
→ encrypted package submitted
→ local plaintext destroyed
→ only recipient can decrypt
```
This is the cleanest model because Blue48 never holds the recipient private key.
### Model B: One-time evidence key wrapped for recipients
```text
Generate random evidence key
→ encrypt evidence with evidence key
→ wrap evidence key to recipient public keys
→ submit encrypted package
→ destroy local plaintext
→ destroy local unwrapped evidence key
```
Package example:
```json
{
"evidence_package_id": "uuid",
"encrypted_evidence": "ciphertext-reference",
"wrapped_keys": [
{
"recipient": "CERT-Bund",
"key_id": "authority-key-id",
"wrapped_key": "encrypted-evidence-key"
},
{
"recipient": "Victim Security Team",
"key_id": "victim-key-id",
"wrapped_key": "encrypted-evidence-key"
}
],
"metadata": {
"tlp": "AMBER",
"severity": "critical",
"created_at": "2026-05-13T00:00:00Z",
"retention_policy": "plaintext destroyed after encryption"
}
}
```
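Model B can be sketched with the `cryptography` package (pyca). This document does not fix the algorithms, so the following choices are assumptions: AES-256-GCM for the evidence, RSA-OAEP with SHA-256 for key wrapping, and a public-key fingerprint as `key_id`. The sketch returns a package shaped like the example above, plus the ciphertext, which would go to object storage. The caller remains responsible for destroying the plaintext evidence and any local copies. Note that `del` in Python is only best effort.

```python
import hashlib
import os
import uuid
from datetime import datetime, timezone

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)


def key_id(pub: rsa.RSAPublicKey) -> str:
    """Short fingerprint of a recipient key (SHA-256 over its DER encoding)."""
    der = pub.public_bytes(serialization.Encoding.DER,
                           serialization.PublicFormat.SubjectPublicKeyInfo)
    return hashlib.sha256(der).hexdigest()[:16]


def seal(evidence: bytes, recipients: dict[str, rsa.RSAPublicKey],
         tlp: str, severity: str) -> tuple[dict, bytes]:
    package_id = str(uuid.uuid4())
    evidence_key = AESGCM.generate_key(bit_length=256)  # one-time evidence key
    nonce = os.urandom(12)
    # The package ID is authenticated as associated data, binding ciphertext to package.
    ciphertext = nonce + AESGCM(evidence_key).encrypt(nonce, evidence, package_id.encode())
    wrapped = [{"recipient": name, "key_id": key_id(pub),
                "wrapped_key": pub.encrypt(evidence_key, OAEP).hex()}
               for name, pub in recipients.items()]
    # Best effort only: CPython cannot guarantee these bytes are wiped from memory.
    del evidence_key
    package = {
        "evidence_package_id": package_id,
        "encrypted_evidence": "sha256:" + hashlib.sha256(ciphertext).hexdigest(),
        "wrapped_keys": wrapped,
        "metadata": {"tlp": tlp, "severity": severity,
                     "created_at": datetime.now(timezone.utc).isoformat(),
                     "retention_policy": "plaintext destroyed after encryption"},
    }
    return package, ciphertext


def unseal(package: dict, ciphertext: bytes, recipient: str,
           private_key: rsa.RSAPrivateKey) -> bytes:
    """What an authorized recipient does with its own private key."""
    entry = next(w for w in package["wrapped_keys"] if w["recipient"] == recipient)
    key = private_key.decrypt(bytes.fromhex(entry["wrapped_key"]), OAEP)
    return AESGCM(key).decrypt(ciphertext[:12], ciphertext[12:],
                               package["evidence_package_id"].encode())
```

Blue48 holds only public keys here, so once `seal` returns, the platform can no longer decrypt the package. That is the property the flow above relies on.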
---
## 5. Destination Minimization
Authority-sealed evidence handling does not mean every platform receives full evidence. Public and semi-public APIs should receive only the minimum necessary payload.
| Destination type | Payload |
|---|---|
| CERT / law enforcement | Encrypted full evidence package when authorized. |
| Victim security team | Encrypted full or partial evidence package. |
| Trusted MISP community | TLP-filtered STIX indicators and context. |
| Provider / registrar abuse API | Minimal abuse report with infrastructure evidence. |
| URLhaus / MalwareBazaar | Malware URL/hash/sample only when legally allowed. |
| AbuseIPDB | IP, category, timestamp, short comment. |
| VirusTotal | Hash, URL, or sample only when policy allows. |
| Public report | Sanitized narrative, no raw sensitive evidence. |
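Minimization can be enforced as a per-destination allow-list applied just before submission. This is a minimal sketch. The destination keys and field names are hypothetical readings of the table above.

```python
# Hypothetical allow-lists derived from the table above; every other field is dropped.
ALLOWED_FIELDS = {
    "abuseipdb": {"ip", "categories", "timestamp", "comment"},
    "registrar_abuse": {"domain", "abuse_type", "timestamp", "evidence_hash", "requested_action"},
    "public_report": {"summary", "ttps", "cves", "sanitized_iocs", "recommendations"},
}


def minimize(payload: dict, destination: str) -> dict:
    """Project a full payload down to what one destination type may receive."""
    allowed = ALLOWED_FIELDS.get(destination)
    if allowed is None:
        # Fail closed: a destination without a policy receives nothing.
        raise ValueError(f"no payload policy for destination {destination!r}")
    return {key: value for key, value in payload.items() if key in allowed}
```

An allow-list rather than a deny-list means that any sensitive field added to the case later is excluded by default.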
---
## 6. Priority Platform Order
For European or global operations, use MISP and national CERT routing before US-specific AIS.
| Priority | Platform / route | Role |
|---:|---|---|
| 1 | **MISP / CIRCL / trusted MISP communities** | Primary CTI sharing backbone. |
| 2 | **National CERT / CSIRT** | Country-specific authority route. |
| 3 | **CERT-EU / ENISA CSIRTs Network** | EU institutions and European coordination. |
| 4 | **CISA AIS** | US-relevant machine-to-machine indicator sharing. |
| 5 | **OpenCTI** | Internal graph and knowledge base, not necessarily an external reporting destination. |
| 6 | **Provider / CDN / cloud abuse APIs** | Infrastructure mitigation and takedown. |
| 7 | **Registrar / registry abuse channels** | Domain suspension or escalation. |
| 8 | **abuse.ch, URLhaus, MalwareBazaar** | Malware URL/sample ecosystem reporting. |
| 9 | **AbuseIPDB / Spamhaus / PhishTank / urlscan.io** | Public/semi-public abuse and phishing ecosystem reporting. |
| 10 | **Public advisory channels** | Sanitized reporting after mitigation. |
---
## 7. CERT / CSIRT Routing Map
The routing engine should pick receivers based on victim country, sector, and legal jurisdiction.
| Country / region | Receiver | Channel type |
|---|---|---|
| Germany | BSI / CERT-Bund | Structured email, trusted channels, MISP community where available. |
| France | ANSSI / CERT-FR | CERT channel, structured reporting, TAXII/MISP where available. |
| United Kingdom | NCSC-UK | Structured reporting, early-warning services, official channels. |
| Netherlands | NCSC-NL | CERT channel, trusted community/MISP where available. |
| Spain | CCN-CERT / INCIBE-CERT | Public-sector/private-sector split, CERT channels, MISP where available. |
| EU institutions | CERT-EU | EU institutional route. |
| EU criminal coordination | Europol EC3 | Usually via national CERT/law-enforcement channels, not first-call technical sharing. |
| United States | CISA / FBI IC3 / FBI field office | CISA for technical reporting, IC3/FBI for crime reporting. |
Implementation note:
> Europol EC3 should not be treated as the first technical receiver. Route through the relevant national CERT or law-enforcement channel first unless a formal coordination channel exists.
---
## 8. Severity and Class Mapping
Blue48 may keep an internal class model, but outbound reports should include standard severity.
| Internal class | Meaning | External severity |
|---|---|---|
| A | Imminent harm / attack likely underway | Critical |
| B | Credible planned attack | High |
| C | Confirmed exposure | High / Medium |
| D | Campaign intelligence | Medium / High |
| E | Weak signal / watchlist | Low / Monitor |
---
## 9. Normalized Case Object
All workers should read and write the same normalized case object.
```json
{
"case_id": "B48-2026-000001",
"summary": "Short defensive summary",
"classification": {
"class": "A | B | C | D | E",
"severity": "low | medium | high | critical",
"tlp": "RED | AMBER | GREEN | CLEAR",
"incident_type": "access_sale | ransomware | credential_leak | phishing | malware | exploit | botnet | data_leak"
},
"confidence": {
"level": "low | medium | high",
"source_reliability": "A | B | C | D | E | F | unknown",
"information_credibility": "1 | 2 | 3 | 4 | 5 | 6 | unknown"
},
"victim": {
"name": "",
"domain": "",
"country": "",
"sector": "",
"critical_infrastructure": false
},
"actor": {
"name": "",
"aliases": [],
"campaign": "",
"confidence": "low | medium | high"
},
"observables": {
"domains": [],
"ips": [],
"urls": [],
"hashes": [],
"cves": [],
"wallets": [],
"emails": []
},
"evidence": {
"raw_evidence_location": "internal-only-reference",
"sealed_package_id": "",
"payload_hash": "",
"plaintext_destroyed": false,
"local_unwrapped_key_destroyed": false
},
"routing": {
"recommended_routes": [],
"blocked_routes": [],
"human_approval_required": true
}
}
```
---
## 10. TLP Enforcement
Every destination must define a maximum allowed TLP.
Routing precondition:
```text
submission.tlp <= destination.max_tlp_allowed
```
Example destination policy:
| Destination | Max TLP | Payload type |
|---|---|---|
| CERT / CSIRT trusted route | RED or AMBER depending on channel | Sealed evidence package. |
| Victim security team | RED or AMBER depending on identity verification | Sealed package or controlled extract. |
| MISP trusted community | AMBER or GREEN depending on sharing group | STIX/MISP event. |
| Public MISP community | GREEN or CLEAR | Public-safe indicators. |
| AbuseIPDB | CLEAR | Minimal IP abuse report. |
| URLhaus | CLEAR / GREEN depending on policy | Malicious URL report. |
| VirusTotal | CLEAR only unless legal approval exists | Hash/URL/sample where permitted. |
| Public advisory | CLEAR | Sanitized intelligence. |
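The precondition `submission.tlp <= destination.max_tlp_allowed` only works if TLP levels are ordered. This minimal sketch orders them from least to most restrictive and blocks any destination without a configured maximum. Where the table above allows two levels, the per-destination value chosen here is a hypothetical pick.

```python
from enum import IntEnum


class TLP(IntEnum):
    # Ordered from least to most restrictive.
    CLEAR = 0
    GREEN = 1
    AMBER = 2
    RED = 3


# Hypothetical maxima picked from the table above.
DESTINATION_MAX_TLP = {
    "cert_trusted_route": TLP.RED,
    "misp_trusted_community": TLP.AMBER,
    "public_misp_community": TLP.GREEN,
    "abuseipdb": TLP.CLEAR,
    "public_advisory": TLP.CLEAR,
}


def may_submit(submission_tlp: str, destination: str) -> bool:
    max_allowed = DESTINATION_MAX_TLP.get(destination)
    if max_allowed is None:
        return False  # unknown destinations are blocked, not allowed
    return TLP[submission_tlp] <= max_allowed
```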
---
## 11. API-Eligible Destination Categories
### 11.1 CTI sharing
| Platform | Purpose | Integration style |
|---|---|---|
| MISP | Threat intelligence sharing and communities. | REST API / PyMISP / STIX. |
| OpenCTI | Internal CTI graph and knowledge management. | GraphQL API / STIX. |
| CISA AIS | US-relevant automated indicator sharing. | TAXII/STIX-style exchange. |
### 11.2 Abuse and takedown
| Platform | Purpose | Integration style |
|---|---|---|
| Cloudflare Abuse Reports | Report abuse behind Cloudflare services. | Abuse Reports API / portal. |
| Registrar abuse channels | Domain abuse escalation. | API where available, otherwise structured email/WHOIS abuse contact. |
| Registry escalation | Escalation for TLD-level issues. | Registry-specific process. |
| URLhaus | Malware URL reporting. | API submission. |
| MalwareBazaar | Malware sample/hash ecosystem. | API submission/query. |
| AbuseIPDB | IP reputation and abuse reports. | API. |
| Spamhaus | Spam, botnet, and malicious infrastructure reporting. | Submission portal/API where available. |
| PhishTank | Phishing URL reporting. | API/community workflow. |
| urlscan.io | URL scan and malicious page evidence. | API submission. |
| Google Web Risk | Unsafe URL submission where permitted. | Restricted API. |
| VirusTotal | URL/file/hash enrichment and submission. | API, policy controlled. |
| Netcraft | Phishing and abuse reporting. | API/enterprise options and reporting channels. |
### 11.3 Internal case-management
| Platform | Purpose |
|---|---|
| TheHive | Security case management and observables. |
| DFIR-IRIS | Incident response case management. |
| ServiceNow SIR | Enterprise incident response workflow. |
| Jira Service Management | Case routing and task management. |
---
## 12. Registrar and Registry Abuse Flow
For malicious domains, phishing portals, C2 domains, impersonation infrastructure, and ransomware-related web infrastructure:
```text
Identify domain
→ identify hosting provider and CDN/proxy
→ identify registrar from WHOIS/RDAP
→ report to hosting/CDN abuse
→ report to registrar abuse
→ escalate to registry if registrar fails or emergency applies
→ notify CERT if critical or cross-border
```
Payload should include:
- domain
- abuse type
- timestamp
- evidence hash
- screenshot hash or sealed evidence reference
- safe reproduction summary
- victim impersonated, if relevant
- requested action
Avoid sending raw credentials, stolen data, or private victim details to registrar/registry channels unless legally justified.
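The "identify registrar from WHOIS/RDAP" step can use RDAP's JSON responses (RFC 9083). In these responses, the registrar appears as an entity with the `registrar` role, and its abuse contact appears as a nested entity with the `abuse` role. This is a minimal standard-library sketch. Routing lookups through the rdap.org redirector is an assumption; an implementation could instead resolve the authoritative server from the IANA bootstrap registry. Only the parser runs offline.

```python
import json
import urllib.request


def fetch_rdap(domain: str, timeout: float = 10.0) -> dict:
    # rdap.org redirects to the authoritative RDAP server; urllib follows the redirect.
    request = urllib.request.Request(
        f"https://rdap.org/domain/{domain}", headers={"Accept": "application/rdap+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def _vcard(entity: dict, prop: str) -> list:
    """Values of one property from an entity's jCard (vcardArray)."""
    vcard = entity.get("vcardArray", ["vcard", []])
    return [item[3] for item in vcard[1] if len(item) > 3 and item[0] == prop]


def registrar_abuse_contact(rdap: dict) -> dict:
    for entity in rdap.get("entities", []):
        if "registrar" not in entity.get("roles", []):
            continue
        names = _vcard(entity, "fn")
        emails = [
            email
            for child in entity.get("entities", [])
            if "abuse" in child.get("roles", [])
            for email in _vcard(child, "email")
        ]
        return {"registrar": names[0] if names else None, "abuse_emails": emails}
    return {"registrar": None, "abuse_emails": []}
```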
---
## 13. Rate Limits and Queueing
Every destination should define a quota object.
```json
{
"destination": "AbuseIPDB",
"quota": {
"limit": 1000,
"period": "day",
"priority_policy": "critical_first",
"backoff": "exponential"
}
}
```
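The quota object above can be enforced with a per-period counter that holds back part of the budget for critical cases, plus exponential backoff for transient failures. This is a minimal sketch. The 10% reserve, resetting `used` at each period boundary (not shown), and full-jitter backoff are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Quota:
    limit: int                      # submissions allowed per period
    critical_reserve: float = 0.1   # share of the budget held back for critical cases
    used: int = 0                   # reset when the period rolls over (not shown)

    def try_consume(self, severity: str) -> bool:
        """Non-critical work stops before the reserved tail; critical work may use it."""
        ceiling = self.limit if severity == "critical" else int(self.limit * (1 - self.critical_reserve))
        if self.used >= ceiling:
            return False  # caller queues the submission instead of dropping it
        self.used += 1
        return True


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), with full jitter."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```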
RateLimiter responsibilities:
- prevent dropped submissions
- queue low-priority submissions during bursts
- reserve budget for critical cases
- retry transient failures
- record rate-limit errors in Ledger
---
## 14. Receipt and Effectiveness Tracking
ReceiptCollector and Watcher should capture feedback from every destination.
Receipt schema:
```json
{
"case_id": "B48-2026-000001",
"destination": "Cloudflare Abuse API",
"submitted_at": "2026-05-13T00:00:00Z",
"acknowledged_at": "2026-05-13T00:05:00Z",
"receipt_id": "external-case-id",
"status": "submitted | acknowledged | rejected | actioned | closed",
"outcome": "pending | takedown_confirmed | duplicate | no_action | escalated"
}
```
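The `status` values above imply a lifecycle. This minimal sketch shows one possible transition table, which rejects out-of-order updates from ReceiptCollector and Watcher. The table is an assumption, not something the destinations define.

```python
from typing import Optional

# Hypothetical lifecycle; terminal states have no outgoing transitions.
TRANSITIONS = {
    "submitted": {"acknowledged", "rejected"},
    "acknowledged": {"actioned", "rejected", "closed"},
    "actioned": {"closed"},
    "rejected": set(),
    "closed": set(),
}


def advance(receipt: dict, new_status: str, at: Optional[str] = None) -> dict:
    """Return an updated copy of the receipt, refusing illegal transitions."""
    current = receipt["status"]
    if new_status not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new_status}")
    updated = {**receipt, "status": new_status}
    if new_status == "acknowledged" and at is not None:
        updated["acknowledged_at"] = at
    return updated
```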
Success metrics:
| Destination type | Success metric |
|---|---|
| CERT / CSIRT | Acknowledgement, case opened, mitigation guidance issued. |
| Provider / registrar | Infrastructure suspended, blocked, or investigated. |
| MISP | Event accepted, sightings, correlations. |
| URLhaus / MalwareBazaar | URL/sample accepted, classified, distributed. |
| Public report | Defenders consume advisory, no sensitive data leak. |
---
## 15. Immutable Audit Log
Every external submission or destructive action must write an immutable record.
Audit row:
```text
(timestamp, case_id, destination, payload_hash, submitter_identity, tlp, response_id, outcome)
```
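The audit row can be made tamper-evident by chaining each row to the hash of the previous one, so that editing or removing an earlier row breaks verification. This is a minimal sketch over an in-memory list. In practice, two assumptions apply: the rows would live in the append-only ledger table, and the latest hash would be anchored somewhere the database operator cannot rewrite.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_row(ledger: list[dict], **fields: str) -> dict:
    """Append an audit row linked to the previous row's hash."""
    prev_hash = ledger[-1]["row_hash"] if ledger else GENESIS
    row = {"timestamp": datetime.now(timezone.utc).isoformat(), **fields, "prev_hash": prev_hash}
    row["row_hash"] = _digest(row)
    ledger.append(row)
    return row


def verify(ledger: list[dict]) -> bool:
    """Recompute every link; False if any row was edited, reordered, or removed mid-chain."""
    prev_hash = GENESIS
    for row in ledger:
        body = {k: v for k, v in row.items() if k != "row_hash"}
        if body.get("prev_hash") != prev_hash or _digest(body) != row["row_hash"]:
            return False
        prev_hash = row["row_hash"]
    return True
```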
Also audit:
- evidence sealing
- recipient key addition/removal
- plaintext destruction
- local key destruction
- route approval
- route blocking
- public publication
- dataset export
- policy modification
---
## 16. Public Reporting Rules
Public reports may include:
- sector trend
- country/region if safe
- actor or campaign if already public or properly attributed
- TTPs
- CVEs
- defensive recommendations
- sanitized IOCs
- non-sensitive timeline
Public reports must not include:
- raw credentials
- stolen data
- direct links to stolen data
- live access details
- internal screenshots
- private victim communications
- exact criminal-source links
- exploit instructions
- anything that increases victim harm before mitigation
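Sanitized IOCs are commonly published "defanged" so that they cannot be clicked or auto-resolved. This minimal sketch follows one common convention (`hxxp`, `[.]`, `[@]`), though conventions vary between publishers. The pre-publication check only catches live URLs, not the other items in the list above.

```python
import re

LIVE_URL = re.compile(r"\bhttps?://", re.IGNORECASE)


def defang(ioc: str) -> str:
    """Render an indicator non-clickable: hxxp scheme, [.] dots, [@] in e-mail addresses."""
    ioc = re.sub(r"^http", "hxxp", ioc, flags=re.IGNORECASE)
    return ioc.replace(".", "[.]").replace("@", "[@]")


def check_publishable(text: str) -> None:
    """Refuse a public report that still contains a clickable URL."""
    if LIVE_URL.search(text):
        raise ValueError("public report contains a live URL; defang it first")
```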
---
## 17. Initial Adapter Build Order
Recommended Block G adapter priority:
1. MISP via PyMISP
2. AbuseIPDB
3. URLhaus
4. Cloudflare Abuse Reports
5. urlscan.io
6. MalwareBazaar
7. Registrar abuse structured email/RDAP helper
8. VirusTotal enrichment/submission with strict policy guard
9. OpenCTI internal graph integration
10. TheHive or DFIR-IRIS case export
These cover the most common evidence and routing cases while keeping legal risk manageable.
---
## 18. Secrets Handling
Every adapter needs credentials.
Rules:
- credentials live in `.env` or secret manager
- credentials are injected at runtime
- credentials are never baked into container images
- credentials are rotatable
- credentials are scoped per adapter
- every API call writes to the Ledger
- failed authentication events are logged and alerted
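Runtime injection with per-adapter scoping can be as simple as having each adapter read only its own prefixed environment variables. The process supervisor populates those variables from `.env` or a secret manager. This is a minimal sketch. The `B48_<ADAPTER>_<NAME>` naming scheme is hypothetical, and the error reports the variable name, never the value.

```python
import os


class MissingSecret(RuntimeError):
    """Raised when an adapter's credential was not injected at runtime."""


def adapter_secret(adapter: str, name: str) -> str:
    # Each adapter reads only variables under its own prefix (hypothetical naming scheme).
    var = f"B48_{adapter.upper()}_{name.upper()}"
    value = os.environ.get(var)
    if not value:
        # Name the missing variable, never echo any secret value.
        raise MissingSecret(f"{var} is not set")
    return value
```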
---
## 19. Summary
The v2 architecture changes the platform from a list of reporting sites into an operational routing system.
The most important revisions are:
- MISP and national CERTs are prioritized over CISA AIS for European/global work.
- CERT routing is country-specific.
- Registrar and registry abuse flows are included.
- Sensitive evidence is protected through authority-sealed encryption, not casual sanitization.
- Public and semi-public APIs receive minimized payloads only.
- TLP enforcement, rate limits, receipts, and immutable audit logs are mandatory.

docs/archive/waypoints.md
# API-Eligible Cyber Threat Reporting & Escalation Platforms
**Project purpose:** Build a white-hat defensive reporting workflow that can push credible pre-incident or incident intelligence to the right receivers through APIs or structured machine-to-machine channels.
**Scope:** This document focuses on platforms that support API-based reporting, submission, alert ingestion, or structured intelligence sharing. It excludes direct interaction with criminal forums and excludes sources that only provide manual web forms unless they are still operationally important as a fallback.
**Last reviewed:** 2026-05-13
---
## 1. Recommended Reporting Order
### 1.1 Normal credible threat against a named organization
1. **Victim security contact / VDP / security.txt**
2. **National CERT / CSIRT**
3. **Sector ISAC / ISAO**
4. **Law enforcement cyber unit**
5. **Infrastructure provider abuse API**
6. **Threat-intelligence sharing platform**
7. **Sanitized public advisory**
### 1.2 Imminent harm or critical infrastructure
1. **National CERT / CSIRT**
2. **Victim security team**
3. **Law enforcement cyber unit**
4. **Sector regulator / ISAC**
5. **Infrastructure provider abuse API**
6. **Trusted CTI community**
7. **Public advisory only after mitigation or authority clearance**
### 1.3 Malicious infrastructure, phishing, malware, or botnet indicators
1. **Platform-specific reporting API**
Examples: Cloudflare Abuse Reports API, Spamhaus Submission API, AbuseIPDB, URLhaus, MalwareBazaar, PhishTank, urlscan.io, Google Web Risk, VirusTotal.
2. **CERT / CSIRT**
3. **Affected victim**
4. **MISP / OpenCTI / trusted CTI sharing**
5. **Public sanitized report**
---
## 2. Tier-1 API Reporting Platforms
These are the strongest fits for automated defensive reporting because they accept machine-readable submissions or support structured threat sharing.
| Priority | Platform | Best for | API / submission capability | Use in workflow |
|---:|---|---|---|---|
| 1 | **CISA Automated Indicator Sharing (AIS)** | Sharing cyber threat indicators with U.S. government and AIS participants | STIX/TAXII bidirectional indicator sharing | High-confidence indicators, especially campaigns, exploited infrastructure, malware IOCs |
| 2 | **MISP** | Community and private threat-intelligence sharing | REST API, PyMISP, event and attribute creation | Share vetted IOCs, TTPs, victim-agnostic campaign intelligence |
| 3 | **OpenCTI** | Internal or consortium CTI knowledge base | GraphQL API and connectors | Normalize, enrich, and route intelligence before external disclosure |
| 4 | **Cloudflare Abuse Reports API** | Abuse hosted behind or involving Cloudflare | API supports submitting abuse reports, viewing report details, and listing reports | Phishing, malware, abusive hosting, malicious domains using Cloudflare services |
| 5 | **Spamhaus Submission Portal API** | Malicious IPs, domains, URLs, suspicious email content | REST API for suspicious IP/domain/URL/email reports | Reputation/blocklist contribution and takedown-support evidence |
| 6 | **AbuseIPDB** | Malicious IP reputation | API for reporting and checking abusive IP addresses | Scanner, brute-force, spam, probing, attack-source IP reporting |
| 7 | **URLhaus / abuse.ch** | Malware distribution URLs | Community API for downloading and submitting malware URLs | Active malware URL reporting and malware-distribution tracking |
| 8 | **MalwareBazaar / abuse.ch** | Malware sample exchange | Community API for sample upload/download and bulk queries | Malware sample submission and hash enrichment |
| 9 | **PhishTank** | Phishing URL verification | API for phishing URL status checks; community submission workflow | Phishing verification and enrichment |
| 10 | **urlscan.io** | URL detonation, phishing/malware page evidence | Submission API to scan URLs and retrieve results | Safe screenshot/evidence generation, IOC enrichment |
| 11 | **Google Web Risk Submission API** | Unsafe URL submission to Google Safe Browsing ecosystem | Submission API for suspected unsafe URLs; access requires sales/customer-engineer approval | High-scale malicious URL reporting |
| 12 | **VirusTotal API** | File, URL, domain, IP enrichment and submission | API for file upload, URL scan, reports, and comments | Enrichment and submission to multi-vendor analysis ecosystem |
| 13 | **Netcraft Report API** | Phishing, malware, suspicious URLs, emails, files | API for automated threat reporting | Brand abuse, phishing, takedown-support reporting |
---
## 3. Platform Notes
### 3.1 CISA Automated Indicator Sharing (AIS)
**Type:** Government-backed indicator sharing
**Best for:** High-confidence cyber threat indicators and defensive measures
**API style:** STIX/TAXII
**Good submissions:** IPs, domains, URLs, hashes, malware indicators, campaign indicators
**Avoid:** Victim-identifying details unless necessary and authorized
**Operational fit:** Use for campaign-level and infrastructure-level reporting, especially when the intelligence may protect multiple organizations.
Source: https://www.cisa.gov/how-automated-indicator-sharing-ais-works
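AIS exchanges STIX content over TAXII. This is a minimal standard-library sketch of a STIX 2.1 indicator for a domain, wrapped in a bundle. The TAXII transport, TLP marking definitions, and any AIS-specific profile requirements are out of scope here.

```python
import uuid
from datetime import datetime, timezone


def _stix_now() -> str:
    # STIX 2.1 timestamps are UTC with a literal "Z"; millisecond precision here.
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def stix_domain_indicator(domain: str, name: str) -> dict:
    now = _stix_now()
    # Escape backslashes and single quotes inside the STIX pattern string literal.
    value = domain.replace("\\", "\\\\").replace("'", "\\'")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "indicator_types": ["malicious-activity"],
        "pattern": f"[domain-name:value = '{value}']",
        "pattern_type": "stix",
        "valid_from": now,
    }


def stix_bundle(objects: list[dict]) -> dict:
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}
```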
---
### 3.2 MISP
**Type:** Open-source threat-intelligence sharing platform
**Best for:** Structured CTI sharing inside trusted communities
**API style:** REST API; PyMISP client
**Good submissions:** Events, attributes, galaxies, taxonomies, TLP-tagged indicators, sightings
**Avoid:** Raw stolen data, credentials, or victim-sensitive artifacts without permission
**Operational fit:** Use as the main trusted-community sharing layer.
Sources:
- https://www.misp-project.org/openapi/
- https://www.circl.lu/doc/misp/automation/
---
### 3.3 OpenCTI
**Type:** Threat-intelligence platform / knowledge graph
**Best for:** Internal CTI normalization, enrichment, and case-to-intel routing
**API style:** GraphQL API
**Good submissions:** STIX-like entities, observables, reports, relationships, indicators, malware, threat actors
**Avoid:** Treating OpenCTI itself as the final external reporting destination unless connected to a sharing community
**Operational fit:** Use as your central intelligence brain before pushing to MISP, CERTs, providers, or reports.
Sources:
- https://docs.opencti.io/latest/
- https://docs.opencti.io/latest/reference/api/
---
### 3.4 Cloudflare Abuse Reports API
**Type:** Infrastructure provider abuse reporting
**Best for:** Phishing, malware, abuse involving Cloudflare-protected assets
**API style:** REST API
**Good submissions:** URLs, domains, abuse category, evidence, contact details
**Avoid:** Large stolen datasets; provide proof and context instead
**Operational fit:** Use whenever malicious infrastructure resolves through or is protected by Cloudflare.
Sources:
- https://developers.cloudflare.com/api/resources/abuse_reports/
- https://developers.cloudflare.com/fundamentals/reference/report-abuse/submit-report/
---
### 3.5 Spamhaus Submission Portal API
**Type:** Reputation and abuse-intelligence reporting
**Best for:** Malicious IPs, domains, URLs, suspicious email content
**API style:** REST API
**Good submissions:** IPs, domains, URLs, suspicious raw email/source evidence
**Avoid:** Unverified mass submissions; maintain high-confidence standards
**Operational fit:** Use for reliable contribution to reputation systems and anti-abuse communities.
Sources:
- https://submit.spamhaus.org/api/
- https://www.spamhaus.org/resource-hub/threat-intelligence/how-to-report-suspicious-activity-to-spamhaus/
---
### 3.6 AbuseIPDB
**Type:** IP reputation and abuse reporting
**Best for:** Attack-source IP reporting
**API style:** REST API
**Good submissions:** Brute force, scanning, spam, exploitation attempts, abusive traffic categories
**Avoid:** Reporting shared NAT/VPN/cloud IPs without strong evidence
**Operational fit:** Use as an automated destination for source-IP abuse reports, especially from honeypots, firewalls, and SIEM detections.
Sources:
- https://docs.abuseipdb.com/
- https://www.abuseipdb.com/api.html
---
### 3.7 URLhaus / abuse.ch
**Type:** Malware URL exchange
**Best for:** Active malware distribution URLs
**API style:** Community API with Auth-Key
**Good submissions:** URLs directly serving malware payloads
**Avoid:** Generic phishing pages that do not distribute malware
**Operational fit:** Use when you can verify a URL is actively distributing malware.
Source: https://urlhaus.abuse.ch/api/
---
### 3.8 MalwareBazaar / abuse.ch
**Type:** Malware sample exchange
**Best for:** Malware samples, hashes, family tracking
**API style:** Community API with Auth-Key
**Good submissions:** Malware samples and related metadata
**Avoid:** Benign files, sensitive internal documents, or samples that cannot be legally shared
**Operational fit:** Use after malware handling review, with strict legal and operational controls.
Source: https://bazaar.abuse.ch/api/
---
### 3.9 PhishTank
**Type:** Community phishing clearing house
**Best for:** Phishing URL verification and community validation
**API style:** HTTP POST lookup API
**Good submissions:** Suspected phishing URLs
**Avoid:** URLs containing victim credentials, tokens, or private data in query strings
**Operational fit:** Use for phishing intelligence enrichment and community verification.
Sources:
- https://phishtank.net/api_info.php
- https://www.phishtank.com/
---
### 3.10 urlscan.io
**Type:** URL scanning and investigation platform
**Best for:** URL detonation, phishing evidence, page screenshots, redirects, IP/domain enrichment
**API style:** Submission API and search API
**Good submissions:** Suspicious URLs, phishing pages, malicious landing pages
**Avoid:** Private internal URLs or sensitive tokens; set scan visibility carefully
**Operational fit:** Use before provider reporting to create structured, shareable evidence.
Sources:
- https://urlscan.io/docs/api/
- https://docs.urlscan.io/pages/api-intro
---
### 3.11 Google Web Risk Submission API
**Type:** Unsafe URL submission into Google's protection ecosystem
**Best for:** High-scale phishing/malware URL submissions
**API style:** Submission API; restricted access
**Good submissions:** Suspected unsafe URLs that should be evaluated for Safe Browsing protection
**Avoid:** Assuming access is automatic; Google says access requires contacting sales or a customer engineer
**Operational fit:** Use when your group has enough volume and quality control to justify access.
Source: https://docs.cloud.google.com/web-risk/docs/submission-api
---
### 3.12 VirusTotal API
**Type:** Multi-vendor malware and URL analysis ecosystem
**Best for:** URL/file submission, enrichment, analysis reports, community comments
**API style:** REST API
**Good submissions:** Suspicious files, URLs, domains, IPs, hashes
**Avoid:** Uploading confidential files, customer data, private source code, or stolen materials
**Operational fit:** Use for enrichment and multi-vendor visibility. Use private scanning options if available and appropriate.
Sources:
- https://docs.virustotal.com/docs/api-overview
- https://docs.virustotal.com/reference/overview
- https://docs.virustotal.com/reference/files-scan
---
### 3.13 Netcraft Report API
**Type:** Phishing, malware, suspicious URL/email/file reporting
**Best for:** Phishing and takedown-support reporting
**API style:** Report API
**Good submissions:** Malicious URLs, suspicious emails, files, phishing evidence
**Avoid:** Low-confidence or privacy-sensitive submissions
**Operational fit:** Use for high-confidence phishing and brand-abuse reporting, especially where takedown support matters.
Sources:
- https://report.netcraft.com/api
- https://www.netcraft.com/company/news/netcraft-launches-new-real-time-threat-api-to-improve-and-accelerate-collaboration-with-infrastructure-providers
---
## 4. Internal Case / Incident Routing Platforms
These platforms are not external public reporting destinations, but they are useful for receiving your detections through APIs and routing them into a proper case workflow.
| Platform | Best for | API capability | Workflow role |
|---|---|---|---|
| **TheHive** | SOC alert-to-case management | TheHive 5 API supports alert creation | Convert signals into triaged investigations |
| **DFIR-IRIS** | Collaborative incident response | Alerts API and general API key support | Internal IR case management |
| **ServiceNow SIR** | Enterprise security incident response | REST API to write to Security Incident Import table | Enterprise escalation and tracking |
| **Jira Service Management Incidents** | Incident workflow automation | Incidents REST API | Lightweight or engineering-driven incident coordination |
Sources:
- https://docs.strangebee.com/thehive/api-docs
- https://docs.dfir-iris.org/operations/alerts/
- https://www.servicenow.com/docs/r/yokohama/security-management/security-incident-response/c_3rdPartyAlertMonToolInteg.html
- https://developer.atlassian.com/cloud/incidents/rest/api-group-incident/
---
## 5. Platforms Useful for Monitoring but Not Primary API Reporting Destinations
| Platform | API status | Recommendation |
|---|---|---|
| **Ransomware.live** | API available for ransomware victim/group intelligence | Use for monitoring and enrichment, not as the main reporting destination |
| **Shadowserver** | RESTful API for report data access; no STIX/TAXII currently | Use for inbound network exposure/threat reports and enrichment |
| **Have I Been Pwned** | API for breach account lookups | Use for exposure checks, not submitting new breach reports unless separately arranged |
| **OpenPhish** | No public lookup API; offers feed/module model | Use feed or email/manual reporting fallback |
| **Microsoft Defender submissions** | Portal and Microsoft Graph threat-submission resources for some Defender scenarios | Use when operating within a Microsoft tenant or Defender workflow |
Sources:
- https://www.ransomware.live/api
- https://api-pro.ransomware.live/
- https://www.shadowserver.org/faq/can-i-access-your-reports-through-an-api/
- https://haveibeenpwned.com/api/v3
- https://openphish.com/kb.html
- https://learn.microsoft.com/en-us/graph/api/resources/security-filethreatsubmission
---
## 6. Practical Routing Matrix
| Evidence type | First API destination | Second destination | Internal system |
|---|---|---|---|
| Malicious IP scanning/brute force | AbuseIPDB | Spamhaus if relevant | TheHive / DFIR-IRIS |
| Malware distribution URL | URLhaus | Google Web Risk / VirusTotal / urlscan.io | MISP / OpenCTI |
| Malware sample | MalwareBazaar | VirusTotal | TheHive / OpenCTI |
| Phishing URL | PhishTank / Netcraft / urlscan.io | Google Web Risk / Cloudflare if hosted/proxied there | MISP / TheHive |
| Cloudflare-proxied abuse | Cloudflare Abuse Reports API | Netcraft / PhishTank if phishing | Internal case platform |
| Suspicious email infrastructure | Spamhaus Submission API | AbuseIPDB for IPs | MISP / OpenCTI |
| Campaign-level indicators | CISA AIS / MISP | CERT/CSIRT | OpenCTI |
| Ransomware victim claim | Victim + CERT first | MISP only sanitized indicators | OpenCTI / TheHive |
| Leaked credentials/API keys | Victim first | CERT if severe | Internal IR case only |
| Critical infrastructure threat | CERT/CSIRT first | Victim + law enforcement | Internal restricted case |
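The matrix can be stored as data, so the routing engine looks up a row instead of hard-coding branches. Below is a minimal Python sketch. The snake_case evidence-type keys and destination identifiers are hypothetical and do not come from any real API. Human-only recipients (victim, CERT, law enforcement) are marked manual, so the engine never submits to them automatically. The sketch also treats conditional rows ("Spamhaus if relevant", "Cloudflare if hosted/proxied there") as unconditional and leaves that judgment to the reviewer.

```python
from dataclasses import dataclass

# Recipients that must be contacted by a human, never by an automated adapter.
MANUAL_DESTINATIONS = frozenset({"victim", "cert", "law_enforcement"})


@dataclass(frozen=True)
class RouteRule:
    primary: tuple[str, ...]
    secondary: tuple[str, ...]
    internal: tuple[str, ...]


# Hypothetical keys mirroring the rows of the Practical Routing Matrix.
ROUTING_MATRIX = {
    "malicious_ip": RouteRule(("abuseipdb",), ("spamhaus",), ("thehive", "dfir_iris")),
    "malware_url": RouteRule(("urlhaus",), ("google_web_risk", "virustotal", "urlscan"), ("misp", "opencti")),
    "malware_sample": RouteRule(("malwarebazaar",), ("virustotal",), ("thehive", "opencti")),
    "phishing_url": RouteRule(("phishtank", "netcraft", "urlscan"), ("google_web_risk", "cloudflare_abuse"), ("misp", "thehive")),
    "cloudflare_proxied_abuse": RouteRule(("cloudflare_abuse",), ("netcraft", "phishtank"), ("internal_case",)),
    "suspicious_email_infrastructure": RouteRule(("spamhaus",), ("abuseipdb",), ("misp", "opencti")),
    "campaign_indicators": RouteRule(("cisa_ais", "misp"), ("cert",), ("opencti",)),
    "ransomware_victim_claim": RouteRule(("victim", "cert"), ("misp",), ("opencti", "thehive")),
    "leaked_credentials": RouteRule(("victim",), ("cert",), ("internal_case",)),
    "critical_infrastructure": RouteRule(("cert",), ("victim", "law_enforcement"), ("internal_case",)),
}

# Unknown evidence types fall back to a restricted internal case only (fail closed).
FALLBACK = RouteRule((), (), ("internal_case",))


def plan_route(evidence_type: str) -> dict[str, list[str]]:
    """Split a matrix row into automatable targets and human-handled recipients."""
    rule = ROUTING_MATRIX.get(evidence_type, FALLBACK)
    external = rule.primary + rule.secondary
    return {
        "automated": [d for d in external if d not in MANUAL_DESTINATIONS],
        "manual": [d for d in external if d in MANUAL_DESTINATIONS],
        "internal": list(rule.internal),
    }
```

For example, `plan_route("leaked_credentials")` returns no automated targets at all, which matches the "Victim first" row.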
---
## 7. Minimum Viable API Stack
For a new white-hat group, start with:
1. **MISP** — trusted sharing and structured IOC exchange.
2. **OpenCTI** — central intelligence normalization and knowledge graph.
3. **TheHive or DFIR-IRIS** — case management and triage.
4. **AbuseIPDB** — automated IP abuse reporting.
5. **URLhaus** — malware URL submission.
6. **MalwareBazaar** — malware sample submission, only with legal controls.
7. **urlscan.io** — URL evidence generation.
8. **Cloudflare Abuse Reports API** — infrastructure abuse reports.
9. **Spamhaus Submission Portal API** — IP/domain/URL/email reputation reporting.
10. **CISA AIS or national CERT sharing route** — campaign-level indicator sharing.
---
## 8. Data Handling Rules
### Never submit publicly
- Raw credentials
- API keys or session cookies
- Stolen databases
- Internal screenshots that identify victims without consent
- Exploit instructions
- Live access details
- Private source code
- Sensitive personal data
### Safe to submit when verified
- IP addresses
- Domains
- URLs, if they do not contain tokens or PII
- File hashes
- Malware samples, only where legally allowed
- Timestamps
- Actor handles
- Campaign labels
- CVEs
- MITRE ATT&CK techniques
- Sanitized screenshots
- Provider-neutral technical context
---
## 9. Recommended Submission Object
Use this normalized object internally before transforming to each API schema.
```json
{
"case_id": "WG-2026-000001",
"tlp": "AMBER",
"severity": "A|B|C|D|E",
"confidence": "low|medium|high",
"threat_type": "phishing|malware|ransomware|credential_exposure|iab|botnet|vulnerability_exploitation",
"victim": {
"organization": "",
"domain": "",
"country": "",
"sector": ""
},
"source": {
"category": "forum|leak_site|telegram|honeypot|sensor|osint|tip",
"first_seen": "",
"last_seen": "",
"collection_method": "lawful_osint_or_partner_feed"
},
"observables": {
"ips": [],
"domains": [],
"urls": [],
"hashes": [],
"emails": [],
"wallets": [],
"cves": []
},
"evidence": {
"summary": "",
"sanitized_screenshots": [],
"raw_evidence_location": "internal_restricted_storage"
},
"recommended_actions": [],
"routing": {
"primary_destinations": [],
"secondary_destinations": [],
"public_disclosure_allowed": false
}
}
```
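A cheap structural check catches typos in the enumerated fields before an object like this reaches any adapter. The Python sketch below assumes two things: the allowed values are exactly the pipe-separated options in the template above, and TLP uses the TLP 2.0 vocabulary. The template does not state that `public_disclosure_allowed` requires TLP:CLEAR. That rule is an illustrative assumption.

```python
ALLOWED = {
    "tlp": {"CLEAR", "GREEN", "AMBER", "AMBER+STRICT", "RED"},
    "severity": {"A", "B", "C", "D", "E"},
    "confidence": {"low", "medium", "high"},
    "threat_type": {"phishing", "malware", "ransomware", "credential_exposure",
                    "iab", "botnet", "vulnerability_exploitation"},
}
SOURCE_CATEGORIES = {"forum", "leak_site", "telegram", "honeypot", "sensor", "osint", "tip"}


def validate_submission(obj: dict) -> list[str]:
    """Return human-readable problems; an empty list means the object passes."""
    errors = []
    if not obj.get("case_id"):
        errors.append("case_id is missing")
    for field, allowed in ALLOWED.items():
        if obj.get(field) not in allowed:
            errors.append(f"{field}={obj.get(field)!r} not in {sorted(allowed)}")
    category = obj.get("source", {}).get("category")
    if category not in SOURCE_CATEGORIES:
        errors.append(f"source.category={category!r} is not recognized")
    # Assumed policy: nothing marked for public disclosure unless it is TLP:CLEAR.
    if obj.get("routing", {}).get("public_disclosure_allowed") and obj.get("tlp") != "CLEAR":
        errors.append("public_disclosure_allowed requires tlp=CLEAR")
    return errors
```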
---
## 10. Final Recommendation
The most practical API-driven architecture is:
**Sensors / CTI sources → OpenCTI → TheHive or DFIR-IRIS → routing engine → MISP + provider abuse APIs + CERT/AIS channels → sanitized public reporting**
This keeps the group legally safer, avoids amplifying criminal material, and creates a repeatable path from early warning to real defensive action.


@@ -0,0 +1,221 @@
# Review — API-Eligible Cyber Threat Reporting & Escalation Platforms (Draft v1)
**Reviewer:** Claude (Opus 4.7, 1M context)
**Review date:** 2026-05-13
**Document reviewed:** `waypoints.md` (first draft)
**Verdict:** Strong bones. Tone-perfect for white-hat defensive work — machine-to-machine, no vigilante framing. Publishable as an internal whitepaper after the critical fixes below.
---
## 1. What's Already Solid
Don't change these — they're load-bearing and correct.
- **Section 1.1 vs 1.2 split** (normal vs imminent harm) — exactly the right hinge for routing decisions.
- **Section 8 (never-submit list)** — covers GDPR / exploitation amplification / credential leakage failure modes well.
- **Section 9 normalized object** — the right abstraction. Transform-to-target instead of N bespoke pipelines.
- **Section 10 architecture sentence** — the whole project on one line: *Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public.*
---
## 2. Critical Fixes (do these before this leaves draft)
### 2.1 Geography mismatch — CISA AIS at #1 is US-only
For European-focused work, **MISP via CIRCL.lu** (Luxembourg) or the **ENISA CSIRTs Network** is the workhorse. CISA AIS does not cover EU institutions.
**Action:** Swap priorities #1 and #2 (MISP first, AIS second). Add a row for **CERT-EU** specifically for European institutions.
### 2.2 National CERTs are referenced generically but never named
The doc says "National CERT/CSIRT" everywhere but never resolves it to an actionable receiver.
**Action:** Add a small table after Section 1:
| Country | Receiver | Channel |
|---------|--------------------------------|----------------------------------------|
| DE | BSI / CERT-Bund | reports@cert-bund.de, MISP community |
| FR | ANSSI / CERT-FR | TAXII feed |
| UK | NCSC-UK | structured email + early-warning service |
| NL | NCSC-NL | MISP |
| ES | CCN-CERT, INCIBE-CERT | MISP |
| EU | CERT-EU, Europol EC3 | TLP-tagged MISP |
The routing engine should pick the right one based on victim country.
> **Note on Europol EC3:** they handle *criminal cases*, not first-call technical sharing. Route through your national CERT first; EC3 receives via national channels for cross-border coordination.
### 2.3 Domain registrar abuse is missing from Section 1.3
Cloudflare is covered, but registrars (Namecheap, Tucows, GoDaddy, EURid for `.eu`, DENIC for `.de`) are often the faster takedown path.
**Action:** Add to the malicious-infrastructure flow:
*registrar abuse contact from WHOIS → registrar abuse API/email → registry as escalation.*
### 2.4 Severity scale `A|B|C|D|E` is unusual and undefined
Either define it inline or replace with the standard `low|medium|high|critical` (CVSS-style) or NIS2 severity categories for EU consistency. Receivers will normalize anyway — but defining it lets the routing engine make automatic decisions.
### 2.5 Normalized object missing an `actor` block
You have `victim` but no `actor`. Add:
```json
"actor": {
"name": "Adira",
"aliases": [],
"campaign": "",
"confidence": "A1|A2|B1|B2|C2|C3|D|E|F"
}
```
This field connects the doc to the project mission and lets the routing matrix differentiate actor-specific sightings from generic abuse reports.
(`A1` through `F` is the Admiralty Code, the de-facto CTI standard. If that's too much, fall back to `low|medium|high`.)
### 2.6 PII at submission time is a GDPR landmine
Section 9 has `observables.emails: []`. Submitting victim email addresses to AbuseIPDB or VirusTotal is a personal-data transfer under GDPR.
**Action:** Add a pre-submission sanitizer step that:
- Hashes / redacts emails to `local-part-hash@domain` when destination is public
- Strips PII from URLs (tokens, query params containing identifiers)
- Keeps raw originals only in `evidence.raw_evidence_location` (internal-only storage)
This belongs in the doc *before* the normalized-object section, not as an afterthought.
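Here is a minimal Python sketch of that sanitizer. It assumes a per-deployment secret key and a denylist of query-parameter names, both hypothetical. It uses a keyed HMAC rather than a plain hash, because an unsalted hash of an email local part can be reversed by dictionary lookup. The code does not settle whether a keyed hash counts as adequate pseudonymization under GDPR. That remains a legal question.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical denylist of query-parameter names treated as identifiers or secrets.
SENSITIVE_PARAMS = frozenset({"token", "access_token", "session", "sid", "email",
                              "user", "uid", "key", "auth", "code"})


def pseudonymize_email(email: str, key: bytes) -> str:
    """Replace the local part with a keyed hash; keep the domain for infrastructure context."""
    local, _, domain = email.rpartition("@")
    digest = hmac.new(key, local.lower().encode(), hashlib.sha256).hexdigest()[:16]
    return f"{digest}@{domain.lower()}"


def strip_url_pii(url: str) -> str:
    """Drop userinfo, the fragment, and denylisted query parameters (re-encodes the rest)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: restore the brackets that .hostname strips
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SENSITIVE_PARAMS]
    return urlunsplit((parts.scheme, host, parts.path, urlencode(kept), ""))


def sanitize_observables(observables: dict, key: bytes, public_destination: bool) -> dict:
    """Return a copy; only public destinations get rewritten. Raw originals stay internal."""
    cleaned = dict(observables)
    if public_destination:
        cleaned["emails"] = [pseudonymize_email(e, key) for e in observables.get("emails", [])]
        cleaned["urls"] = [strip_url_pii(u) for u in observables.get("urls", [])]
    return cleaned
```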
---
## 3. High-Value Additions
### 3.1 TLP enforcement at the routing layer
Nothing in the current schema *prevents* TLP:RED data being routed to a TLP:CLEAR destination.
**Action:** Add a routing precondition: `submission.tlp <= destination.max_tlp_allowed`.
- CISA AIS rejects TLP:RED
- Cloudflare doesn't care
- Spamhaus has its own rules
- MISP communities each have their own ceiling
Encode the ceiling per destination in the routing matrix.
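The precondition is easy to enforce once TLP labels have a total order. The Python sketch below assumes TLP 2.0 labels (CLEAR, GREEN, AMBER, AMBER+STRICT, RED) and hypothetical destination identifiers. The ceiling values are placeholders for illustration, not any service's published policy.

```python
from enum import IntEnum


class TLP(IntEnum):
    """TLP 2.0 labels ordered from least to most restrictive."""
    CLEAR = 0
    GREEN = 1
    AMBER = 2
    AMBER_STRICT = 3
    RED = 4

    @classmethod
    def parse(cls, label: str) -> "TLP":
        # Accepts "TLP:AMBER+STRICT", "amber+strict", and the TLP 1.0 alias "WHITE".
        name = label.upper().removeprefix("TLP:").replace("+", "_")
        return cls.CLEAR if name == "WHITE" else cls[name]


# Placeholder ceilings; the real values belong in the routing matrix.
MAX_TLP_ALLOWED = {
    "cloudflare_abuse": TLP.CLEAR,
    "abuseipdb": TLP.CLEAR,
    "misp_trusted_community": TLP.AMBER,
    "internal_case": TLP.RED,
}


def tlp_permits(submission_tlp: str, destination: str) -> bool:
    """Enforce submission.tlp <= destination.max_tlp_allowed; unknown destinations fail closed."""
    ceiling = MAX_TLP_ALLOWED.get(destination)
    if ceiling is None:
        return False
    return TLP.parse(submission_tlp) <= ceiling
```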
### 3.2 STIX 2.1 as the serialization
Right now the doc implies *internal object → bespoke transform per API*. Cheaper and more standard:
**internal object → STIX 2.1 bundle → minor adapter per destination**
MISP, OpenCTI, CISA AIS, and most CTI tools are STIX-native. One serializer beats thirteen, and you get free interop with anything that already speaks STIX.
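The sketch below shows the serializer step using only the standard library. Production code would more likely use the OASIS `stix2` Python library. It assumes the `observables` block of the normalized object and covers only IPs, domains, and URLs. It omits TLP marking definitions, relationships, and the AIS Profile constraints that a CISA submission would need.

```python
import ipaddress
import uuid
from datetime import datetime, timezone
from typing import Optional


def _stix_timestamp(dt: datetime) -> str:
    # STIX 2.1 timestamps are UTC with a trailing "Z".
    return dt.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def _quote(value: str) -> str:
    # STIX patterning string literals escape backslash and single quote.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def _pattern(kind: str, value: str) -> str:
    if kind == "ips":
        obj = "ipv6-addr" if ipaddress.ip_address(value).version == 6 else "ipv4-addr"
    else:
        obj = {"domains": "domain-name", "urls": "url"}[kind]
    return f"[{obj}:value = {_quote(value)}]"


def to_stix_bundle(observables: dict, now: Optional[datetime] = None) -> dict:
    """Turn the ips/domains/urls lists of the normalized object into STIX 2.1 indicators."""
    ts = _stix_timestamp(now or datetime.now(timezone.utc))
    objects = []
    for kind in ("ips", "domains", "urls"):
        for value in observables.get(kind, []):
            objects.append({
                "type": "indicator",
                "spec_version": "2.1",
                "id": f"indicator--{uuid.uuid4()}",
                "created": ts,
                "modified": ts,
                "pattern": _pattern(kind, value),
                "pattern_type": "stix",
                "valid_from": ts,
            })
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}
```

Each destination adapter then receives the same bundle and only translates the parts that destination cares about.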
### 3.3 Rate-limit budgets
Many of these APIs have strict limits:
- AbuseIPDB free tier: 1000 reports/day
- VirusTotal public API: 4 req/min
- Spamhaus: per-submitter quotas
- Cloudflare: per-account rate limits
Without a token-bucket per destination, high-confidence submissions get silently dropped during bursts.
**Action:** Add a `destination_quota` field to the routing matrix and an enforcement layer.
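A per-destination token bucket is enough to turn those limits into an enforcement layer. The Python sketch below assumes an injectable clock for testing and uses the free-tier figures quoted above. The `QUOTAS` names are hypothetical. Refused submissions should be queued for later, not discarded. A continuous refill only approximates quotas that reset on a calendar day.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenBucket:
    """Refills continuously at `refill_per_second`, holding at most `capacity` tokens."""
    capacity: float
    refill_per_second: float
    clock: Callable[[], float] = time.monotonic
    tokens: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.updated_at = self.clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Budgets from the figures quoted above; confirm against each provider's current plan.
QUOTAS = {
    "virustotal_public": TokenBucket(capacity=4, refill_per_second=4 / 60),
    "abuseipdb_free": TokenBucket(capacity=1000, refill_per_second=1000 / 86_400),
}
```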
### 3.4 Feedback loop is missing
When you submit to URLhaus, you can poll for status. When you submit to MISP, you get sightings. When you submit to Cloudflare, you get a case number. These should flow back into your OpenCTI graph as evidence-of-effectiveness.
Without this, you're operating open-loop — you don't know which destinations actually act on your reports.
**Action:** Add a Section 11 "Receipt and Effectiveness Tracking" that defines:
- Per-destination receipt schema (case ID, ack timestamp, outcome status)
- Polling cadence per destination
- A success metric per destination type (takedowns confirmed, sightings count, classification adopted)
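Here is one possible shape for the receipt schema and polling cadence, sketched in Python. The field names, outcome vocabulary, and intervals are all hypothetical and are not taken from any destination's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical polling cadences; tune once real acknowledgement latencies are known.
POLL_INTERVAL = {
    "urlhaus": timedelta(hours=6),
    "cloudflare_abuse": timedelta(hours=24),
    "misp": timedelta(hours=1),
}
TERMINAL_OUTCOMES = frozenset({"taken_down", "rejected", "classified", "expired"})


@dataclass
class Receipt:
    case_id: str
    destination: str
    remote_id: Optional[str]  # case number, UUID, or event ID returned by the destination
    acked_at: datetime
    outcome: str = "pending"
    last_polled_at: Optional[datetime] = None


def due_for_poll(receipt: Receipt, now: datetime) -> bool:
    """True when a non-terminal receipt has waited at least its destination's interval."""
    if receipt.outcome in TERMINAL_OUTCOMES:
        return False
    interval = POLL_INTERVAL.get(receipt.destination)
    if interval is None or receipt.remote_id is None:
        return False
    last = receipt.last_polled_at or receipt.acked_at
    return now - last >= interval
```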
### 3.5 NoMoreRansom (NMR)
Ransomware.live is listed under monitoring, but if a decryptor research effort produces anything, NMR is the destination.
**Action:** Add to the routing matrix:
| Evidence type | First API destination | Second destination | Internal system |
|-------------------------------|--------------------------------|----------------------|------------------------|
| Ransomware decryptor evidence | NoMoreRansom (private channel) | Victim CERT chain | OpenCTI internal only |
NMR coordinates so victims can decrypt before the adversary sees the fix — *never* publish a working decryptor publicly first.
---
## 4. Nice-to-Have
### 4.1 Submitter identity & signing
- Register a stable submitter handle with MISP / MalwareBazaar / AbuseIPDB — not a personal account.
- Sign internal objects with a project PGP key before they leave the system.
- CIRCL and other major MISP communities weight trust by submitter history.
### 4.2 Audit log requirement
Every external submission writes an immutable row:
```
(timestamp, destination, payload_hash, submitter_identity, tlp, response_id, outcome)
```
Legal cover, debugging, and the feedback loop in 3.4 all need this.
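Each row can carry the hash of the previous row, so any later edit breaks the chain. That turns "immutable" into something the system can actually check. The minimal stdlib sketch below assumes an in-memory list standing in for the real table and canonicalizes JSON by sorting keys. Hash chaining makes tampering detectable, not impossible, so it complements append-only storage rather than replacing it.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(row: dict) -> str:
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()


def append_submission(log: list, *, timestamp: str, destination: str, payload: bytes,
                      submitter_identity: str, tlp: str, response_id: str, outcome: str) -> dict:
    """Append one audit row linked to its predecessor by hash."""
    row = {
        "timestamp": timestamp,
        "destination": destination,
        "payload_hash": hashlib.sha256(payload).hexdigest(),
        "submitter_identity": submitter_identity,
        "tlp": tlp,
        "response_id": response_id,
        "outcome": outcome,
        "prev_hash": log[-1]["row_hash"] if log else GENESIS,
    }
    row["row_hash"] = _digest(row)
    log.append(row)
    return row


def verify_chain(log: list) -> bool:
    """False means a row was edited or the sequence was broken; tail truncation is not detected."""
    prev = GENESIS
    for row in log:
        body = {k: v for k, v in row.items() if k != "row_hash"}
        if body["prev_hash"] != prev or _digest(body) != row["row_hash"]:
            return False
        prev = row["row_hash"]
    return True
```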
### 4.3 NIS2 callout for critical-infra reporting
EU NIS2 requires essential and important entities to send an early warning within 24h of becoming aware of a significant incident, followed by an incident notification within 72h. If detections involve essential/important entity sectors, the routing engine should flag the NIS2 obligation regardless of receiver choice.
### 4.4 Section ordering
Sections 8 (data handling) and 9 (normalized object) are foundations, not appendices. Move them up to Sections 3 and 4. Currently a reader hits the platform list before knowing what *not* to send.
### 4.5 Confidence convention
`low|medium|high` is fine, but production CTI commonly uses the **Admiralty Code** (`A1`, `B2`, etc., describing source reliability × information credibility) or estimative language. Mention the convention even if you don't fully adopt it.
---
## 5. Implementation Notes (Blue48 Hookup)
This doc is the spec for two components in the agent stack:
1. **`report_writer` agent** outputs Section 9's normalized object as its canonical format.
2. **A routing engine** (extension of `report_writer`, or a 7th agent) consumes that object, applies the matrix in Section 6, and fans out via API adapters.
Agents stop at *"produce the normalized object."* Human review reads it, decides "yes, ship this to MISP and Cloudflare," and clicks. The routing engine then runs the API calls, captures receipts, and feeds them back to OpenCTI.
### 5.1 Suggested initial adapters (Block G priority)
1. MISP (PyMISP)
2. AbuseIPDB
3. URLhaus
4. Cloudflare Abuse Reports
5. urlscan.io
These five cover ~80% of common evidence types in the routing matrix.
### 5.2 Secrets handling
Every adapter needs API credentials. They must:
- Live in `.env` (already excluded from image via `.dockerignore`)
- Be passed at container runtime via `env_file`, never baked into the image
- Be rotatable on a schedule (the audit log in 4.2 helps prove non-overlap)
---
## 6. Summary
| Category | Count | Notes |
|------------|------:|---------------------------------------------|
| Critical | 6 | Geography, CERT mapping, registrar abuse, severity scale, actor block, PII sanitizer |
| High-value | 5 | TLP enforcement, STIX 2.1, rate limits, feedback loop, NoMoreRansom |
| Nice-to-have | 5 | Signing, audit log, NIS2, ordering, Admiralty Code |
After the critical fixes, this is a publishable internal whitepaper and a clear spec for the routing engine. Good draft.


@@ -0,0 +1,707 @@
# Detailed Review v2 — API-Eligible Cyber Threat Reporting & Escalation Platforms
**Reviewer:** Claude (Opus 4.7, 1M context)
**Review date:** 2026-05-13
**Document reviewed:** `waypoints.md` (first draft)
**Companion to:** `waypoints_firstpass.md` (v1 executive summary)
**Scope of this v2:** section-by-section findings, cross-cutting gaps, missing categories, revised schema, implementation priorities for blue48.
---
## 0. Method
I re-read the draft three times against the following lenses:
1. **Factual / API accuracy** — does each platform actually do what's claimed?
2. **Operational correctness** — would the routing actually work in practice, or break on first contact with reality?
3. **Legal / compliance** — GDPR, NIS2, MLAT, jurisdiction, chain of custody
4. **Threat-model coverage** — does this serve the actual project goal (campaign disruption, not individual attribution)?
5. **OPSEC of the reporter** — what does the adversary learn from each submission?
Findings below carry confidence tags: **[verified]**, **[likely current]**, **[verify before relying on]**.
---
## 1. Section-by-Section Findings
### 1.1 — Section 1: Recommended Reporting Order
**1.1.1 In Scenario 1.1 (normal credible threat), going to the victim first is correct in 90% of cases — but flag the exception.**
Insider-attack scenarios reverse this: notifying a victim org whose own admin/employee is the threat actor warns the attacker. For credential-leak cases involving privileged accounts, route CERT-first and let CERT decide whether to notify the victim org's leadership or its security contact. Add a `1.1.bis` for "victim contact may itself be compromised."
**1.1.2 Scenario 1.2 (imminent harm) is missing a specific decision point.**
If the imminent harm is to *critical infrastructure* (energy, water, healthcare, finance), in EU jurisdictions the **NIS2 Directive** requires regulated entities to submit an early warning within 24 hours of becoming aware of a significant incident (and a fuller incident notification within 72 hours). Your routing engine should detect "victim sector ∈ NIS2 essential/important entity list" and either:
- Route the report so the victim can fulfill their NIS2 obligation, OR
- (If victim is unreachable) report directly via the relevant national CERT's NIS2 channel, which exists separately from generic CSIRT contact paths
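Here is a Python sketch of that sector check. The sector labels are a partial, hypothetical encoding of the NIS2 annexes. Whether an entity is in scope also depends on its size and on national transposition. The 24-hour and 72-hour clocks run from when the entity itself becomes aware of the incident, which may be the moment it receives your report.

```python
from datetime import datetime, timedelta
from typing import Optional

# Partial, illustrative list of NIS2 Annex I/II sectors (snake_case labels are hypothetical).
NIS2_SECTORS = frozenset({
    "energy", "transport", "banking", "financial_market_infrastructure", "health",
    "drinking_water", "waste_water", "digital_infrastructure", "ict_service_management",
    "public_administration", "space", "postal_and_courier", "waste_management",
    "chemicals", "food", "manufacturing", "digital_providers", "research",
})


def nis2_flag(sector: str, victim_in_eu: bool, aware_at: datetime) -> Optional[dict]:
    """Return the victim entity's reporting deadlines if it looks NIS2-relevant, else None."""
    if not victim_in_eu or sector not in NIS2_SECTORS:
        return None
    return {
        "early_warning_due": aware_at + timedelta(hours=24),
        "incident_notification_due": aware_at + timedelta(hours=72),
    }
```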
**1.1.3 Scenario 1.3 missing receiver categories:**
- **Hosting providers** (not just CDNs). Cloudflare is a CDN; the actual origin server is somewhere else (Hetzner, OVH, AWS, DigitalOcean, etc.). A Cloudflare-only report leaves the origin running. Add hosting provider abuse as a parallel step, not after CDN.
- **Domain registrars** via WHOIS-extracted abuse contact, plus registry escalation for ccTLDs (DENIC for `.de`, AFNIC for `.fr`, EURid for `.eu`, Nominet for `.uk`)
- **Certificate authorities** for compromised cert revocation (Let's Encrypt revoke API for ACME-issued certs; commercial CA abuse contacts for the rest)
- **DNS providers** independent of registrar (Cloudflare DNS, Quad9, Google Public DNS abuse contacts — for blocking, not takedown)
**1.1.4 The implicit ordering bias.**
The draft optimizes for *legal defensibility* (talk to the receiver who can act) but not for *operational speed-to-mitigation*. For phishing kits with active credential harvesting, the fastest mitigation is often a simultaneous fan-out to the CDN, hosting provider, registrar, and browser block-list providers, followed by a CERT notification for the record. The doc reads as serial when in practice it should be parallel.
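Below is a minimal asyncio sketch of a parallel fan-out. It assumes each destination adapter is a hypothetical coroutine that takes the report and returns a receipt ID. Failures are captured per destination, so one slow or broken provider does not hold up the rest. The CERT notification can then follow as a separate, serial step.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Adapter = Callable[[dict], Awaitable[str]]


async def fan_out(adapters: Dict[str, Adapter], report: dict, timeout: float = 30.0) -> dict:
    """Submit one report to all destinations concurrently; one failure does not block others."""
    async def submit(name: str, adapter: Adapter):
        try:
            return name, await asyncio.wait_for(adapter(report), timeout)
        except Exception as exc:  # record the failure (including timeouts) instead of aborting
            return name, exc

    results = await asyncio.gather(*(submit(n, a) for n, a in adapters.items()))
    return dict(results)
```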
---
### 1.2 — Section 2: Tier-1 API Reporting Platforms
**1.2.1 Missing platforms that belong in Tier 1:**
| Platform | Why Tier-1 | API style |
|---|---|---|
| **abuse.ch ThreatFox** | IOC graph, sibling to URLhaus/MalwareBazaar, accepts indicator submissions with kill-chain context | REST + Auth-Key |
| **abuse.ch YARAify** | YARA rule sharing + scanning. Direct fit since `detection_author` emits YARA | REST + Auth-Key |
| **AlienVault OTX (now LevelBlue Labs OTX)** | One of the largest free CTI communities. Pulses for sharing, pull API for consumption. **Major omission from current draft.** | REST + DirectConnect API |
| **CIRCL Hashlookup** | Fast hash reputation lookup, free, EU-hosted | REST |
| **Shadowserver** | Free network exposure / vulnerability scanning reports. Subscribe by ASN/CIDR/contact. **The draft has it under "monitoring" but Shadowserver also accepts submissions and runs important takedown campaigns.** | REST API |
**1.2.2 Reorder by jurisdictional fit:**
The current #1 (CISA AIS) is US-government-tied. For Europe-focused work the right Tier-1 priorities are roughly:
1. MISP (CIRCL communities, plus ENISA CSIRTs Network communities)
2. OpenCTI (your own knowledge graph)
3. AlienVault OTX (broad reach, low friction)
4. CISA AIS (only if US-victim cases or US-relevant indicators)
5. Cloudflare / hosting abuse APIs
6. Spamhaus
7. URLhaus / MalwareBazaar / ThreatFox
8. AbuseIPDB
9. urlscan.io
10. Netcraft
**1.2.3 Per-row corrections in the existing table:**
- **CISA AIS — "STIX/TAXII bidirectional"** — be specific: STIX 2.1 over TAXII 2.1, with the **AIS Profile** (a restricted subset of STIX). Submitting non-AIS-Profile STIX gets rejected. **[verified]**
- **Cloudflare Abuse Reports API** — also requires noting that high-volume submitters can apply to be a **Trusted Reporter** which gets faster SLAs. **[likely current]**
- **VirusTotal API** — public submissions are visible to all VT Premium customers (incl. potentially the adversary). **The draft doesn't flag this — it's a critical OPSEC point.** Use **VT Private Scanning** for sensitive samples. **[verified]**
- **PhishTank** — community-vetted. As of late 2024 / early 2025 there were reports of reduced moderation activity. **[verify before relying on]**. Netcraft is the more reliable phishing-takedown channel today.
- **Google Web Risk** — access truly is gated by Google customer engineering review; not a 5-minute API key signup. Apply early. **[verified]**
---
### 1.3 — Section 3: Per-Platform Notes
**3.1 CISA AIS:** Add: requires sponsorship from a federal agency or a signed AIS Sharing Agreement, plus the connector software (typically TAXII client). Onboarding measured in weeks, not days. The draft makes it sound like a sign-up form.
**3.2 MISP:** Missing:
- ZeroMQ for real-time push (worth using if you want sub-second propagation to your own consumers)
- Distinction between **events** (point-in-time intelligence) and **feeds** (continuous streams; better for IOC bulk delivery)
- "Create a community" vs "Join a community" tradeoff — joining CIRCL's communities is the lowest-friction entry; creating your own is high-effort and pointless until you have multiple sharing partners
- TLP-marking enforcement is **not** automatic at the MISP level — your client must respect TLP before publishing onward
**3.3 OpenCTI:** Missing:
- The connector framework: ~80+ pre-built connectors (MITRE ATT&CK, MISP, CrowdStrike, Recorded Future, etc.) — most of your enrichment needs are already solved
- The Workbench feature for analyst review before publishing
- Filigran (the company behind OpenCTI) hosts a managed cloud version if you don't want to operate it yourself
**3.4 Cloudflare Abuse Reports API:** Missing:
- API token requires `Account.Abuse Reports` permission — won't work with read-only tokens
- Rate limits documented separately from the abuse API itself
- For Cloudflare-hosted **Workers** (their serverless), abuse reports go to a different channel
- Trusted Reporter program (mentioned above) — apply once you have submission history
**3.5 Spamhaus:** Missing the lists distinction:
- DBL = Domain Block List (domains)
- SBL = Spamhaus Block List (IPs)
- XBL = Exploits Block List (exploit-sourced IPs)
- ZRD = Zero Reputation Domains (newly registered)
- Each list has different submission criteria. Wrong-list submissions get rejected. Your routing engine needs a list-selector.
**3.6 AbuseIPDB:** Missing:
- The 23-category taxonomy (`SSH brute force`, `port scan`, `web app attack`, `phishing`, etc.) — **your evidence type must map to an AbuseIPDB category code** or the submission is low-utility
- Free tier: 1000 reports/day, 100 IP checks/min. Paid tiers scale
- Single-reporter submissions have low weight; reputation requires multiple corroborating submitters. Send to AbuseIPDB *after* sending to other corroborators
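The mapping step can be a plain lookup table. The Python sketch below assumes hypothetical internal evidence-type names. The numeric IDs follow AbuseIPDB's published category list and should be checked against the current list before use. The function raises an error on an unmapped type, so low-utility, uncategorized reports never go out.

```python
# AbuseIPDB category IDs (subset); verify against the current category list before use.
ABUSEIPDB_CATEGORY_IDS = {
    "ddos_attack": 4,
    "phishing": 7,
    "email_spam": 11,
    "port_scan": 14,
    "hacking": 15,
    "sql_injection": 16,
    "brute_force": 18,
    "web_app_attack": 21,
    "ssh": 22,
}

# Hypothetical internal evidence types mapped to one or more AbuseIPDB categories.
EVIDENCE_TO_CATEGORIES = {
    "ssh_brute_force": ("brute_force", "ssh"),
    "port_scan": ("port_scan",),
    "web_exploit_attempt": ("web_app_attack", "hacking"),
    "sqli_attempt": ("sql_injection", "web_app_attack"),
    "spam_source": ("email_spam",),
}


def abuseipdb_categories(evidence_type: str) -> str:
    """Comma-separated category IDs for a report; unmapped types go to manual review."""
    try:
        names = EVIDENCE_TO_CATEGORIES[evidence_type]
    except KeyError:
        raise ValueError(f"no AbuseIPDB mapping for {evidence_type!r}; route to manual review") from None
    return ",".join(str(ABUSEIPDB_CATEGORY_IDS[name]) for name in names)
```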
**3.7 URLhaus:** Missing:
- Submission auth-key required (free, sign up)
- Manual review for high-confidence flags
- 2024+ stricter format requirements
- Linkage to MalwareBazaar — submit the URL to URLhaus, the sample to MalwareBazaar, link by hash
**3.8 MalwareBazaar:** Missing:
- File size limits (~250MB last I checked)
- Office macro / Windows installer formats need specific tags
- Tag taxonomy is community-driven; non-canonical tags reduce utility
- The "Avoid" line about legal-share is correct but vague. Specifically: do not upload samples obtained under NDA, samples from incidents where the victim hasn't consented, or samples that may contain victim PII (e.g., crafted payloads with the victim's name)
**3.9 PhishTank:** As noted above, declining. Verify status; consider deprioritizing.
**3.10 urlscan.io:** Missing:
- Visibility settings: `public`, `unlisted`, `private` (private = paid)
- Public scans are searchable by everyone — including the adversary monitoring for their kits being analyzed
- The **Search API** is invaluable for retrohunts: "show me every scan in the last 30 days that loaded resource X"
- Bulk submission via UUID-tagged `customagent` field for tracking your submission cohort
**3.11 Google Web Risk:** Missing:
- GCP project + Web Risk API enabled prerequisite
- Submissions evaluated by Google Safe Browsing pipeline; latency hours-to-days
- Successful submissions show up in Chrome / Firefox / Safari Safe Browsing warnings — **massive amplification**. Use only for high-confidence URLs
**3.12 VirusTotal:** Missing:
- Public API: 4 lookups/min, 500/day, 15.5k/month
- Premium API: rate limits negotiated
- File submission privacy: anyone with VT Intelligence can see your sample. **Critical OPSEC point not in draft.**
- VT Private Scanning for sensitive samples
- VT Hunting (YARA livehunt) for ongoing detection
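
Per-destination quota tracking can be sketched as a sliding-window limiter that enforces several windows at once. The limits used are the VirusTotal public-API figures quoted above. The class name and the injectable clock are illustrative. The monthly cap is omitted because it needs a calendar-aware counter.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class MultiWindowQuota:
    """Sliding-window budget enforcing several (limit, window_seconds) pairs at once."""

    def __init__(self, limits: List[Tuple[int, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits
        self.clock = clock
        self.calls: Deque[float] = deque()
        self.horizon = max(window for _, window in limits)

    def _prune(self, now: float) -> None:
        # Timestamps older than the longest window can no longer count against any limit.
        while self.calls and now - self.calls[0] >= self.horizon:
            self.calls.popleft()

    def try_acquire(self) -> bool:
        """Record one call and return True if every window has budget; otherwise record nothing."""
        now = self.clock()
        self._prune(now)
        for limit, window in self.limits:
            used = sum(1 for t in self.calls if now - t < window)
            if used >= limit:
                return False
        self.calls.append(now)
        return True


# VirusTotal public-API limits quoted above: 4 per minute, 500 per day.
vt_quota = MultiWindowQuota([(4, 60.0), (500, 86_400.0)])
```

State lives in memory here. A real adapter would persist call timestamps (for example in the case database) so that a restart does not reset the daily budget.
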
**3.13 Netcraft:** Missing:
- Strong takedown-execution record — Netcraft actually does the takedown work, not just reporting
- Free tier exists for low-volume reporters
- Strongest at brand-protection / phishing
- They prefer evidence packaged as source URL + screenshot + redirect chain + landing-page HTML
---
### 1.4 — Section 4: Internal Case / Incident Routing Platforms
**1.4.1 Missing platforms:**
| Platform | Best for | Why missing matters |
|---|---|---|
| **Wazuh** | Open-source SIEM with TheHive integration | Many SOCs use it; integrates cleanly with this stack |
| **Microsoft Sentinel** | Cloud SIEM with Logic Apps automation | Major enterprise platform — leaving it out makes the doc feel non-enterprise |
| **Splunk SOAR (formerly Phantom)** | Commercial SOAR | Major in enterprise SOCs |
| **Cortex XSOAR** | Commercial SOAR (Palo Alto) | Same as Splunk SOAR: major in enterprise SOCs |
| **Shuffle** | Open-source SOAR | Free alternative to XSOAR/Phantom |
| **Tracecat** | Newer open-source SOAR | Younger but actively developed |
| **n8n** | General workflow automation | Not security-specific but widely used as a glue layer |
**1.4.2 TheHive 5 vs 4:** Be explicit: TheHive 4 has reached end of life and TheHive 5 is current. Code examples should target the TheHive 5 API.
---
### 1.5 — Section 5: Monitoring (Not Primary Reporting)
**1.5.1 Missing high-value monitoring sources:**
| Source | What it gives you | API |
|---|---|---|
| **AlienVault OTX** | Largest free pulse community, IOC subscriptions | REST DirectConnect |
| **CIRCL Passive DNS / Passive SSL** | Historical DNS / cert lookups; EU-hosted | REST |
| **PhishStats** | Phishing URL stream | REST + RSS |
| **DNSDumpster / SecurityTrails / BinaryEdge** | Recon/asset-discovery DBs | REST (mostly paid for bulk) |
| **GreyNoise** | Benign-scanner classification — **reduces false positives** in IP reporting by tagging known internet-noise sources | REST |
| **Spamhaus DNSBL queries** | Free DNSBL lookups | DNS protocol |
| **Maltrail** | Open-source malicious-traffic detection feeds | Static feed download |
| **CT log monitors (crt.sh, Censys CT)** | New-cert issuance for your monitored domains — catches phishing-domain registrations | REST |
**1.5.2 GreyNoise specifically deserves a callout.**
Reporting an IP that GreyNoise classifies as `benign` (Shodan, Censys, security researchers) feeds false positives into AbuseIPDB, costs your reporter account credibility, and embarrasses you with CERTs. **Always GreyNoise-check before submitting an IP report.** This is a one-line API call that prevents a whole class of bad submissions.
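
Below is a sketch of that check. It assumes GreyNoise's Community API (`GET /v3/community/{ip}`, with the key in a `key` header). The response JSON carries `classification` (`benign`, `malicious`, `unknown`) and a `riot` flag for common business services. An IP GreyNoise has never seen comes back as HTTP 404, which the fetch helper treats as "no signal". Treat the endpoint and field names as assumptions to confirm against current GreyNoise docs. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Tuple

GREYNOISE_COMMUNITY_URL = "https://api.greynoise.io/v3/community/{ip}"


def fetch_greynoise(ip: str, api_key: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Query the Community API; a 404 means GreyNoise has not observed the IP."""
    req = urllib.request.Request(
        GREYNOISE_COMMUNITY_URL.format(ip=ip),
        headers={"key": api_key, "Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return {"ip": ip, "noise": False, "riot": False}
        raise


def greynoise_blocks_report(response: Dict[str, Any]) -> Tuple[bool, str]:
    """Return (blocked, reason) for a GreyNoise Community API response."""
    if response.get("classification") == "benign":
        return True, f"GreyNoise: benign scanner ({response.get('name') or 'unnamed'})"
    if response.get("riot"):
        return True, "GreyNoise RIOT: common business service, likely a false positive"
    return False, "no benign signal from GreyNoise"
```

A blocked result should route the IP to human review with the reason attached. It should not be dropped silently.
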
**1.5.3 Shadowserver placement.**
Currently in Section 5 (monitoring only), but Shadowserver also runs *active sinkholing and takedown campaigns* with global reach. They accept tip-offs and IOC contributions. Move them up to Tier 1 receivers, or at least call out the bidirectional relationship.
---
### 1.6 — Section 6: Practical Routing Matrix
**1.6.1 Missing rows:**
| Evidence type | First | Second | Internal |
|---|---|---|---|
| Compromised TLS certificate | CT log monitor sighting → CA revocation request | Cloudflare/host if cert is in use | OpenCTI / TheHive |
| Mobile app malware | Google Play / Apple App Review submission | VirusTotal sample upload | OpenCTI |
| Cryptocurrency wallet (laundering) | Chainalysis / TRM (commercial) or on-chain analysis | OFAC report if the wallet is on the SDN list | Internal restricted case |
| Open-source supply-chain attack | Registry security (security@npmjs.com, security@pypi.org) | GitHub Security Lab | OpenCTI / TheHive |
| Compromised GitHub repo / leaked secret | GitHub Security Advisory + vendor-specific revoke API (e.g., AWS IAM) | Victim org | Internal restricted |
| Tor hidden service hosting malware | Document only (no takedown for .onion); push IOCs to MISP | n/a | OpenCTI |
| Sanctions-evasion crypto | OFAC (US) / national competent authority (EU), after screening against the SDN list and EU FSF | National FIU | Internal restricted |
| CSAM (legally separate) | NCMEC CyberTipline (US) / IWF (UK) / INHOPE (international) | National police | Stop processing immediately, preserve under legal hold |
| AiTM phishing kit / 2FA bypass | Browser vendor reports (Chrome / Firefox / Safari Trust & Safety) | Affected service | OpenCTI |
**1.6.2 Cloudflare-proxied abuse needs a follow-up step.**
Current row says: First → Cloudflare API; Second → Netcraft / PhishTank. Missing: **Third → origin host abuse contact**. Identify it from Cloudflare's abuse-report response (its abuse process can disclose the origin hosting provider), or by cross-referencing passive-DNS history and certificate-transparency records for pre-CDN origin IPs. Without this, the takedown leaves the origin alive and the attacker just provisions a new CDN front end.
**1.6.3 The "Leaked credentials/API keys" row is dangerously thin.**
"Victim first → CERT if severe → Internal IR case" is missing the **revocation step**, which is more time-critical than reporting. If you find a leaked AWS access key, the **first** action is getting it disabled. The account owner (or you, only with the owner's explicit permission) runs `aws iam update-access-key --access-key-id <id> --status Inactive`, then `aws iam delete-access-key` once nothing legitimate depends on the key. Keys pushed to public GitHub repositories are already caught by GitHub secret scanning, which notifies AWS so it can quarantine the key. If you find leaked OAuth tokens for GitHub/Slack/etc., the relevant vendor has an automated revocation pathway. **Add the revocation step before victim notification.**
---
### 1.7 — Section 7: Minimum Viable API Stack
The current MVP list (10 items) is too heavy for "minimum viable." A genuine MVP for a new white-hat group is closer to:
1. **OpenCTI** — your knowledge graph (or, if too heavy, just MISP for both)
2. **MISP via CIRCL community** — free, EU-hosted, broad reach
3. **AlienVault OTX** — free, broadest reach for indicator sharing
4. **AbuseIPDB** — free tier, easy
5. **URLhaus + MalwareBazaar + ThreatFox** (the abuse.ch trio — same auth-key, three destinations)
6. **urlscan.io** — free tier, evidence generation
7. **National CERT direct email + GPG** — non-API, but mandatory
That's 7 things, of which 5 are pure free signups. Tackle Cloudflare/Netcraft/Spamhaus/Google Web Risk after you have throughput in those 7.
The current MVP includes TheHive — that's case management, not external reporting. Move it out of "API stack" since it's internal infrastructure.
---
### 1.8 — Section 8: Data Handling Rules
**1.8.1 "Never submit publicly" — additions:**
- Insider-threat allegations without verification
- Attribution claims about specific named individuals (the hard line we settled on earlier)
- Government / classified material
- PHI (US HIPAA scope)
- PCI scope financial data
- Children's data (COPPA US; GDPR Article 8 EU)
- Biometric data
- Trade secrets / source code
- Material from unauthorized intrusion (even if you reached it via OSINT: "I downloaded their leaked DB" can make you a recipient of stolen data in some jurisdictions)
**1.8.2 "Safe to submit" — additions:**
- YARA rules (especially to YARAify)
- Sigma rules (to SigmaHQ via PR)
- Mutex names, named-pipe signatures (good Sysmon detections)
- Persistence registry keys
- Scheduled task names
- TLS fingerprints (JA3, JA4)
- HTTP user-agent strings observed in C2
- ASN block ranges associated with adversary infrastructure
- STIX/TAXII patterns
- ATT&CK technique IDs (always)
**1.8.3 Missing entire section: "Sanitize before submitting"**
- Strip URL query parameters that may contain victim tokens / session IDs
- Hash email local-parts when the destination is public (`a72b91…@example.com`), using a keyed hash so short addresses cannot be recovered by dictionary attack
- Redact internal hostnames from samples
- Strip `X-Forwarded-For` headers and source IPs from log excerpts that would identify your honeypot
- Replace victim-org names with role descriptors (`<european_bank>`) unless the submission is to a destination where the victim has consented or the receiver is trusted (CERT)
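
The URL, email and victim-name rules can be sketched with the standard library alone. The function names are hypothetical. The keyed hash (HMAC-SHA256, truncated to 12 hex characters) is a deliberate substitution for a plain hash, because an unkeyed hash of a short local-part can be reversed by dictionary attack.

```python
import hashlib
import hmac
import re
from typing import Dict
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment, which may carry victim tokens or session IDs."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def pseudonymize_email(email: str, key: bytes) -> str:
    """Replace the local-part with a truncated HMAC-SHA256 token.

    The key stays internal, so the same address maps to the same token across
    submissions without being reversible by outsiders. The local-part is
    lowercased so case variants collapse to one token.
    """
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {email!r}")
    token = hmac.new(key, local.lower().encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    return f"{token}@{domain}"


def redact_names(text: str, replacements: Dict[str, str]) -> str:
    """Replace victim-organization names with role descriptors, case-insensitively."""
    for name, role in replacements.items():
        # A function replacement avoids backslash interpretation in `role`.
        text = re.sub(re.escape(name), lambda _m, r=role: r, text, flags=re.IGNORECASE)
    return text
```
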
---
### 1.9 — Section 9: Recommended Submission Object
**1.9.1 Schema gaps (v2 object; inline comments annotate new or changed fields):**
```jsonc
{
  "case_id": "WG-2026-000001",
  "schema_version": "1.0",
  "tlp": "AMBER", // use TLP 2.0 values: CLEAR/GREEN/AMBER/AMBER+STRICT/RED
  "tlp_marking_definition_ref": "marking-definition--...", // STIX-compatible
  "severity": "low|medium|high|critical", // replace A-E with standard
  "confidence": "low|medium|high", // or Admiralty A1-F6
  "language": "en", // i18n
  "first_observed": "2026-05-13T10:00:00Z", // top-level
  "last_observed": "2026-05-13T11:30:00Z",
  "valid_from": "2026-05-13T10:00:00Z", // STIX-style validity window
  "valid_until": "2026-08-13T10:00:00Z",
  "threat_type": "phishing|malware|ransomware|credential_exposure|iab|botnet|vulnerability_exploitation",
  "victim": {
    "organization": "",
    "domain": "",
    "country": "",
    "sector": "",
    "nis2_category": "essential|important|n/a", // for EU NIS2 routing
    "consent_to_name_publicly": false // sanitization gate
  },
  "actor": {
    "name": "Adira",
    "aliases": [],
    "campaign": "",
    "confidence": "A1|A2|...|F6"
  },
  "kill_chain": ["recon|weapon|deliver|exploit|install|c2|action"],
  "attack_techniques": ["T1566.001", "T1059.003"],
  "source": {
    "category": "forum|leak_site|telegram|honeypot|sensor|osint|tip",
    "first_seen": "",
    "last_seen": "",
    "collection_method": "lawful_osint_or_partner_feed",
    "burn_sensitivity": "low|medium|high" // affects sanitization aggressiveness
  },
  "observables": {
    "ips": [],
    "domains": [],
    "urls": [],
    "hashes": [],
    "emails": [],
    "wallets": [],
    "cves": [],
    "yara_rules": [],
    "sigma_rules": [],
    "mutexes": [],
    "named_pipes": [],
    "scheduled_tasks": [],
    "registry_keys": [],
    "user_agents": [],
    "tls_fingerprints": [], // JA3/JA4
    "certificates": [], // CT log entries / SHA256 of cert
    "asn_blocks": [],
    "process_names": []
  },
  "pattern_relationships": [
    {"source": "domain:example.com", "type": "resolves_to", "target": "ipv4:1.2.3.4", "first_seen": "..."}
  ],
  "evidence": {
    "summary": "",
    "sanitized_screenshots": [],
    "raw_evidence_location": "internal_restricted_storage",
    "detonation_results": [], // sandbox report references
    "memory_artifacts": [] // forensic, internal only
  },
  "timeline": [
    {"ts": "...", "event": "..."}
  ],
  "indicators_of_compromise": [], // observables flagged as actively malicious
  "recommended_actions": [],
  "routing": {
    "primary_destinations": [],
    "secondary_destinations": [],
    "public_disclosure_allowed": false,
    "embargo_until": null, // timed disclosure
    "coordinated_with": [] // who else has been told (CERT case IDs etc)
  },
  "audit": {
    "submitted_to": [], // append-only history of submissions
    "feedback_received": [], // ack IDs, takedown confirmations
    "submitter_identity": "wg-handle@misp", // which submitter handle was used
    "signed_with": "PGP fingerprint",
    "object_sha256": "" // tamper-detect on the object itself
  }
}
```
**1.9.2 Other schema concerns:**
- `case_id` format `WG-2026-000001` is fine, but reserve a 2-char org prefix to avoid collision if you ever federate with another working group
- `tlp` should use **TLP 2.0** spec values (`CLEAR`, `GREEN`, `AMBER`, `AMBER+STRICT`, `RED`) — TLP 1.0 used different terms
- Severity / confidence mismatch in v1: severity used `A-E`, confidence used words. Standardize.
- Add a per-object hash so the routing engine can detect tampering between produce-time and submit-time
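
Below is a sketch of the per-object hash. It assumes that the mutable `audit` block, which accumulates submission history after sealing, is excluded from the hashed content. Canonicalization here is plain sorted-key, compact JSON. RFC 8785 (JSON Canonicalization Scheme) would be the stricter choice if other languages must reproduce the hash. The function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def canonical_bytes(obj: Any) -> bytes:
    """Deterministic serialization: sorted keys, no insignificant whitespace, UTF-8."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def object_sha256(submission: Dict[str, Any]) -> str:
    """SHA-256 over everything except the mutable audit block."""
    sealed = {k: v for k, v in submission.items() if k != "audit"}
    return hashlib.sha256(canonical_bytes(sealed)).hexdigest()


def seal(submission: Dict[str, Any]) -> None:
    """Store the hash in audit.object_sha256 at produce-time."""
    submission.setdefault("audit", {})["object_sha256"] = object_sha256(submission)


def verify(submission: Dict[str, Any]) -> bool:
    """Check at submit-time that nothing outside audit changed since sealing."""
    return submission.get("audit", {}).get("object_sha256") == object_sha256(submission)
```
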
---
### 1.10 — Section 10: Final Recommendation
**1.10.1 The architecture sentence is missing the feedback edge.**
Current: *Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public reporting*
Better: *Sensors → OpenCTI → TheHive/IRIS → routing engine → MISP + abuse APIs + CERT/AIS → sanitized public reporting → **receipts and outcomes back to OpenCTI** → effectiveness scoring → re-prioritization*
Without the feedback edge, you can't tell which destinations are worth maintaining.
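
The effectiveness-scoring step on that feedback edge could start as simple per-destination rates. The `Outcome` record and its two flags are hypothetical. Real receipts (ack IDs and takedown confirmations from `audit.feedback_received`) would be normalized into something like it.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Outcome:
    """Hypothetical normalized receipt for one external submission."""
    destination: str
    acknowledged: bool   # destination confirmed receipt (ack ID, ticket number)
    actioned: bool       # takedown confirmed, listing published, etc.


def score_destinations(outcomes: List[Outcome]) -> Dict[str, Dict[str, float]]:
    """Per-destination submission count, acknowledgement rate, and action rate."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])
    for o in outcomes:
        t = totals[o.destination]
        t[0] += 1
        t[1] += int(o.acknowledged)
        t[2] += int(o.actioned)
    return {
        dest: {"submissions": n, "ack_rate": acks / n, "action_rate": acted / n}
        for dest, (n, acks, acted) in totals.items()
    }
```

A destination with many submissions and a near-zero action rate is a candidate for de-prioritization in the routing matrix.
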
**1.10.2 Missing entirely: closing checklist for "we're ready to submit."**
A final checklist before any external submission fires:
```
[ ] TLP enforced (object.tlp <= destination.max_tlp)
[ ] Sanitization pass complete (PII stripped per destination policy)
[ ] GreyNoise check (if observables include IPs)
[ ] Quota available (rate-limit budget not exceeded)
[ ] Submitter identity registered with destination
[ ] Object signed
[ ] Audit row written
[ ] Human approver clicked yes (for non-automated tier)
```
This belongs as Section 11 or as the closing block of Section 10.
---
## 2. Cross-Cutting Gaps (Not Tied to Any Section)
### 2.1 — OPSEC for the Reporters Themselves
**Not in the doc at all.** If your group reports Adira to authorities, Adira may notice. They can read open MISP communities and public URLhaus entries, and they may have VirusTotal Premium access as a paying customer.
Required additions:
- **Submission identity registry**: which handle is used on which platform, who has access, rotation schedule
- **Account-creation OPSEC**: don't use personal accounts on submission platforms; create a project handle, use a project email, register with project-owned phone/2FA
- **Network OPSEC for collection**: if you're scraping leak sites or monitoring the adversary's infrastructure, route through a VPN or research-purpose proxy — never the same network as your submission identity
- **PGP for CERT comms**: every national CERT publishes a PGP key. Every email submission to a CERT should be signed and encrypted. The draft does not address this.
### 2.2 — Burnt Source Protection
If you have a private collection source (honeypot, infiltrated channel, tipped-off insider), publishing IOCs from it can burn the source. Specifically:
- A unique honeypot fingerprint (banner, response timing, listening port) lets the adversary identify which sample came from your honeypot
- Publishing a sample with a unique build artifact (your sandbox's hostname in a DNS query, a timestamp matching your detonation window) reveals your detonation infrastructure
- Reporting a forum URL while it's still live tips off the forum operator that it's being watched
The doc needs a **burn-sensitivity tier** on each observable, and a sanitization step that aggressively scrubs source-identifying artifacts before any external submission.
### 2.3 — Adversary Observability of Your Submissions
Tier each receiver by who can see your submission:
| Receiver | Adversary visibility |
|---|---|
| MISP private community | trusted community only |
| MISP public community / OTX public pulse | anyone with an account |
| URLhaus | public — adversary can monitor |
| MalwareBazaar | public — adversary can detect their sample was uploaded |
| VirusTotal public submission | every VT Premium customer (incl. potentially adversary) |
| VT Private Scanning | only your team (paid) |
| AbuseIPDB | public reputation visible |
| Cloudflare Abuse Reports | only Cloudflare and the reported asset owner |
| CERT direct (GPG-encrypted) | only the CERT |
The routing engine should display this visibility for each destination during human review.
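
One way to make that visibility machine-checkable is to rank destinations by audience width and cap each `burn_sensitivity` tier (the field in the schema's `source` block). The four-level ranking, the destination keys and the tier ceilings are assumptions for illustration. Cloudflare is left out because its reports also reach the reported asset owner, which does not fit a single linear scale.

```python
from enum import IntEnum
from typing import Dict, List


class Visibility(IntEnum):
    """Who can see a submission, from narrowest to widest audience."""
    RECIPIENT_ONLY = 0      # CERT direct, GPG-encrypted
    TRUSTED_COMMUNITY = 1   # MISP private community
    ACCOUNT_HOLDERS = 2     # OTX public pulse, VT public submission
    PUBLIC = 3              # URLhaus, MalwareBazaar, AbuseIPDB


DESTINATION_VISIBILITY: Dict[str, Visibility] = {
    "cert_direct": Visibility.RECIPIENT_ONLY,
    "misp_private": Visibility.TRUSTED_COMMUNITY,
    "otx_public": Visibility.ACCOUNT_HOLDERS,
    "virustotal_public": Visibility.ACCOUNT_HOLDERS,
    "urlhaus": Visibility.PUBLIC,
    "malwarebazaar": Visibility.PUBLIC,
    "abuseipdb": Visibility.PUBLIC,
}

# Assumed policy: widest audience allowed for each burn-sensitivity tier.
MAX_VISIBILITY_FOR_BURN: Dict[str, Visibility] = {
    "low": Visibility.PUBLIC,
    "medium": Visibility.TRUSTED_COMMUNITY,
    "high": Visibility.RECIPIENT_ONLY,
}


def allowed_destinations(burn_sensitivity: str) -> List[str]:
    """Destinations whose audience does not exceed the tier's ceiling, for display in review."""
    ceiling = MAX_VISIBILITY_FOR_BURN[burn_sensitivity]
    return sorted(d for d, v in DESTINATION_VISIBILITY.items() if v <= ceiling)
```
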
### 2.4 — Chain of Custody / Legal Admissibility
If any of this material may end up in a criminal proceeding, chain of custody matters. Specifically:
- The **raw evidence** must be preserved unmodified, with hashes recorded at acquisition time
- Any **transformation** (sanitization, normalization) must be traceable back to the preserved original: the routing engine logs the input hash, the transform applied, and the output hash
- The **submitter identity** for each external submission is logged
- Witnesses (multi-party access logs) are preferred for high-value evidence
The current `evidence.raw_evidence_location` field is a placeholder; it needs structure: storage path, hash, acquisition timestamp, acquirer identity.
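
Below is a sketch of that structure. It assumes evidence is handled as bytes and that the custody log is an in-memory list; a real system would write the log to append-only storage. The field and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class RawEvidence:
    """Structured form of evidence.raw_evidence_location."""
    storage_path: str
    sha256: str        # recorded at acquisition time
    acquired_at: str   # ISO 8601, UTC
    acquired_by: str   # acquirer identity


@dataclass(frozen=True)
class TransformRecord:
    """One custody-log entry: input hash, transform applied, output hash."""
    input_sha256: str
    transform: str
    output_sha256: str
    performed_by: str
    performed_at: str


def acquire(storage_path: str, data: bytes, acquirer: str) -> RawEvidence:
    return RawEvidence(storage_path, sha256_hex(data), utc_now(), acquirer)


def apply_transform(data: bytes, name: str, fn: Callable[[bytes], bytes],
                    actor: str, log: List[TransformRecord]) -> bytes:
    """Run fn and log input/output hashes; bytes are immutable, so the original stays intact."""
    out = fn(data)
    log.append(TransformRecord(sha256_hex(data), name, sha256_hex(out), actor, utc_now()))
    return out
```

Because every record links an input hash to an output hash, a reviewer can walk from any submitted artifact back to the acquisition hash.
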
### 2.5 — Amplification Risk
Publishing IOCs publicly amplifies awareness — which is good for defenders but bad if:
- The IOC includes a compromised legitimate site (you damage the site owner's reputation)
- The IOC is for a piece of infrastructure that's about to be used in a sting operation by LE
- The IOC reveals an investigation technique still under embargo
A **publish-readiness review** belongs in Section 1 of the doc, not in the closing checklist.
### 2.6 — Failure Modes / Retries
What happens when:
- URLhaus rejects a submission (malformed, low-confidence flag, duplicate)?
- MISP is down for maintenance?
- Cloudflare returns 503?
- Your submitter identity gets rate-limited?
- An API token is revoked mid-batch?
The doc has no resilience layer. Recommend:
- Idempotent submission with client-generated IDs (so retries don't double-submit)
- Per-destination retry policy (exponential backoff with jitter)
- Dead-letter queue for permanent failures — surface in human-review UI
- Per-submitter quota tracking, with auto-failover to backup submitter if available
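
Below is a sketch of the first two recommendations. It assumes the adapter's `send` callable raises one of two hypothetical exception types. Most of the listed destinations do not accept an idempotency key natively, so the adapter would also consult the audit log for that key before re-sending; that check is omitted here. Backoff uses full jitter: each delay is a uniformly random time up to the exponential bound.

```python
import random
import time
import uuid
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Retryable failure: HTTP 503, maintenance window, rate limit."""


class PermanentError(Exception):
    """Non-retryable failure: malformed submission, revoked token."""


def new_idempotency_key() -> str:
    """Generate once per (case, destination) and reuse on every retry."""
    return str(uuid.uuid4())


def submit_with_retry(send: Callable[[str], T], idempotency_key: str, max_attempts: int = 5,
                      base: float = 1.0, cap: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(key), retrying TransientError with exponential backoff and full jitter.

    PermanentError propagates at once so the caller can dead-letter the item;
    the last TransientError is re-raised after max_attempts.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return send(idempotency_key)
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise RuntimeError("unreachable")
```

Anything that raises `PermanentError` or exhausts its attempts would land in the dead-letter queue and surface in the human-review UI.
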
### 2.7 — Versioning and Maintenance
The doc has no version number, no changelog, no maintainer field, no review cadence. For a living spec like this:
```
---
schema_version: 1.0
last_reviewed: 2026-05-13
next_review_due: 2026-08-13
maintainer: <project lead>
changelog:
- 2026-05-13: initial draft
---
```
API surfaces of these platforms change (Cloudflare deprecations, VT pricing changes, abuse.ch tag taxonomy updates). A quarterly re-validation cadence is sane.
### 2.8 — Multi-Language Submissions
Many national CERTs prefer or require local language for narrative fields (BSI German, ANSSI French, CCN-CERT Spanish). The submission object's `language` field (added above) plus a translation step in the routing engine handles this. Currently absent.
---
## 3. Missing Categories Entirely
### 3.1 — Hosting Provider Abuse Channels (most lack true REST APIs)
| Provider | Channel | API? |
|---|---|---|
| AWS | abuse@amazonaws.com + form | No public REST; AWS responds to email |
| Google Cloud | https://support.google.com/cloud/answer/2417620 | Form-only |
| Azure | https://msrc.microsoft.com/report/abuse | Form + email |
| DigitalOcean | abuse@digitalocean.com | Email + status REST |
| Hetzner | abuse@hetzner.com + form | Form |
| OVH | abuse@ovh.net + Anti-abuse REST API | Yes |
| Linode (Akamai) | abuse@linode.com | Email |
| Vultr | abuse@vultr.com | Email |
Treat email-based providers as a different submission class (template + GPG-signed email, with parsed-receipt detection). Worth a Section 11 in the doc.
### 3.2 — Cryptocurrency / Sanctions
- **Chainalysis Reactor** — commercial, gold standard for on-chain investigations
- **TRM Labs** — commercial alternative
- **CipherTrace** (Mastercard) — commercial
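- **OFAC SDN list**: screen wallets against it (entries can include digital-currency addresses) and report matches to OFAC (US)
- **EU Financial Sanctions Files (FSF)**: the consolidated EU sanctions list; screen against it and report matches to the national competent authority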
- **National FIUs** — Financial Intelligence Units, country-specific
- Free / open: **GraphSense** (open-source on-chain analytics), **Etherscan** (manual)
### 3.3 — Mobile / App Store
- **Google Play Protect submissions** (for Android malware)
- **Apple App Review report** (for malicious iOS apps)
- **APKMirror reports** (for repackaged apps)
- **F-Droid security contacts** (for compromised FOSS apps)
### 3.4 — Open-Source Supply Chain
- **PyPI:** security@pypi.org
- **npm:** security@npmjs.com; npm tokens leaked to public GitHub repositories are revoked automatically via GitHub secret scanning
- **crates.io:** help@crates.io
- **RubyGems:** security@rubygems.org
- **Maven Central:** central@sonatype.org
- **GitHub Security Lab** (research collaboration)
- **OpenSSF** Vulnerability Disclosure (cross-ecosystem coordination)
- **Sigstore** (provenance verification, longer-term)
### 3.5 — Certificate Authorities
- **Let's Encrypt:** ACME revocation API for ACME-issued certs
- **Sectigo / DigiCert / GlobalSign / Entrust:** abuse contacts in CA/Browser Forum compliance docs
- **CT log monitors** for detection (crt.sh, Censys CT, Google CT)
### 3.6 — Tor / Dark Web
Limited takedown leverage for `.onion` services, but worth documenting:
- Document via **Tor Project's abuse handling page** (limited leverage)
- Contribute IOCs to **DarkOwl, Recorded Future, Flashpoint** (commercial dark-web monitoring) if you have access
- Push to MISP with `tor` tag for community awareness
### 3.7 — CSAM (Legally Separate Pathway)
If CSAM is encountered during collection, **stop processing immediately**. CSAM has separate legal handling rules:
- **NCMEC CyberTipline** (US)
- **IWF (Internet Watch Foundation)** (UK)
- **INHOPE** (international hotline network)
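- Possessing CSAM is illegal even for research; do not attempt to verify, catalogue, or share it. Report it, then handle any copies on your systems strictly as the hotline or law enforcement instructs (preservation under legal hold or deletion), and log each action.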
This deserves a Section 12 with a hard stop: "if encountered, halt and report via the channels below; do not include in any other submission flow."
---
## 4. Missing Platforms Worth Adding (Quick List)
### Free / Open
- AlienVault OTX (huge omission)
- ThreatFox
- YARAify
- CIRCL Hashlookup
- CIRCL Passive DNS / Passive SSL
- Maltrail feeds
- crt.sh / Censys CT
- GreyNoise community tier
- Spamhaus DNSBL queries
- PhishStats
### Commercial / Paid (worth listing for completeness)
- Recorded Future
- Mandiant Advantage (now Google Threat Intelligence)
- CrowdStrike Falcon Intelligence
- Sekoia.io
- Flashpoint
- DomainTools (passive DNS / WHOIS history)
- RiskIQ (now Microsoft Defender Threat Intelligence)
- Anomali ThreatStream
### Intelligence Communities (membership-based)
- FIRST.org (CSIRT global community)
- Trusted Introducer (European CSIRT trust framework)
- M3AAWG (Messaging, Malware, Mobile Anti-Abuse Working Group)
- APWG (Anti-Phishing Working Group)
- Cyber Threat Alliance (commercial CTI sharing)
- ENISA CSIRTs Network
---
## 5. Implementation Priorities for Blue48
In our agent stack, this doc translates to concrete work:
### 5.1 — Block G additions (when we get there)
1. **`report_writer` agent** outputs the v2 normalized object (Section 1.9.1 above) as canonical format
2. **New `routing_engine` component** (extension of `report_writer`, or a 7th agent) — consumes the object, applies routing matrix, fans out via API adapters
3. **Adapter priority order for blue48 v1.0:**
   1. MISP (PyMISP)
   2. AlienVault OTX (REST)
   3. AbuseIPDB (REST + category mapping)
   4. URLhaus + MalwareBazaar + ThreatFox (shared abuse.ch auth-key)
   5. urlscan.io (REST, with private-by-default visibility)
   6. Cloudflare Abuse Reports
   7. GPG-signed email to BSI / CERT-Bund (since the user is in DE)
### 5.2 — Schema work
- `config/submission_schema.json` — JSON Schema for the v2 normalized object
- `config/routing_matrix.yaml` — declarative rules: evidence type → destinations, with TLP ceilings and quotas
- `core/sanitize.py` — pre-submission scrubbing per destination policy
- `core/audit.py` — append-only log of every submission, signed
- `core/tlp.py` — TLP 2.0 enforcement
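
Below is a minimal sketch of what `core/tlp.py` could contain. It assumes TLP 2.0 labels are compared as an ordered scale (CLEAR least restrictive, RED most), with each destination declaring a `max_tlp` ceiling as in the `object.tlp <= destination.max_tlp` check. The names are hypothetical. `str.removeprefix` needs Python 3.9+. TLP 1.0 terms such as `WHITE` deliberately fail to parse.

```python
from enum import IntEnum


class TLP(IntEnum):
    """TLP 2.0 labels, ordered from least to most restrictive."""
    CLEAR = 0
    GREEN = 1
    AMBER = 2
    AMBER_STRICT = 3
    RED = 4

    @classmethod
    def parse(cls, label: str) -> "TLP":
        # Accept "TLP:AMBER+STRICT", "amber+strict", etc.; unknown labels raise KeyError.
        key = label.strip().upper().removeprefix("TLP:").replace("+", "_")
        return cls[key]


def tlp_allows(object_tlp: str, destination_max_tlp: str) -> bool:
    """True if the object's marking does not exceed the destination's ceiling."""
    return TLP.parse(object_tlp) <= TLP.parse(destination_max_tlp)
```
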
### 5.3 — Pre-submission gates (before any adapter fires)
```
1. Schema valid?
2. TLP <= destination ceiling?
3. Sanitization complete?
4. GreyNoise check passes (for IPs)?
5. Quota available?
6. Submitter identity registered with destination?
7. Object signed?
8. Audit row written?
9. Human approver yes (for non-auto tier)?
```
If any fail → drop into human-review queue with the reason. Never silently skip.
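
Below is a sketch of the gate runner. It assumes each gate is a function that returns `None` on pass or a reason string on failure. The two example gates and the `approved_by` field are hypothetical stand-ins for the nine checks above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# A gate returns None when it passes, or a human-readable failure reason.
Gate = Callable[[Dict[str, Any]], Optional[str]]


@dataclass
class GateResult:
    failures: List[Tuple[str, str]] = field(default_factory=list)  # (gate, reason)

    @property
    def passed(self) -> bool:
        return not self.failures


def run_gates(submission: Dict[str, Any], gates: List[Tuple[str, Gate]]) -> GateResult:
    """Evaluate every gate; any failure routes the item to human review."""
    result = GateResult()
    for name, gate in gates:
        reason = gate(submission)
        if reason is not None:
            result.failures.append((name, reason))
    return result


def schema_gate(sub: Dict[str, Any]) -> Optional[str]:
    return None if sub.get("case_id") else "missing case_id"


def approval_gate(sub: Dict[str, Any]) -> Optional[str]:
    # Hypothetical field recording the human approver.
    return None if sub.get("approved_by") else "no human approver recorded"
```

Running every gate rather than stopping at the first failure is a design choice. The reviewer sees all the reasons in one pass.
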
### 5.4 — Failure / retry layer
- Per-destination idempotency keys (client-generated)
- Exponential backoff with jitter
- Dead-letter queue for permanent failures, surfaced in `data/dlq/`
- Per-submitter quota tracking with auto-failover
---
## 6. Summary of v2 Findings
| Category | Count | Action |
|---|---:|---|
| Section-by-section corrections | 38 | Fold into the draft |
| New cross-cutting sections needed | 8 | Add as Sections 11–18 |
| Missing platform categories | 7 | Each warrants a sub-section |
| Missing free/open platforms (Tier 1) | 5 | Add to Section 2 |
| Schema field gaps | 17 | Adopt v2 schema above |
| Pre-submission gates not defined | 9 | Add as closing checklist |
After folding these in, the document becomes a publishable internal whitepaper *and* a complete spec for the blue48 routing engine. The first draft was a confident outline; the v2 turns it into a working manual.
If useful, I can next:
- (a) Generate `config/submission_schema.json` (JSON Schema for the v2 normalized object) into `~/blue48/config/`
- (b) Generate `config/routing_matrix.yaml` (declarative routing rules) into `~/blue48/config/`
- (c) Scaffold `agents/routing_engine.py` with adapter stubs for the seven Block-G priority destinations
- (d) Re-issue this review as suggested edits inline against the original (so you can accept/reject diff-style)
Pick any subset and I'll ship.