Blue48 Worker Mesh Architecture
Document type: Project record / technical architecture
Scope: Worker names, responsibilities, interfaces, data flow, human review boundaries
Status: Draft v1
1. Purpose
Blue48 should not rely on one large, expensive, opaque model to perform all cyber-intelligence operations. The platform should be built as a mesh of small, specialized workers.
Each worker performs one narrow function, writes structured output, and passes a normalized case object to the next stage. Heavy models are reserved for judgment-heavy tasks such as confidence scoring, routing explanations, public report drafting, and training-example generation.
Core principle:
Small workers produce traceable outputs. Humans approve sensitive decisions. The Ledger proves what happened.
2. High-Level Flow
Scoutline
→ Proofline
→ Mapline
→ Classifyline
→ Sealine
→ Routeline
→ Ledgerline
→ Publishline
→ Trainline
Operator version:
Detect → Validate → Map → Classify → Seal Evidence → Route → Submit → Track → Archive → Learn
3. Worker Lines
| Line | Purpose |
|---|---|
| Scoutline | Finds, fetches, parses, and deduplicates lawful intelligence sources. |
| Proofline | Validates claims, checks indicators, measures freshness, and scores confidence. |
| Mapline | Resolves victims, actors, sectors, jurisdictions, CERT routes, and affected products. |
| Classifyline | Assigns severity, TLP, incident type, and operational class. |
| Sealine | Packages evidence, encrypts it for authorized recipients, and destroys local plaintext/key material when policy allows. |
| Routeline | Selects destinations, builds payloads, enforces destination policy, and submits reports. |
| Ledgerline | Records immutable audit events, receipts, outcomes, and follow-up status. |
| Publishline | Produces sanitized public intelligence only after mitigation and approval. |
| Trainline | Converts lawful, reviewed intelligence into LoRA-ready training data. |
4. Core Worker Set
The first conceptual worker set is:
Scout → Verifier → Mapper → Classifier → Sealer → Router → Courier → Ledger
Support workers:
Watcher → Archivist → Publisher
Operational summary:
Scout detects.
Verifier confirms.
Mapper identifies.
Classifier prioritizes.
Sealer protects.
Router decides.
Courier submits.
Ledger proves.
Watcher follows up.
Archivist forgets safely.
Publisher informs.
5. Granular Worker Breakdown
5.1 Scoutline
| Worker | Job | Model requirement |
|---|---|---|
| SourcePlanner | Maintains the approved source list, collection schedules, and source eligibility. | None / rules |
| Crawler | Discovers new pages, feeds, advisories, reports, APIs, and datasets. | None |
| Fetcher | Downloads pages, PDFs, JSON, RSS, STIX/TAXII, MISP events, and API responses. | None |
| Parser | Extracts title, date, author, body, tables, indicators, and metadata. | Rules / small model |
| Deduper | Detects duplicate reports, reposted IOCs, syndicated articles, and repeated claims. | Embeddings / rules |
| SourceRanker | Scores the source based on trust, history, origin, and license status. | Rules / small model |
| Signalizer | Converts parsed content into candidate intelligence signals. | Small/medium model |
Output:
{
  "signal_id": "uuid",
  "source_type": "advisory | cti_report | abuse_feed | ransomware_monitor | public_blog | misp_event",
  "summary": "short defensive summary",
  "observed_at": "2026-05-13T00:00:00Z",
  "raw_evidence_location": "internal-only-reference"
}
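A minimal sketch of how the signal object above could be built and validated. It uses only the standard library; the names `Signal`, `make_signal`, and `SOURCE_TYPES` are hypothetical and not part of the Blue48 codebase.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from uuid import uuid4

# Allowed values copied from the "source_type" field of the output object.
SOURCE_TYPES = frozenset({
    "advisory", "cti_report", "abuse_feed",
    "ransomware_monitor", "public_blog", "misp_event",
})


@dataclass(frozen=True)
class Signal:
    """Hypothetical in-memory form of a Scoutline signal."""
    signal_id: str
    source_type: str
    summary: str
    observed_at: str  # ISO 8601 timestamp in UTC, e.g. 2026-05-13T00:00:00Z
    raw_evidence_location: str

    def __post_init__(self) -> None:
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source_type: {self.source_type!r}")


def make_signal(source_type, summary, evidence_ref, observed_at=None):
    """Create a signal with a fresh UUID; observed_at must be timezone-aware."""
    ts = (observed_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return Signal(
        signal_id=str(uuid4()),
        source_type=source_type,
        summary=summary,
        observed_at=ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        raw_evidence_location=evidence_ref,
    )
```

Calling `asdict(signal)` yields a dictionary with the same five keys as the output object, ready for JSON serialization.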
5.2 Proofline
| Worker | Job |
|---|---|
| Correlator | Checks whether the same signal appears across multiple independent sources. |
| IOCChecker | Validates domains, IPs, hashes, URLs, wallet addresses, emails, and CVEs. |
| FreshnessChecker | Determines whether the signal is current, stale, repeated, or resurfaced. |
| ClaimChecker | Labels language as confirmed, claimed, observed, rumored, or speculative. |
| ConfidenceScorer | Produces final confidence and optional Admiralty Code values. |
Output:
{
  "confidence": "low | medium | high",
  "source_reliability": "A | B | C | D | E | F | unknown",
  "information_credibility": "1 | 2 | 3 | 4 | 5 | 6 | unknown",
  "claim_status": "confirmed | claimed | observed | rumored | speculative",
  "freshness": "new | recent | stale | resurfaced"
}
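The Proofline output can be checked against its allowed values before it moves downstream. The sketch below is one way to do that, assuming a hypothetical `ProofResult` class; the combined code such as `B2` follows the usual Admiralty notation of reliability letter plus credibility digit.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values copied from the Proofline output object.
ALLOWED = {
    "confidence": ("low", "medium", "high"),
    "source_reliability": ("A", "B", "C", "D", "E", "F", "unknown"),
    "information_credibility": ("1", "2", "3", "4", "5", "6", "unknown"),
    "claim_status": ("confirmed", "claimed", "observed", "rumored", "speculative"),
    "freshness": ("new", "recent", "stale", "resurfaced"),
}


@dataclass(frozen=True)
class ProofResult:
    """Hypothetical container for the Proofline output."""
    confidence: str
    source_reliability: str
    information_credibility: str
    claim_status: str
    freshness: str

    def __post_init__(self) -> None:
        # Reject any field whose value is outside the documented set.
        for field_name, allowed in ALLOWED.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r} is not one of {allowed}")

    def admiralty_code(self) -> Optional[str]:
        """Return e.g. 'B2', or None when either component is unknown."""
        if "unknown" in (self.source_reliability, self.information_credibility):
            return None
        return self.source_reliability + self.information_credibility
```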
5.3 Mapline
| Worker | Job |
|---|---|
| EntityResolver | Maps organization names, domains, subsidiaries, brands, and aliases. |
| GeoResolver | Maps victim country, jurisdiction, national CERT, and cross-border implications. |
| SectorMapper | Maps victim sector and critical-infrastructure status. |
| ActorMapper | Maps actor names, aliases, ransomware brands, campaigns, and confidence. |
| CVEResolver | Maps vulnerabilities to CVEs, affected products, KEV status, and exploit relevance. |
Output:
{
  "victim": {
    "name": "",
    "domain": "",
    "country": "",
    "sector": "",
    "critical_infrastructure": false
  },
  "actor": {
    "name": "",
    "aliases": [],
    "campaign": "",
    "confidence": "low | medium | high"
  },
  "jurisdiction": {
    "primary_cert": "",
    "law_enforcement_route": "",
    "sector_isac": ""
  }
}
5.4 Classifyline
| Worker | Job |
|---|---|
| Classifier | Assigns incident type, severity, internal class, and response SLA. |
| TLPGuard | Ensures data is never routed to a destination that is not cleared to receive its TLP level. |
| DestinationPolicyGuard | Blocks inappropriate, illegal, excessive, or sensitive submissions. |
Internal class mapping:
| Internal class | Meaning | External severity |
|---|---|---|
| A | Imminent harm or attack likely underway | Critical |
| B | Credible planned attack | High |
| C | Confirmed exposure | High / Medium |
| D | Campaign intelligence | Medium / High |
| E | Weak signal or watchlist item | Low / Monitor |
Output:
{
  "class": "A | B | C | D | E",
  "severity": "low | medium | high | critical",
  "tlp": "RED | AMBER | GREEN | CLEAR",
  "incident_type": "ransomware | credential_leak | access_sale | phishing | malware | exploit | botnet | data_leak",
  "policy_blocks": []
}
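The internal class mapping table can be encoded as a lookup so that a Classifier result is checked for consistency. This is a sketch under two assumptions that the table does not state: the first severity listed for a class is treated as its default, and "Low / Monitor" is represented as `low` because the output object has no `monitor` value. The function names are hypothetical.

```python
# Severities permitted for each internal class, in the order the table lists them.
ALLOWED_SEVERITY = {
    "A": ("critical",),
    "B": ("high",),
    "C": ("high", "medium"),
    "D": ("medium", "high"),
    "E": ("low",),  # assumption: "Monitor" is expressed as low severity
}


def default_severity(internal_class: str) -> str:
    """Return the first severity listed for the class (assumed default)."""
    return ALLOWED_SEVERITY[internal_class][0]


def severity_is_consistent(internal_class: str, severity: str) -> bool:
    """True when the severity is one the table permits for this class."""
    return severity in ALLOWED_SEVERITY.get(internal_class, ())
```

For example, `severity_is_consistent("B", "low")` is false, which a Classifier wrapper could record in `policy_blocks` or send to human review.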
5.5 Sealine
Sealine replaces “sanitization” as the primary evidence-handling concept. The objective is to protect useful evidence, not to destroy it.
| Worker | Job |
|---|---|
| EvidencePackager | Collects sensitive evidence, hashes it, and packages it with metadata. |
| Sealer | Encrypts evidence for authorized recipients using public-key or hybrid encryption. |
| KeyBurner | Destroys local unwrapped evidence keys after successful sealing. |
| RetentionGuard | Enforces retention, deletion, plaintext destruction, and crypto-erasure policy. |
Sealine principle:
Preserve the truth. Seal the sensitive evidence. Route only what each recipient is authorized to receive.
Output:
{
  "sealed_evidence": {
    "package_id": "uuid",
    "encryption": "age | PGP | CMS | hybrid",
    "recipient_keys": [
      {
        "recipient": "CERT-Bund",
        "key_id": "authority-key-id",
        "wrapped_key": "encrypted-evidence-key"
      }
    ],
    "payload_hash": "sha256",
    "plaintext_destroyed": true,
    "local_unwrapped_key_destroyed": true
  }
}
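The EvidencePackager step (hash the evidence, package it with metadata) can be sketched with the standard library alone. Encryption is deliberately left out, since it depends on which of the listed schemes is chosen. The function name `package_evidence` and the manifest layout are assumptions, as is computing `payload_hash` over the canonical manifest rather than over the encrypted payload.

```python
import hashlib
import json
from uuid import uuid4


def package_evidence(files: dict, metadata: dict) -> dict:
    """Hash each evidence item, then hash a canonical manifest of the package.

    files maps an item name to its raw bytes.
    """
    items = [
        {"name": name, "sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}
        for name, data in sorted(files.items())  # sorted for a stable manifest
    ]
    manifest = {"items": items, "metadata": metadata}
    # Canonical JSON: sorted keys and fixed separators give a reproducible hash.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return {
        "package_id": str(uuid4()),
        "manifest": manifest,
        "payload_hash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }
```

Because the manifest is canonicalized, the same evidence and metadata always produce the same `payload_hash`, which is what lets the Ledger later prove which package was sent.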
5.6 Routeline
| Worker | Job |
|---|---|
| RoutePlanner | Chooses destination order based on victim, country, sector, severity, TLP, and evidence type. |
| PayloadBuilder | Builds destination-specific payloads: sealed package, STIX bundle, MISP event, abuse report, or public-safe extract. |
| Redactor | Minimizes data in public and semi-public outputs only. It does not replace Sealer. |
| Courier | Submits through API, portal, structured email, or secure upload. |
| RateLimiter | Enforces destination quotas, retries, and backoff. |
| ReceiptCollector | Captures case IDs, acknowledgements, API responses, and status URLs. |
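The retry and backoff behavior attributed to RateLimiter could look like the sketch below. It assumes capped exponential backoff without jitter and treats `ConnectionError` as the only retryable failure; both choices, and the names `backoff_delays` and `submit_with_retry`, are illustrative rather than taken from the Blue48 design.

```python
import time


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> list:
    """Delays in seconds before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def submit_with_retry(send, max_retries: int = 3, base: float = 1.0,
                      cap: float = 60.0, sleep=time.sleep):
    """Call send() once, then retry up to max_retries times with backoff.

    send is any zero-argument callable that performs the submission.
    sleep is injectable so the function can be tested without waiting.
    """
    last_error = None
    for delay in [0.0] + backoff_delays(max_retries, base, cap):
        if delay:
            sleep(delay)
        try:
            return send()
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try again
    raise last_error
```

Per-destination quotas would sit in front of this function; they are omitted here because the document does not specify how quotas are expressed.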
Example route object:
{
  "routes": [
    {
      "destination": "CERT-Bund",
      "type": "authority",
      "payload": "sealed_evidence_package",
      "priority": 1,
      "max_tlp_allowed": "RED"
    },
    {
      "destination": "MISP trusted community",
      "type": "cti_sharing",
      "payload": "stix_indicators",
      "priority": 2,
      "max_tlp_allowed": "AMBER"
    },
    {
      "destination": "Cloudflare Abuse API",
      "type": "provider_abuse",
      "payload": "minimized_abuse_report",
      "priority": 3,
      "max_tlp_allowed": "CLEAR"
    }
  ]
}
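The `max_tlp_allowed` field makes the TLPGuard check mechanical: a case may go to a destination only if its TLP level is no more restrictive than the destination's ceiling. A minimal sketch, assuming the ordering CLEAR < GREEN < AMBER < RED and hypothetical function names:

```python
# Restrictiveness ranking for the four TLP values used in this document.
TLP_ORDER = {"CLEAR": 0, "GREEN": 1, "AMBER": 2, "RED": 3}


def tlp_allows(case_tlp: str, max_tlp_allowed: str) -> bool:
    """True when the destination may receive data at the case's TLP level."""
    return TLP_ORDER[case_tlp] <= TLP_ORDER[max_tlp_allowed]


def permitted_routes(case_tlp: str, routes: list) -> list:
    """Drop routes the case's TLP forbids, then order the rest by priority."""
    allowed = [r for r in routes if tlp_allows(case_tlp, r["max_tlp_allowed"])]
    return sorted(allowed, key=lambda r: r["priority"])


routes = [
    {"destination": "CERT-Bund", "priority": 1, "max_tlp_allowed": "RED"},
    {"destination": "MISP trusted community", "priority": 2, "max_tlp_allowed": "AMBER"},
    {"destination": "Cloudflare Abuse API", "priority": 3, "max_tlp_allowed": "CLEAR"},
]
amber_destinations = [r["destination"] for r in permitted_routes("AMBER", routes)]
```

With the example routes, a TLP:AMBER case keeps CERT-Bund and the MISP community and drops the Cloudflare Abuse API, whose ceiling is CLEAR.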
5.7 Ledgerline
| Worker | Job |
|---|---|
| Ledger | Creates immutable audit records for all external submissions and destructive actions. |
| Watcher | Polls outcomes: takedown status, MISP sightings, CERT acknowledgement, provider response. |
| Archivist | Handles retention, sealed package lifecycle, legal holds, and crypto-erasure confirmation. |
Ledger record:
{
  "timestamp": "2026-05-13T00:00:00Z",
  "case_id": "B48-2026-000001",
  "destination": "CERT-Bund",
  "payload_hash": "sha256",
  "submitter_identity": "blue48-official-handle",
  "tlp": "AMBER",
  "response_id": "external-case-id",
  "outcome": "submitted | acknowledged | rejected | actioned"
}
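One common way to make ledger records tamper-evident is to chain them by hash, so that altering any earlier record invalidates every later entry. The document does not say how immutability is achieved; the hash chain below is a swapped-in technique shown as an assumption, with hypothetical names throughout.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _digest(record: dict, prev_hash: str) -> str:
    """SHA-256 over the previous hash plus the record's canonical JSON."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain: list, record: dict) -> dict:
    """Append a ledger record, linking it to the entry before it."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    entry = {
        "record": dict(record),
        "prev_hash": prev_hash,
        "entry_hash": _digest(record, prev_hash),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; False if any record or link was altered."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _digest(entry["record"], prev_hash):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

A chain detects modification but does not prevent it, so it complements rather than replaces storage-level write restrictions.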
5.8 Publishline
| Worker | Job |
|---|---|
| Publisher | Produces public-safe intelligence reports after mitigation and approval. |
Publisher may include:
- sector trend
- actor trend
- CVEs
- TTPs
- defensive recommendations
- sanitized IOCs
- non-sensitive timelines
Publisher must not include:
- raw credentials
- stolen data
- victim secrets
- live access details
- exact criminal-source links
- unmitigated exploit paths
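The two lists above translate naturally into a pre-publication check. The sketch below assumes a report is a dictionary keyed by section name; the snake_case section keys and the function name `publish_violations` are hypothetical, and a real guard would also need to inspect section contents, not only section names.

```python
# Section keys derived from the "may include" list (names are assumptions).
ALLOWED_SECTIONS = frozenset({
    "sector_trend", "actor_trend", "cves", "ttps",
    "defensive_recommendations", "sanitized_iocs", "non_sensitive_timelines",
})
# Section keys derived from the "must not include" list (names are assumptions).
FORBIDDEN_SECTIONS = frozenset({
    "raw_credentials", "stolen_data", "victim_secrets",
    "live_access_details", "criminal_source_links", "unmitigated_exploit_paths",
})


def publish_violations(report: dict, mitigated: bool, approved: bool) -> list:
    """Return the reasons a report may not be published; empty means it may."""
    problems = []
    if not mitigated:
        problems.append("mitigation not confirmed")
    if not approved:
        problems.append("human approval missing")
    for section in sorted(report):
        if section in FORBIDDEN_SECTIONS:
            problems.append(f"forbidden section: {section}")
        elif section not in ALLOWED_SECTIONS:
            problems.append(f"unrecognized section: {section}")
    return problems
```

Unrecognized sections are reported rather than silently allowed, so new content types have to be added to the allow list deliberately.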
6. Which Workers Need Models?
| Worker | Model need |
|---|---|
| SourcePlanner | None / rules |
| Crawler / Fetcher | None |
| Parser | Rules / small model |
| Deduper | Embeddings / rules |
| Signalizer | Small or medium model |
| ClaimChecker | Small or medium model |
| ConfidenceScorer | Medium model |
| EntityResolver | Rules + embeddings |
| ActorMapper | Small or medium model |
| Classifier | Small or medium model |
| RoutePlanner | Rules first, model second |
| PayloadBuilder | Small model |
| Publisher | Medium or large model |
| ExampleBuilder | Medium model |
| QualityGate | Medium model + rules |
Heavy models should be reserved for:
- ConfidenceScorer
- Classifier
- Publisher
- ExampleBuilder
- QualityGate
7. Human Review Boundaries
Human approval is required before:
- sending sealed evidence to any external destination
- contacting law enforcement or CERTs with sensitive evidence
- publishing a public advisory
- destroying plaintext evidence
- destroying local unwrapped evidence keys
- exporting a training dataset
- modifying routing policy
- modifying recipient keys
Two-person control should be required for:
- sending TLP:RED or highly sensitive packages
- deleting evidence
- changing authority recipient keys
- publishing named-victim reports
- exporting training data based on internal cases
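These boundaries can be enforced in code by counting distinct approvers per action. The sketch below is illustrative: the action keys are hypothetical identifiers for the bullets above, and how approver identities are authenticated is out of scope.

```python
# Actions needing at least one human approval (keys are hypothetical).
SINGLE_APPROVAL = frozenset({
    "send_sealed_evidence", "contact_authority_with_sensitive_evidence",
    "publish_public_advisory", "destroy_plaintext_evidence",
    "destroy_local_unwrapped_keys", "export_training_dataset",
    "modify_routing_policy", "modify_recipient_keys",
})
# Actions needing two different people (keys are hypothetical).
TWO_PERSON = frozenset({
    "send_tlp_red_package", "delete_evidence",
    "change_authority_recipient_keys", "publish_named_victim_report",
    "export_training_data_from_internal_cases",
})


def required_approvals(action: str) -> int:
    """Number of distinct approvers the action needs; 0 if it is not listed."""
    if action in TWO_PERSON:
        return 2
    if action in SINGLE_APPROVAL:
        return 1
    return 0


def is_authorized(action: str, approvers: list) -> bool:
    """True when enough distinct people have approved the action."""
    return len(set(approvers)) >= required_approvals(action)
```

Using a set means the same person approving twice does not satisfy two-person control.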
8. MVP Worker Build Order
Initial worker implementation priority:
- SourcePlanner
- Fetcher
- Parser
- Deduper
- Signalizer
- IOCChecker
- EntityResolver
- GeoResolver
- Classifier
- EvidencePackager
- Sealer
- RoutePlanner
- Courier
- Ledger
- ReceiptCollector
- IntelMiner
Minimum operational chain:
Fetcher → Parser → Signalizer → IOCChecker → EntityResolver → Classifier → Sealer → RoutePlanner → Courier → Ledger
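The chain above can be modeled as a list of functions that each take and return the normalized case object. This is a sketch with stub workers, not the real implementations; the `trace` field and the `run_chain` helper are assumptions added to show how each hop stays traceable.

```python
from typing import Callable

Case = dict
Worker = Callable[[Case], Case]


def run_chain(case: Case, workers: list) -> Case:
    """Pass a case through (name, worker) pairs in order, recording each hop."""
    for name, worker in workers:
        case = worker(dict(case))  # shallow copy: the caller's dict is untouched
        case["trace"] = case.get("trace", []) + [name]
    return case


# Stub workers: each adds one field and returns the case.
def parser(case: Case) -> Case:
    case["title"] = case["raw"].strip()
    return case


def classifier(case: Case) -> Case:
    case["class"] = "E"  # placeholder: treat everything as a watchlist item
    return case


original = {"raw": "  Example advisory  "}
result = run_chain(original, [("Parser", parser), ("Classifier", classifier)])
```

Here `result["trace"]` ends up as `["Parser", "Classifier"]`, while `original` is left unchanged.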
9. Technical Notes
Recommended implementation style:
| Component | Recommendation |
|---|---|
| Worker runtime | Python services, Celery, Temporal, Prefect, or lightweight queue workers |
| Message format | JSON normalized case object |
| Interop format | STIX 2.1 where useful |
| Storage | PostgreSQL + object storage |
| Search | OpenSearch or Meilisearch |
| CTI graph | OpenCTI or MISP integration |
| Audit | Append-only ledger table |
| Secrets | Loaded only from .env, a secret manager, or runtime injection |
| UI | Blue48 Operations Cockpit |
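An append-only ledger table can be enforced at the storage layer with triggers that reject updates and deletes. The sketch below uses SQLite from the Python standard library purely as a stand-in for the recommended PostgreSQL; the table layout and function names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ledger (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    ts           TEXT NOT NULL,
    case_id      TEXT NOT NULL,
    destination  TEXT NOT NULL,
    payload_hash TEXT NOT NULL,
    outcome      TEXT NOT NULL
);
CREATE TRIGGER ledger_no_update BEFORE UPDATE ON ledger
BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
CREATE TRIGGER ledger_no_delete BEFORE DELETE ON ledger
BEGIN SELECT RAISE(ABORT, 'ledger is append-only'); END;
"""


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the ledger table with its guard triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_event(conn, ts, case_id, destination, payload_hash, outcome) -> int:
    """Insert one ledger row and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO ledger (ts, case_id, destination, payload_hash, outcome) "
            "VALUES (?, ?, ?, ?, ?)",
            (ts, case_id, destination, payload_hash, outcome),
        )
    return cur.lastrowid
```

Status changes such as "acknowledged" are then written as new rows rather than edits, which keeps the full history of each submission.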
10. Summary
Blue48 should operate as a worker mesh, not a monolithic AI agent.
The system should use small deterministic workers where possible, small models where useful, and larger models only for judgment-heavy steps. Sensitive evidence is handled by Sealine, not casually rendered or distributed. Routing and public reporting are controlled by policy guards, human review, and immutable audit logging.