# Blue48 Worker Mesh Architecture

**Document type:** Project record / technical architecture

**Scope:** Worker names, responsibilities, interfaces, data flow, human review boundaries

**Status:** Draft v1

---

## 1. Purpose

Blue48 should not rely on one large, expensive, opaque model to perform all cyber-intelligence operations. The platform should be built as a mesh of small, specialized workers.

Each worker performs one narrow function, writes structured output, and passes a normalized case object to the next stage. Heavy models are reserved for judgment-heavy tasks such as confidence scoring, routing explanations, public report drafting, and training-example generation.

Core principle:

> Small workers produce traceable outputs. Humans approve sensitive decisions. The Ledger proves what happened.

---

## 2. High-Level Flow

```text
Scoutline
→ Proofline
→ Mapline
→ Classifyline
→ Sealine
→ Routeline
→ Ledgerline
→ Publishline
→ Trainline
```

Operator version:

```text
Detect → Validate → Map → Classify → Seal Evidence → Route → Submit → Track → Archive → Learn
```

---

## 3. Worker Lines

| Line | Purpose |
|---|---|
| **Scoutline** | Finds, fetches, parses, and deduplicates lawful intelligence sources. |
| **Proofline** | Validates claims, checks indicators, measures freshness, and scores confidence. |
| **Mapline** | Resolves victims, actors, sectors, jurisdictions, CERT routes, and affected products. |
| **Classifyline** | Assigns severity, TLP, incident type, and operational class. |
| **Sealine** | Packages evidence, encrypts it for authorized recipients, and destroys local plaintext/key material when policy allows. |
| **Routeline** | Selects destinations, builds payloads, enforces destination policy, and submits reports. |
| **Ledgerline** | Records immutable audit events, receipts, outcomes, and follow-up status. |
| **Publishline** | Produces sanitized public intelligence only after mitigation and approval. |
| **Trainline** | Converts lawful, reviewed intelligence into LoRA-ready training data. |

---

## 4. Core Worker Set

The first conceptual worker set is:

```text
Scout → Verifier → Mapper → Classifier → Sealer → Router → Courier → Ledger
```

Support workers:

```text
Watcher → Archivist → Publisher
```

Operational sentence:

```text
Scout detects.
Verifier confirms.
Mapper identifies.
Classifier prioritizes.
Sealer protects.
Router decides.
Courier submits.
Ledger proves.
Watcher follows up.
Archivist forgets safely.
Publisher informs.
```

---

## 5. Granular Worker Breakdown

### 5.1 Scoutline

| Worker | Job | Model requirement |
|---|---|---|
| **SourcePlanner** | Maintains the approved source list, collection schedules, and source eligibility. | None / rules |
| **Crawler** | Discovers new pages, feeds, advisories, reports, APIs, and datasets. | None |
| **Fetcher** | Downloads pages, PDFs, JSON, RSS, STIX/TAXII, MISP events, and API responses. | None |
| **Parser** | Extracts title, date, author, body, tables, indicators, and metadata. | Rules / small model |
| **Deduper** | Detects duplicate reports, reposted IOCs, syndicated articles, and repeated claims. | Embeddings / rules |
| **SourceRanker** | Scores the source based on trust, history, origin, and license status. | Rules / small model |
| **Signalizer** | Converts parsed content into candidate intelligence signals. | Small/medium model |

Output:

```json
{
  "signal_id": "uuid",
  "source_type": "advisory | cti_report | abuse_feed | ransomware_monitor | public_blog | misp_event",
  "summary": "short defensive summary",
  "observed_at": "2026-05-13T00:00:00Z",
  "raw_evidence_location": "internal-only-reference"
}
```

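Deduper is listed as rules- or embedding-based. The following is a minimal rules-only sketch in Python. It assumes a parsed report has already been reduced to a body string and a list of indicator strings. The `fingerprint` and `Deduper` names are illustrative, not an existing Blue48 API. The sketch normalizes the text, folds in the sorted IOC set, and compares SHA-256 fingerprints. It only catches reports that are identical after normalization. Syndicated rewrites and paraphrased claims still need the embedding path.

```python
import hashlib
import re
from dataclasses import dataclass, field


def fingerprint(body: str, iocs: list[str]) -> str:
    """Stable SHA-256 fingerprint of a parsed report.

    Hypothetical normalization: lowercase, collapse whitespace, then append the
    sorted, de-duplicated IOC set so reposted or reordered IOC lists collide.
    """
    text = re.sub(r"\s+", " ", body.strip().lower())
    ioc_part = "\n".join(sorted({i.strip().lower() for i in iocs}))
    return hashlib.sha256(f"{text}\n--\n{ioc_part}".encode("utf-8")).hexdigest()


@dataclass
class Deduper:
    """Exact-match deduper; in practice `seen` would be persisted, not kept in memory."""

    seen: set[str] = field(default_factory=set)

    def is_duplicate(self, body: str, iocs: list[str]) -> bool:
        fp = fingerprint(body, iocs)
        if fp in self.seen:
            return True
        self.seen.add(fp)
        return False
```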
---

### 5.2 Proofline

| Worker | Job |
|---|---|
| **Correlator** | Checks whether the same signal appears across multiple independent sources. |
| **IOCChecker** | Validates domains, IPs, hashes, URLs, wallet addresses, emails, and CVEs. |
| **FreshnessChecker** | Determines whether the signal is current, stale, repeated, or resurfaced. |
| **ClaimChecker** | Labels language as confirmed, claimed, observed, rumored, or speculative. |
| **ConfidenceScorer** | Produces final confidence and optional Admiralty Code values. |

Output:

```json
{
  "confidence": "low | medium | high",
  "source_reliability": "A | B | C | D | E | F | unknown",
  "information_credibility": "1 | 2 | 3 | 4 | 5 | 6 | unknown",
  "claim_status": "confirmed | claimed | observed | rumored | speculative",
  "freshness": "new | recent | stale | resurfaced"
}
```

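Much of IOCChecker's syntactic validation needs no model. The sketch below defines a hypothetical `classify_ioc` helper that uses only the Python standard library (`ipaddress`, `re`, `urllib.parse`). It either assigns an indicator a type or rejects it. It checks form only, not whether an indicator is live or malicious. Wallet addresses are not covered, and the regular expressions are assumptions that would need tuning against real feeds.

```python
import ipaddress
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical patterns; tune against real feeds before relying on them.
_HASH_TYPES = {32: "md5", 40: "sha1", 64: "sha256"}  # hex digest length -> algorithm
_HEX = re.compile(r"^[0-9a-f]+$")
_CVE = re.compile(r"^CVE-\d{4}-\d{4,}$")
_DOMAIN = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)
_EMAIL = re.compile(r"^[^@\s]+@([^@\s]+)$")


def classify_ioc(value: str) -> Optional[str]:
    """Return the indicator type, or None if the value fails syntactic checks."""
    v = value.strip()
    low = v.lower()
    try:
        # ip_address() raises ValueError for anything that is not an IP address.
        return "ipv4" if ipaddress.ip_address(v).version == 4 else "ipv6"
    except ValueError:
        pass
    if _CVE.match(v.upper()):
        return "cve"
    if len(low) in _HASH_TYPES and _HEX.match(low):
        return _HASH_TYPES[len(low)]
    if "://" in v:
        parts = urlsplit(v)
        return "url" if parts.scheme in ("http", "https") and parts.hostname else None
    email = _EMAIL.match(low)
    if email:
        return "email" if _DOMAIN.match(email.group(1)) else None
    return "domain" if _DOMAIN.match(low) else None
```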
---

### 5.3 Mapline

| Worker | Job |
|---|---|
| **EntityResolver** | Maps organization names, domains, subsidiaries, brands, and aliases. |
| **GeoResolver** | Maps victim country, jurisdiction, national CERT, and cross-border implications. |
| **SectorMapper** | Maps victim sector and critical-infrastructure status. |
| **ActorMapper** | Maps actor names, aliases, ransomware brands, campaigns, and confidence. |
| **CVEResolver** | Maps vulnerabilities to CVEs, affected products, KEV status, and exploit relevance. |

Output:

```json
{
  "victim": {
    "name": "",
    "domain": "",
    "country": "",
    "sector": "",
    "critical_infrastructure": false
  },
  "actor": {
    "name": "",
    "aliases": [],
    "campaign": "",
    "confidence": "low | medium | high"
  },
  "jurisdiction": {
    "primary_cert": "",
    "law_enforcement_route": "",
    "sector_isac": ""
  }
}
```

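EntityResolver and GeoResolver can start as reviewed lookup tables before any embeddings are involved. This sketch assumes a hand-maintained alias table and a country-code-to-CERT table. Both tables, the function names, and the extra `needs_review` flag are hypothetical. The four CERT entries are illustrative, not a complete or authoritative routing list. Unknown countries are left empty for a human to route rather than guessed.

```python
import re

# Hypothetical, hand-maintained lookup tables for illustration only; a real
# deployment would keep these as reviewed configuration with provenance.
ALIASES = {
    "acme corp": "ACME Corporation",
    "acme gmbh": "ACME Corporation",
}
NATIONAL_CERTS = {
    "DE": "CERT-Bund",
    "FR": "CERT-FR",
    "AT": "CERT.at",
    "NL": "NCSC-NL",
}

_PUNCT = re.compile(r"[.,]")


def canonical_name(raw: str) -> str:
    """EntityResolver step: map a raw organization name to its canonical form."""
    key = " ".join(_PUNCT.sub("", raw).lower().split())
    return ALIASES.get(key, raw.strip())


def map_victim(name: str, country: str) -> dict:
    """GeoResolver step: fill the victim and jurisdiction blocks from a country code.

    Unknown countries are not guessed: primary_cert stays empty and the
    hypothetical needs_review flag hands the routing decision to a human.
    """
    cc = country.strip().upper()
    cert = NATIONAL_CERTS.get(cc, "")
    return {
        "victim": {"name": canonical_name(name), "country": cc},
        "jurisdiction": {
            "primary_cert": cert,
            "law_enforcement_route": "",
            "sector_isac": "",
        },
        "needs_review": not cert,
    }
```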
---

### 5.4 Classifyline

| Worker | Job |
|---|---|
| **Classifier** | Assigns incident type, severity, internal class, and response SLA. |
| **TLPGuard** | Ensures data is never routed to a destination whose maximum allowed TLP is lower than the data's TLP label. |
| **DestinationPolicyGuard** | Blocks inappropriate, illegal, excessive, or sensitive submissions. |

Internal class mapping:

| Internal class | Meaning | External severity |
|---|---|---|
| **A** | Imminent harm or attack likely underway | Critical |
| **B** | Credible planned attack | High |
| **C** | Confirmed exposure | High / Medium |
| **D** | Campaign intelligence | Medium / High |
| **E** | Weak signal or watchlist item | Low / Monitor |

Output:

```json
{
  "class": "A | B | C | D | E",
  "severity": "low | medium | high | critical",
  "tlp": "RED | AMBER | GREEN | CLEAR",
  "incident_type": "ransomware | credential_leak | access_sale | phishing | malware | exploit | botnet | data_leak",
  "policy_blocks": []
}
```

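TLPGuard reduces to an ordering check. A destination may receive a case only if the case's label is no more restrictive than the destination's `max_tlp_allowed`, the field used in the Routeline route objects. The minimal Python sketch below assumes the four labels in the schema above. FIRST's TLP 2.0 also defines TLP:AMBER+STRICT, which would sit between AMBER and RED if the schema adopts it. The second check in `policy_blocks` is a hypothetical example of a DestinationPolicyGuard rule, not a stated Blue48 policy.

```python
from enum import IntEnum


class TLP(IntEnum):
    """TLP labels from the schema, ordered from least to most restrictive."""
    CLEAR = 0
    GREEN = 1
    AMBER = 2
    RED = 3


def tlp_allows(case_tlp: str, max_tlp_allowed: str) -> bool:
    """TLPGuard: True if the destination may receive data labeled case_tlp.

    Unknown labels raise KeyError instead of defaulting to a permissive value.
    """
    return TLP[case_tlp.upper()] <= TLP[max_tlp_allowed.upper()]


def policy_blocks(case: dict, route: dict) -> list[str]:
    """Collect reasons a route is blocked; an empty list means it may proceed."""
    blocks = []
    if not tlp_allows(case["tlp"], route["max_tlp_allowed"]):
        blocks.append(
            f"TLP:{case['tlp']} exceeds {route['max_tlp_allowed']} for {route['destination']}"
        )
    # Hypothetical DestinationPolicyGuard rule: sealed evidence goes to authorities only.
    if route["payload"] == "sealed_evidence_package" and route["type"] != "authority":
        blocks.append(f"sealed evidence not allowed for {route['type']} destination")
    return blocks
```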
---

### 5.5 Sealine

Sealine replaces “sanitization” as the primary concept for handling sensitive evidence. The objective is not to destroy useful evidence but to protect it.

| Worker | Job |
|---|---|
| **EvidencePackager** | Collects sensitive evidence, hashes it, and packages it with metadata. |
| **Sealer** | Encrypts evidence for authorized recipients using public-key or hybrid encryption. |
| **KeyBurner** | Destroys local unwrapped evidence keys after successful sealing. |
| **RetentionGuard** | Enforces retention, deletion, plaintext destruction, and crypto-erasure policy. |

Sealine principle:

> Preserve the truth. Seal the sensitive evidence. Route only what each recipient is authorized to receive.

Output:

```json
{
  "sealed_evidence": {
    "package_id": "uuid",
    "encryption": "age | PGP | CMS | hybrid",
    "recipient_keys": [
      {
        "recipient": "CERT-Bund",
        "key_id": "authority-key-id",
        "wrapped_key": "encrypted-evidence-key"
      }
    ],
    "payload_hash": "sha256",
    "plaintext_destroyed": true,
    "local_unwrapped_key_destroyed": true
  }
}
```

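EvidencePackager's job (hash the evidence and package it with metadata) can be sketched with the Python standard library alone. The sketch assumes evidence arrives as named byte blobs; `package_evidence` and the manifest layout are hypothetical. It produces a deterministic payload for the Sealer and a manifest whose `payload_hash` lets recipients verify integrity after decryption. The encryption step itself is deliberately left out. Python's standard library has no authenticated encryption, and sealing should use a vetted tool such as age, PGP, or CMS, as the schema lists. The destruction flags start as `False` and would only be flipped after RetentionGuard and human approval confirm the action.

```python
import hashlib
import json
import uuid


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def package_evidence(items: dict[str, bytes], case_id: str) -> tuple[bytes, dict]:
    """Build a deterministic evidence payload plus a metadata manifest.

    The payload is what a Sealer would encrypt. The manifest stays with the
    case record so recipients can check item and payload hashes afterwards.
    """
    ordered = sorted(items.items())  # stable order => identical payload bytes
    manifest_items = [
        {"name": name, "size": len(blob), "sha256": sha256_hex(blob)}
        for name, blob in ordered
    ]
    payload = json.dumps(
        {"case_id": case_id, "items": {name: blob.hex() for name, blob in ordered}},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    manifest = {
        "package_id": str(uuid.uuid4()),
        "case_id": case_id,
        "items": manifest_items,
        "payload_hash": sha256_hex(payload),
        "plaintext_destroyed": False,  # flipped only after RetentionGuard confirms
        "local_unwrapped_key_destroyed": False,  # flipped only after KeyBurner runs
    }
    return payload, manifest
```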
---

### 5.6 Routeline

| Worker | Job |
|---|---|
| **RoutePlanner** | Chooses destination order based on victim, country, sector, severity, TLP, and evidence type. |
| **PayloadBuilder** | Builds destination-specific payloads: sealed package, STIX bundle, MISP event, abuse report, or public-safe extract. |
| **Redactor** | Minimizes public/semi-public outputs only. Redactor does not replace Sealer. |
| **Courier** | Submits through API, portal, structured email, or secure upload. |
| **RateLimiter** | Enforces destination quotas, retries, and backoff. |
| **ReceiptCollector** | Captures case IDs, acknowledgements, API responses, and status URLs. |

Example route object:

```json
{
  "routes": [
    {
      "destination": "CERT-Bund",
      "type": "authority",
      "payload": "sealed_evidence_package",
      "priority": 1,
      "max_tlp_allowed": "RED"
    },
    {
      "destination": "MISP trusted community",
      "type": "cti_sharing",
      "payload": "stix_indicators",
      "priority": 2,
      "max_tlp_allowed": "AMBER"
    },
    {
      "destination": "Cloudflare Abuse API",
      "type": "provider_abuse",
      "payload": "minimized_abuse_report",
      "priority": 3,
      "max_tlp_allowed": "CLEAR"
    }
  ]
}
```

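RateLimiter's retry and backoff duties fit a small wrapper around whatever call the Courier makes. The sketch assumes a hypothetical `TransientError` that the Courier raises for retryable failures such as HTTP 429 or 503. It uses capped exponential backoff with full jitter, so parallel workers retrying the same destination do not retry in lockstep. `sleep` is injectable so tests and schedulers can avoid real waits. Per-destination quotas would sit in front of this wrapper and are not shown.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as HTTP 429 or 503."""


def submit_with_backoff(
    submit: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call submit(), retrying TransientError with capped exponential backoff.

    After failed attempt n (0-based) it waits a random time in
    [0, min(max_delay, base_delay * 2**n)] ("full jitter"). Other exceptions
    propagate immediately; the final TransientError is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return submit()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```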
---

### 5.7 Ledgerline

| Worker | Job |
|---|---|
| **Ledger** | Creates immutable audit records for all external submissions and destructive actions. |
| **Watcher** | Polls outcomes: takedown status, MISP sightings, CERT acknowledgement, provider response. |
| **Archivist** | Handles retention, sealed package lifecycle, legal holds, and crypto-erasure confirmation. |

Ledger record:

```json
{
  "timestamp": "2026-05-13T00:00:00Z",
  "case_id": "B48-2026-000001",
  "destination": "CERT-Bund",
  "payload_hash": "sha256",
  "submitter_identity": "blue48-official-handle",
  "tlp": "AMBER",
  "response_id": "external-case-id",
  "outcome": "submitted | acknowledged | rejected | actioned"
}
```

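One way to make Ledger records tamper-evident is a hash chain. This is an assumption for illustration, not something the design specifies. Each record stores the hash of the previous record, so editing or reordering any earlier entry invalidates every entry after it. The in-memory sketch below has one limit: deleting the newest records is not detectable unless the latest `record_hash` is also anchored outside the ledger, for example in a receipt sent to a recipient.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # prev_hash of the first record


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) so equal records hash equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Ledger:
    """In-memory, append-only, hash-chained audit log (sketch only)."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    def append(self, event: dict) -> dict:
        prev = self._records[-1]["record_hash"] if self._records else GENESIS
        body = {
            **event,
            "timestamp": event.get("timestamp") or datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev,
        }
        record = {**body, "record_hash": _digest(body)}
        self._records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash; edits, reordering, or removals before the tail fail."""
        prev = GENESIS
        for record in self._records:
            body = {k: v for k, v in record.items() if k != "record_hash"}
            if body["prev_hash"] != prev or _digest(body) != record["record_hash"]:
                return False
            prev = record["record_hash"]
        return True
```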
---

### 5.8 Publishline

| Worker | Job |
|---|---|
| **Publisher** | Produces public-safe intelligence reports after mitigation and approval. |

Publisher may include:

- sector trend
- actor trend
- CVEs
- TTPs
- defensive recommendations
- sanitized IOCs
- non-sensitive timelines

Publisher must not include:

- raw credentials
- stolen data
- victim secrets
- live access details
- exact criminal-source links
- unmitigated exploit paths

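“Sanitized IOCs” are commonly produced by defanging, so that published indicators are not clickable. A pre-publication lint can also flag obvious “must not include” material. The following is a hypothetical sketch of both. `defang` rewrites `http`/`https` to `hxxp`/`hxxps`, brackets the dots in the host, and brackets `@` in email addresses. `find_forbidden` checks two illustrative credential patterns. Both checks are assumptions and far from exhaustive; they support, rather than replace, the approval Publishline requires.

```python
import re

# Hypothetical pattern for scheme://host[rest]; good enough for a sketch.
_URL = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)$")
_SCHEMES = {"http": "hxxp", "https": "hxxps"}

# Hypothetical "must not include" checks; far from exhaustive.
_FORBIDDEN = {
    "credential_pair": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+:\S+"),
    "password_assignment": re.compile(r"(?i)\bpass(?:word|wd)?\s*[:=]\s*\S+"),
}


def defang(ioc: str) -> str:
    """Make an IOC non-clickable: hxxp(s) scheme, [.] in the host, [@] in emails."""
    value = ioc.strip()
    m = _URL.match(value)
    if m:
        scheme, host, rest = m.groups()
        scheme = _SCHEMES.get(scheme.lower(), scheme)
        return f"{scheme}://{host.replace('.', '[.]')}{rest}"
    return value.replace(".", "[.]").replace("@", "[@]")


def find_forbidden(text: str) -> list[str]:
    """Return the names of forbidden-content patterns found in a draft."""
    return [name for name, pattern in _FORBIDDEN.items() if pattern.search(text)]
```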
---

## 6. Which Workers Need Models?

| Worker | Model need |
|---|---|
| SourcePlanner | None / rules |
| Crawler / Fetcher | None |
| Parser | Rules / small model |
| Deduper | Embeddings / rules |
| Signalizer | Small or medium model |
| ClaimChecker | Small or medium model |
| ConfidenceScorer | Medium model |
| EntityResolver | Rules + embeddings |
| ActorMapper | Small or medium model |
| Classifier | Small or medium model |
| RoutePlanner | Rules first, model second |
| PayloadBuilder | Small model |
| Publisher | Medium or large model |
| ExampleBuilder | Medium model |
| QualityGate | Medium model + rules |

Heavy models should be reserved for:

```text
ConfidenceScorer
Classifier
Publisher
ExampleBuilder
QualityGate
```

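RoutePlanner's “Rules first, model second” entry suggests a general dispatch pattern. Deterministic rules decide when they can, and a model is consulted only when every rule abstains. The sketch below is minimal; the `decide` helper, the rule, and the model callable are hypothetical. The model callable stands in for whatever small or medium model a deployment uses. Returning which rule or model made the decision keeps the choice traceable for the Ledger.

```python
from typing import Callable, Optional

# A rule returns a decision, or None to abstain.
Rule = Callable[[dict], Optional[str]]


def decide(case: dict, rules: list[Rule], model: Callable[[dict], str]) -> tuple[str, str]:
    """Return (decision, decided_by).

    Deterministic rules run first, in order; the first non-None answer wins and
    is attributed to that rule. The model is called only if every rule abstains.
    """
    for rule in rules:
        decision = rule(case)
        if decision is not None:
            return decision, getattr(rule, "__name__", "rule")
    return model(case), "model"


# Hypothetical rule: German victims route to CERT-Bund first.
def german_victim_to_cert_bund(case: dict) -> Optional[str]:
    return "CERT-Bund" if case.get("country") == "DE" else None
```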
---

## 7. Human Review Boundaries

Human approval is required before:

- sending sealed evidence to any external destination
- contacting law enforcement or CERTs with sensitive evidence
- publishing a public advisory
- destroying plaintext evidence
- destroying local unwrapped evidence keys
- exporting a training dataset
- modifying routing policy
- modifying recipient keys

Two-person control should be required for:

- sending TLP:RED or highly sensitive packages
- deleting evidence
- changing authority recipient keys
- publishing named-victim reports
- exporting training data based on internal cases

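The approval boundaries above can be enforced as a gate that runs before any listed action. The sketch rests on three assumptions. The action identifiers are hypothetical names for the listed actions, and approvals arrive as a set of distinct human IDs. The requester never counts toward its own approval. That last rule is a deliberately strict reading of two-person control: it requires two approvers besides whoever, or whichever worker, requested the action.

```python
# Hypothetical action identifiers mirroring the two lists above.
HUMAN_APPROVAL = {
    "send_sealed_evidence",
    "contact_authority_with_evidence",
    "publish_advisory",
    "destroy_plaintext",
    "destroy_unwrapped_keys",
    "export_training_data",
    "modify_routing_policy",
    "modify_recipient_keys",
}
TWO_PERSON = {
    "send_tlp_red",
    "delete_evidence",
    "change_authority_keys",
    "publish_named_victim_report",
    "export_internal_training_data",
}


class ApprovalError(PermissionError):
    """Raised when an action lacks the approvals its boundary requires."""


def required_approvers(action: str) -> int:
    if action in TWO_PERSON:
        return 2
    if action in HUMAN_APPROVAL:
        return 1
    return 0


def check_approval(action: str, requested_by: str, approvers: set[str]) -> None:
    """Raise ApprovalError unless enough distinct approvers, excluding the requester, signed off."""
    independent = approvers - {requested_by}
    needed = required_approvers(action)
    if len(independent) < needed:
        raise ApprovalError(
            f"{action} needs {needed} independent approver(s), got {len(independent)}"
        )
```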
---

## 8. MVP Worker Build Order

Initial worker implementation priority:

1. SourcePlanner
2. Fetcher
3. Parser
4. Deduper
5. Signalizer
6. IOCChecker
7. EntityResolver
8. GeoResolver
9. Classifier
10. EvidencePackager
11. Sealer
12. RoutePlanner
13. Courier
14. Ledger
15. ReceiptCollector
16. IntelMiner

Minimum operational chain:

```text
Fetcher → Parser → Signalizer → IOCChecker → EntityResolver → Classifier → Sealer → RoutePlanner → Courier → Ledger
```

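The minimum operational chain can be exercised end to end with stub workers before any real worker exists. The sketch assumes each worker is a function that takes a normalized case dict and returns a new case dict; `run_chain` and the stubs are hypothetical. Each stage name is appended to a `trace` list. The chain halts as soon as a worker reports `policy_blocks`, leaving the case for human review instead of submitting it.

```python
from typing import Callable

Worker = Callable[[dict], dict]


def run_chain(case: dict, chain: list[tuple[str, Worker]]) -> dict:
    """Pass a normalized case dict through named workers in order.

    Each worker receives a shallow copy, and the stage name is appended to
    case["trace"] so every output can be traced to the worker that produced it.
    The chain stops as soon as a worker reports non-empty policy_blocks.
    """
    for name, worker in chain:
        case = worker(dict(case))
        case["trace"] = [*case.get("trace", []), name]
        if case.get("policy_blocks"):
            case["halted_at"] = name
            break
    return case


# Hypothetical stub workers standing in for the real Fetcher, Parser, and so on.
def fetcher(case: dict) -> dict:
    return {**case, "raw": "phishing page at evil.example"}


def parser(case: dict) -> dict:
    return {**case, "iocs": ["evil.example"]}


def classifier(case: dict) -> dict:
    return {**case, "tlp": "AMBER", "policy_blocks": []}


def courier(case: dict) -> dict:
    return {**case, "submitted": True}
```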
---

## 9. Technical Notes

Recommended implementation style:

| Component | Recommendation |
|---|---|
| Worker runtime | Python services, Celery, Temporal, Prefect, or lightweight queue workers |
| Message format | JSON normalized case object |
| Interop format | STIX 2.1 where useful |
| Storage | PostgreSQL + object storage |
| Search | OpenSearch or Meilisearch |
| CTI graph | OpenCTI or MISP integration |
| Audit | append-only ledger table |
| Secrets | `.env`, secret manager, runtime injection only |
| UI | Blue48 Operations Cockpit |

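An “append-only ledger table” can be enforced by the database itself rather than by convention. This sketch uses SQLite from Python's standard library so it runs without a server. The table layout is a hypothetical subset of the ledger record in 5.7. Triggers abort any UPDATE or DELETE, so status changes such as `submitted` → `acknowledged` are recorded as new rows. In PostgreSQL the same effect is usually achieved by revoking UPDATE and DELETE from the application role, or with an equivalent trigger.

```python
import sqlite3

# Hypothetical subset of the ledger record; SQLite stands in for PostgreSQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ledger (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp    TEXT NOT NULL,
    case_id      TEXT NOT NULL,
    destination  TEXT NOT NULL,
    payload_hash TEXT NOT NULL,
    tlp          TEXT NOT NULL,
    outcome      TEXT NOT NULL
);
CREATE TRIGGER IF NOT EXISTS ledger_no_update BEFORE UPDATE ON ledger
BEGIN
    SELECT RAISE(ABORT, 'ledger is append-only');
END;
CREATE TRIGGER IF NOT EXISTS ledger_no_delete BEFORE DELETE ON ledger
BEGIN
    SELECT RAISE(ABORT, 'ledger is append-only');
END;
"""


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)  # creates the table and both guard triggers
    return conn


def append_event(conn: sqlite3.Connection, event: dict) -> int:
    """Insert one ledger row and return its id; rows are never updated in place."""
    cur = conn.execute(
        "INSERT INTO ledger (timestamp, case_id, destination, payload_hash, tlp, outcome) "
        "VALUES (:timestamp, :case_id, :destination, :payload_hash, :tlp, :outcome)",
        event,
    )
    conn.commit()
    return cur.lastrowid
```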
---

## 10. Summary

Blue48 should operate as a worker mesh, not a monolithic AI agent.

The system should use small deterministic workers where possible, small models where useful, and larger models only for judgment-heavy steps. Sensitive evidence is handled by Sealine, not casually rendered or distributed. Routing and public reporting are controlled by policy guards, human review, and immutable audit logging.