init: scaffold psyc — defensive CTI routing & evidence-sealing platform

Stage-1 vertical slice: Pydantic Case model, SQLAlchemy Core persistence, URLhaus Scoutline fetcher, FastAPI/Jinja cockpit (cases list + detail), flat Typer CLI, Result[T, E] type module, structlog config. Architecture in docs/dossier.md; 12-fold style guide in docs/style.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 12:43:47 +02:00
commit e04c6c96d8
30 changed files with 8271 additions and 0 deletions
--- a/docs/archive/hivemap.md
+++ b/docs/archive/hivemap.md
@@ -0,0 +1,451 @@
+# Blue48 Worker Mesh Architecture
+
+**Document type:** Project record / technical architecture  
+**Scope:** Worker names, responsibilities, interfaces, data flow, human review boundaries  
+**Status:** Draft v1  
+
+---
+
+## 1. Purpose
+
+Blue48 should not rely on one large, expensive, opaque model to perform all cyber-intelligence operations. The platform should be built as a mesh of small, specialized workers.
+
+Each worker performs one narrow function, writes structured output, and passes a normalized case object to the next stage. Heavy models are reserved for judgment-heavy tasks such as confidence scoring, routing explanations, public report drafting, and training-example generation.
+
+Core principle:
+
+> Small workers produce traceable outputs. Humans approve sensitive decisions. The Ledger proves what happened.
+
+---
+
+## 2. High-Level Flow
+
+```text
+Scoutline
+→ Proofline
+→ Mapline
+→ Classifyline
+→ Sealine
+→ Routeline
+→ Ledgerline
+→ Publishline
+→ Trainline
+```
+
+Operator version:
+
+```text
+Detect → Validate → Map → Classify → Seal Evidence → Route → Submit → Track → Archive → Learn
+```
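+
+The flow above can be sketched as a chain of plain functions. Each function takes the normalized case object and returns it enriched. The stage functions, field names, and the `trace` list here are hypothetical placeholders, not the project's actual worker interfaces.
+
+```python
+from typing import Callable
+
+# A case is modelled here as a plain dict; the real project may use a typed model.
+Case = dict
+Stage = Callable[[Case], Case]
+
+
+def scout(case: Case) -> Case:
+    # Hypothetical: attach a candidate signal summary derived from raw input.
+    return {**case, "signal": {"summary": case.get("raw", "")[:80]}}
+
+
+def proof(case: Case) -> Case:
+    # Hypothetical: placeholder confidence until real Proofline workers exist.
+    return {**case, "confidence": "low"}
+
+
+def classify(case: Case) -> Case:
+    # Hypothetical: default to a watchlist-class case handled as TLP:AMBER.
+    return {**case, "class": "E", "tlp": "AMBER"}
+
+
+def run_pipeline(case: Case, stages: list[Stage]) -> Case:
+    """Pass the case through each stage in order, recording which stages ran."""
+    for stage in stages:
+        case = stage(case)
+        case = {**case, "trace": [*case.get("trace", []), stage.__name__]}
+    return case
+
+
+result = run_pipeline({"raw": "example advisory text"}, [scout, proof, classify])
+print(result["trace"])  # ['scout', 'proof', 'classify']
+```
+
+Returning a new dict from each stage, rather than mutating the input, keeps every intermediate state available for audit. That fits the traceability principle above.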
+
+---
+
+## 3. Worker Lines
+
+| Line | Purpose |
+|---|---|
+| **Scoutline** | Finds, fetches, parses, and deduplicates lawful intelligence sources. |
+| **Proofline** | Validates claims, checks indicators, measures freshness, and scores confidence. |
+| **Mapline** | Resolves victims, actors, sectors, jurisdictions, CERT routes, and affected products. |
+| **Classifyline** | Assigns severity, TLP, incident type, and operational class. |
+| **Sealine** | Packages evidence, encrypts it for authorized recipients, and destroys local plaintext/key material when policy allows. |
+| **Routeline** | Selects destinations, builds payloads, enforces destination policy, and submits reports. |
+| **Ledgerline** | Records immutable audit events, receipts, outcomes, and follow-up status. |
+| **Publishline** | Produces sanitized public intelligence only after mitigation and approval. |
+| **Trainline** | Converts lawful, reviewed intelligence into LoRA-ready training data. |
+
+---
+
+## 4. Core Worker Set
+
+The first conceptual worker set is:
+
+```text
+Scout → Verifier → Mapper → Classifier → Sealer → Router → Courier → Ledger
+```
+
+Support workers:
+
+```text
+Watcher → Archivist → Publisher
+```
+
+Operational sentence:
+
+```text
+Scout detects.
+Verifier confirms.
+Mapper identifies.
+Classifier prioritizes.
+Sealer protects.
+Router decides.
+Courier submits.
+Ledger proves.
+Watcher follows up.
+Archivist forgets safely.
+Publisher informs.
+```
+
+---
+
+## 5. Granular Worker Breakdown
+
+### 5.1 Scoutline
+
+| Worker | Job | Model requirement |
+|---|---|---|
+| **SourcePlanner** | Maintains the approved source list, collection schedules, and source eligibility. | None / rules |
+| **Crawler** | Discovers new pages, feeds, advisories, reports, APIs, and datasets. | None |
+| **Fetcher** | Downloads pages, PDFs, JSON, RSS, STIX/TAXII, MISP events, and API responses. | None |
+| **Parser** | Extracts title, date, author, body, tables, indicators, and metadata. | Rules / small model |
+| **Deduper** | Detects duplicate reports, reposted IOCs, syndicated articles, and repeated claims. | Embeddings / rules |
+| **SourceRanker** | Scores the source based on trust, history, origin, and license status. | Rules / small model |
+| **Signalizer** | Converts parsed content into candidate intelligence signals. | Small/medium model |
+
+Output:
+
+```json
+{
+  "signal_id": "uuid",
+  "source_type": "advisory | cti_report | abuse_feed | ransomware_monitor | public_blog | misp_event",
+  "summary": "short defensive summary",
+  "observed_at": "2026-05-13T00:00:00Z",
+  "raw_evidence_location": "internal-only-reference"
+}
+```
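+
+The rules-only half of the Deduper can be sketched as exact-match fingerprinting over normalized text. This catches reposts and syndicated copies that differ only in case or whitespace. It does not catch paraphrases, which is where the embedding approach in the table would come in. The field names follow the Scoutline output above; `fingerprint` and `dedupe` are hypothetical helpers.
+
+```python
+import hashlib
+import re
+import unicodedata
+
+
+def fingerprint(text: str) -> str:
+    """SHA-256 of normalized text: NFKC-normalized, case-folded, whitespace collapsed."""
+    normalized = unicodedata.normalize("NFKC", text).casefold()
+    normalized = re.sub(r"\s+", " ", normalized).strip()
+    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
+
+
+def dedupe(signals: list[dict]) -> list[dict]:
+    """Keep the first signal seen for each summary fingerprint, preserving order."""
+    seen: set[str] = set()
+    unique = []
+    for signal in signals:
+        fp = fingerprint(signal["summary"])
+        if fp not in seen:
+            seen.add(fp)
+            unique.append(signal)
+    return unique
+
+
+signals = [
+    {"signal_id": "a", "summary": "Phishing kit targets  bank logins"},
+    {"signal_id": "b", "summary": "phishing kit targets bank logins"},
+    {"signal_id": "c", "summary": "New ransomware leak-site post"},
+]
+print([s["signal_id"] for s in dedupe(signals)])  # ['a', 'c']
+```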
+
+---
+
+### 5.2 Proofline
+
+| Worker | Job |
+|---|---|
+| **Correlator** | Checks whether the same signal appears across multiple independent sources. |
+| **IOCChecker** | Validates domains, IPs, hashes, URLs, wallet addresses, emails, and CVEs. |
+| **FreshnessChecker** | Determines whether the signal is current, stale, repeated, or resurfaced. |
+| **ClaimChecker** | Labels language as confirmed, claimed, observed, rumored, or speculative. |
+| **ConfidenceScorer** | Produces final confidence and optional Admiralty Code values. |
+
+Output:
+
+```json
+{
+  "confidence": "low | medium | high",
+  "source_reliability": "A | B | C | D | E | F | unknown",
+  "information_credibility": "1 | 2 | 3 | 4 | 5 | 6 | unknown",
+  "claim_status": "confirmed | claimed | observed | rumored | speculative",
+  "freshness": "new | recent | stale | resurfaced"
+}
+```
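+
+The IOCChecker's syntactic layer asks whether a string is even a well-formed IP, hash, CVE ID, or domain. That layer can be sketched with the standard library. The patterns and type labels are assumptions for illustration, and they cover only a subset of the indicator types listed above (no URLs, wallets, or emails). They also do not check reputation or whether an indicator is still live.
+
+```python
+import ipaddress
+import re
+from typing import Optional
+
+# Illustrative patterns only; production validation would need stricter rules
+# (e.g. public-suffix checks for domains, handling of defanged input).
+PATTERNS = {
+    "sha256": re.compile(r"[0-9a-f]{64}", re.IGNORECASE),
+    "md5": re.compile(r"[0-9a-f]{32}", re.IGNORECASE),
+    "cve": re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE),
+    "domain": re.compile(
+        r"(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}",
+        re.IGNORECASE,
+    ),
+}
+
+
+def classify_indicator(value: str) -> Optional[str]:
+    """Return the indicator type for a single value, or None if unrecognized."""
+    value = value.strip()
+    try:
+        ip = ipaddress.ip_address(value)
+        return "ipv6" if ip.version == 6 else "ipv4"
+    except ValueError:
+        pass  # Not an IP address; fall through to the pattern checks.
+    for kind, pattern in PATTERNS.items():
+        if pattern.fullmatch(value):
+            return kind
+    return None
+```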
+
+---
+
+### 5.3 Mapline
+
+| Worker | Job |
+|---|---|
+| **EntityResolver** | Maps organization names, domains, subsidiaries, brands, and aliases. |
+| **GeoResolver** | Maps victim country, jurisdiction, national CERT, and cross-border implications. |
+| **SectorMapper** | Maps victim sector and critical-infrastructure status. |
+| **ActorMapper** | Maps actor names, aliases, ransomware brands, campaigns, and confidence. |
+| **CVEResolver** | Maps vulnerabilities to CVEs, affected products, KEV status, and exploit relevance. |
+
+Output:
+
+```json
+{
+  "victim": {
+    "name": "",
+    "domain": "",
+    "country": "",
+    "sector": "",
+    "critical_infrastructure": false
+  },
+  "actor": {
+    "name": "",
+    "aliases": [],
+    "campaign": "",
+    "confidence": "low | medium | high"
+  },
+  "jurisdiction": {
+    "primary_cert": "",
+    "law_enforcement_route": "",
+    "sector_isac": ""
+  }
+}
+```
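+
+The GeoResolver step that fills `jurisdiction.primary_cert` can be sketched as a lookup keyed by the victim's country. The routing table below is a hypothetical, deliberately tiny example keyed by ISO 3166-1 alpha-2 codes. The `needs_review` flag is an added assumption, so that unknown countries go to a human instead of being silently dropped.
+
+```python
+# Hypothetical, illustrative routing table; a real one would be maintained
+# under the reviewed routing policy, not hard-coded.
+NATIONAL_CERTS = {
+    "DE": "CERT-Bund",
+    "FR": "CERT-FR",
+    "NL": "NCSC-NL",
+}
+
+
+def resolve_jurisdiction(victim: dict) -> dict:
+    """Map the victim's country to a national CERT, flagging unknown countries."""
+    country = (victim.get("country") or "").upper()
+    primary_cert = NATIONAL_CERTS.get(country)
+    return {
+        "primary_cert": primary_cert or "",
+        # Left empty here; law-enforcement and ISAC routes need their own resolvers.
+        "law_enforcement_route": "",
+        "sector_isac": "",
+        "needs_review": primary_cert is None,
+    }
+```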
+
+---
+
+### 5.4 Classifyline
+
+| Worker | Job |
+|---|---|
+| **Classifier** | Assigns incident type, severity, internal class, and response SLA. |
+| **TLPGuard** | Blocks any route whose destination is not cleared for the case's TLP marking (for example, TLP:AMBER data never goes to a CLEAR-only destination). |
+| **DestinationPolicyGuard** | Blocks inappropriate, illegal, excessive, or sensitive submissions. |
+
+Internal class mapping:
+
+| Internal class | Meaning | External severity |
+|---|---|---|
+| **A** | Imminent harm or attack likely underway | Critical |
+| **B** | Credible planned attack | High |
+| **C** | Confirmed exposure | High / Medium |
+| **D** | Campaign intelligence | Medium / High |
+| **E** | Weak signal or watchlist item | Low / Monitor |
+
+Output:
+
+```json
+{
+  "class": "A | B | C | D | E",
+  "severity": "low | medium | high | critical",
+  "tlp": "RED | AMBER | GREEN | CLEAR",
+  "incident_type": "ransomware | credential_leak | access_sale | phishing | malware | exploit | botnet | data_leak",
+  "policy_blocks": []
+}
+```
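+
+The TLPGuard rule, together with the `policy_blocks` field, can be sketched as an ordering check. A route passes only if its `max_tlp_allowed` (the field used in the Routeline route object) is at least as restrictive as the case's TLP. The helper names are hypothetical. The ordering uses the four TLP labels from the output above, so TLP:AMBER+STRICT from TLP 2.0 is not modelled.
+
+```python
+# TLP labels ordered from least to most restrictive.
+TLP_ORDER = ["CLEAR", "GREEN", "AMBER", "RED"]
+
+
+def tlp_allows(case_tlp: str, max_tlp_allowed: str) -> bool:
+    """True if a destination cleared up to max_tlp_allowed may receive case_tlp data.
+
+    Unknown labels raise ValueError (from list.index), so malformed input
+    stops the route rather than passing it.
+    """
+    return TLP_ORDER.index(case_tlp) <= TLP_ORDER.index(max_tlp_allowed)
+
+
+def policy_blocks(case_tlp: str, routes: list[dict]) -> list[str]:
+    """Return a block reason for every route that would violate the case's TLP."""
+    return [
+        f"{r['destination']}: TLP:{case_tlp} exceeds max TLP:{r['max_tlp_allowed']}"
+        for r in routes
+        if not tlp_allows(case_tlp, r["max_tlp_allowed"])
+    ]
+
+
+routes = [
+    {"destination": "CERT-Bund", "max_tlp_allowed": "RED"},
+    {"destination": "Cloudflare Abuse API", "max_tlp_allowed": "CLEAR"},
+]
+print(policy_blocks("AMBER", routes))
+# ['Cloudflare Abuse API: TLP:AMBER exceeds max TLP:CLEAR']
+```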
+
+---
+
+### 5.5 Sealine
+
+Sealine replaces “sanitization” as the primary evidence-handling concept. The objective is not to destroy useful evidence but to protect it.
+
+| Worker | Job |
+|---|---|
+| **EvidencePackager** | Collects sensitive evidence, hashes it, and packages it with metadata. |
+| **Sealer** | Encrypts evidence for authorized recipients using public-key or hybrid encryption. |
+| **KeyBurner** | Destroys local unwrapped evidence keys after successful sealing. |
+| **RetentionGuard** | Enforces retention, deletion, plaintext destruction, and crypto-erasure policy. |
+
+Sealine principle:
+
+> Preserve the truth. Seal the sensitive evidence. Route only what each recipient is authorized to receive.
+
+Output:
+
+```json
+{
+  "sealed_evidence": {
+    "package_id": "uuid",
+    "encryption": "age | PGP | CMS | hybrid",
+    "recipient_keys": [
+      {
+        "recipient": "CERT-Bund",
+        "key_id": "authority-key-id",
+        "wrapped_key": "encrypted-evidence-key"
+      }
+    ],
+    "payload_hash": "sha256",
+    "plaintext_destroyed": true,
+    "local_unwrapped_key_destroyed": true
+  }
+}
+```
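+
+One way the EvidencePackager might derive a stable `payload_hash` is to hash each evidence item with SHA-256 and then hash a canonical JSON manifest of those digests. This scheme is an assumption, not the documented design. Sealing itself (age, PGP, CMS) is out of scope here, and `package_evidence` is a hypothetical helper.
+
+```python
+import hashlib
+import json
+import uuid
+
+
+def package_evidence(items: dict[str, bytes]) -> dict:
+    """Hash each evidence item and derive a package hash over a canonical manifest.
+
+    Encryption is deliberately not shown: the Sealer would encrypt the package
+    for each authorized recipient before anything leaves the host.
+    """
+    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in sorted(items.items())}
+    # Canonical JSON (sorted keys, no extra whitespace) so the same items
+    # always yield the same package hash, regardless of input order.
+    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
+    return {
+        "package_id": str(uuid.uuid4()),
+        "manifest": manifest,
+        "payload_hash": hashlib.sha256(canonical).hexdigest(),
+    }
+```
+
+Hashing a manifest rather than concatenated bytes means the recipient can check each item independently after unsealing. The package-level hash still pins down the complete set of items.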
+
+---
+
+### 5.6 Routeline
+
+| Worker | Job |
+|---|---|
+| **RoutePlanner** | Chooses destination order based on victim, country, sector, severity, TLP, and evidence type. |
+| **PayloadBuilder** | Builds destination-specific payloads: sealed package, STIX bundle, MISP event, abuse report, or public-safe extract. |
+| **Redactor** | Minimizes public/semi-public outputs only. Redactor does not replace Sealer. |
+| **Courier** | Submits through API, portal, structured email, or secure upload. |
+| **RateLimiter** | Enforces destination quotas, retries, and backoff. |
+| **ReceiptCollector** | Captures case IDs, acknowledgements, API responses, and status URLs. |
+
+Example route object:
+
+```json
+{
+  "routes": [
+    {
+      "destination": "CERT-Bund",
+      "type": "authority",
+      "payload": "sealed_evidence_package",
+      "priority": 1,
+      "max_tlp_allowed": "RED"
+    },
+    {
+      "destination": "MISP trusted community",
+      "type": "cti_sharing",
+      "payload": "stix_indicators",
+      "priority": 2,
+      "max_tlp_allowed": "AMBER"
+    },
+    {
+      "destination": "Cloudflare Abuse API",
+      "type": "provider_abuse",
+      "payload": "minimized_abuse_report",
+      "priority": 3,
+      "max_tlp_allowed": "CLEAR"
+    }
+  ]
+}
+```
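+
+The RateLimiter's retry-and-backoff duty can be sketched as capped exponential backoff with full jitter around a single Courier submission. `send` is a hypothetical callable standing in for one attempt (API call, portal upload, etc.), and per-destination quotas are not modelled. The `sleep` parameter is injectable so the logic can be tested without waiting.
+
+```python
+import random
+import time
+from typing import Callable
+
+
+def submit_with_backoff(
+    send: Callable[[], bool],
+    max_attempts: int = 5,
+    base_delay: float = 1.0,
+    max_delay: float = 60.0,
+    sleep: Callable[[float], None] = time.sleep,
+) -> int:
+    """Call `send` until it returns True; return the number of attempts used.
+
+    Raises RuntimeError if every attempt fails.
+    """
+    for attempt in range(1, max_attempts + 1):
+        if send():
+            return attempt
+        if attempt < max_attempts:
+            # Full jitter: wait a random time up to base_delay * 2^(attempt-1), capped.
+            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1))))
+    raise RuntimeError(f"submission failed after {max_attempts} attempts")
+
+
+# Usage with a fake sender that fails twice, and a no-op sleep.
+attempts = iter([False, False, True])
+print(submit_with_backoff(lambda: next(attempts), sleep=lambda _: None))  # 3
+```
+
+Retries are only safe if a repeated submission cannot open a duplicate external case. In practice this loop would likely sit behind an idempotency check against the Ledger and the ReceiptCollector's stored case IDs.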
+
+---
+
+### 5.7 Ledgerline
+
+| Worker | Job |
+|---|---|
+| **Ledger** | Creates immutable audit records for all external submissions and destructive actions. |
+| **Watcher** | Polls outcomes: takedown status, MISP sightings, CERT acknowledgement, provider response. |
+| **Archivist** | Handles retention, sealed package lifecycle, legal holds, and crypto-erasure confirmation. |
+
+Ledger record:
+
+```json
+{
+  "timestamp": "2026-05-13T00:00:00Z",
+  "case_id": "B48-2026-000001",
+  "destination": "CERT-Bund",
+  "payload_hash": "sha256",
+  "submitter_identity": "blue48-official-handle",
+  "tlp": "AMBER",
+  "response_id": "external-case-id",
+  "outcome": "submitted | acknowledged | rejected | actioned"
+}
+```
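+
+One way to make Ledger records tamper-evident is to hash-chain them, so that each entry's hash covers its own record plus the previous entry's hash. This is an assumption, not the documented design. `HashChainLedger` is a hypothetical in-memory stand-in for the append-only ledger table listed under Technical Notes.
+
+```python
+import hashlib
+import json
+
+
+def _digest(record: dict, prev_hash: str) -> str:
+    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
+    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
+
+
+class HashChainLedger:
+    """Append-only in-memory ledger where each entry commits to the previous one."""
+
+    GENESIS = "0" * 64
+
+    def __init__(self) -> None:
+        self._entries: list[dict] = []
+
+    def append(self, record: dict) -> str:
+        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
+        entry_hash = _digest(record, prev)
+        self._entries.append({"record": dict(record), "prev_hash": prev, "hash": entry_hash})
+        return entry_hash
+
+    def verify(self) -> bool:
+        """Recompute every link; an edited record or reordered entry breaks the chain."""
+        prev = self.GENESIS
+        for entry in self._entries:
+            if entry["prev_hash"] != prev or _digest(entry["record"], prev) != entry["hash"]:
+                return False
+            prev = entry["hash"]
+        return True
+```
+
+A chain alone does not stop someone from rewriting every entry and recomputing all hashes. Periodically recording the latest hash somewhere outside the database closes that gap.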
+
+---
+
+### 5.8 Publishline
+
+| Worker | Job |
+|---|---|
+| **Publisher** | Produces public-safe intelligence reports after mitigation and approval. |
+
+Publisher may include:
+
+- sector trend
+- actor trend
+- CVEs
+- TTPs
+- defensive recommendations
+- sanitized IOCs
+- non-sensitive timelines
+
+Publisher must not include:
+
+- raw credentials
+- stolen data
+- victim secrets
+- live access details
+- exact criminal-source links
+- unmitigated exploit paths
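+
+The "sanitized IOCs" item can be illustrated with defanging, a common CTI convention for making indicators non-clickable in public reports. The exact defang format Blue48 would use is an assumption. `defang` and `has_live_urls` are hypothetical helpers, and the check covers only live links, not the rest of the must-not list.
+
+```python
+import re
+
+LIVE_URL = re.compile(r"\bhttps?://", re.IGNORECASE)
+
+
+def defang(indicator: str) -> str:
+    """Render an indicator non-clickable: http(s) -> hxxp(s), '.' -> '[.]'."""
+    out = re.sub(r"^http", "hxxp", indicator, flags=re.IGNORECASE)
+    return out.replace(".", "[.]")
+
+
+def has_live_urls(text: str) -> bool:
+    """True if a draft still contains a clickable http(s) URL."""
+    return bool(LIVE_URL.search(text))
+
+
+print(defang("https://evil.example.com/login"))  # hxxps://evil[.]example[.]com/login
+```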
+
+---
+
+## 6. Which Workers Need Models?
+
+| Worker | Model need |
+|---|---|
+| SourcePlanner | None / rules |
+| Crawler / Fetcher | None |
+| Parser | Rules / small model |
+| Deduper | Embeddings / rules |
+| Signalizer | Small or medium model |
+| ClaimChecker | Small or medium model |
+| ConfidenceScorer | Medium model |
+| EntityResolver | Rules + embeddings |
+| ActorMapper | Small or medium model |
+| Classifier | Small or medium model |
+| RoutePlanner | Rules first, model second |
+| PayloadBuilder | Small model |
+| Publisher | Medium or large model |
+| ExampleBuilder | Medium model |
+| QualityGate | Medium model + rules |
+
+Heavier models (medium or larger) should be reserved for:
+
+```text
+ConfidenceScorer
+Classifier
+Publisher
+ExampleBuilder
+QualityGate
+```
+
+---
+
+## 7. Human Review Boundaries
+
+Human approval is required before:
+
+- sending sealed evidence to any external destination
+- contacting law enforcement or CERTs with sensitive evidence
+- publishing a public advisory
+- destroying plaintext evidence
+- destroying local unwrapped evidence keys
+- exporting a training dataset
+- modifying routing policy
+- modifying recipient keys
+
+Two-person control should be required for:
+
+- sending TLP:RED or highly sensitive packages
+- deleting evidence
+- changing authority recipient keys
+- publishing named-victim reports
+- exporting training data based on internal cases
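+
+The approval rules above can be sketched as a small gate object. The action names are hypothetical labels for the two-person-control list. This sketch assumes two approvers distinct from the requester; a policy that counts the requester as one of the two persons would need a different threshold.
+
+```python
+from dataclasses import dataclass, field
+
+# Hypothetical action names mirroring the two-person-control list above.
+TWO_PERSON_ACTIONS = {
+    "send_tlp_red",
+    "delete_evidence",
+    "change_recipient_keys",
+    "publish_named_victim",
+    "export_internal_training_data",
+}
+
+
+@dataclass
+class ApprovalRequest:
+    action: str
+    requested_by: str
+    approvers: set[str] = field(default_factory=set)
+
+    def approve(self, reviewer: str) -> None:
+        # The requester cannot approve their own action.
+        if reviewer == self.requested_by:
+            raise PermissionError("requester cannot self-approve")
+        self.approvers.add(reviewer)  # A set, so repeat approvals do not count twice.
+
+    def is_authorized(self) -> bool:
+        required = 2 if self.action in TWO_PERSON_ACTIONS else 1
+        return len(self.approvers) >= required
+```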
+
+---
+
+## 8. MVP Worker Build Order
+
+Initial worker implementation priority:
+
+1. SourcePlanner
+2. Fetcher
+3. Parser
+4. Deduper
+5. Signalizer
+6. IOCChecker
+7. EntityResolver
+8. GeoResolver
+9. Classifier
+10. EvidencePackager
+11. Sealer
+12. RoutePlanner
+13. Courier
+14. Ledger
+15. ReceiptCollector
+16. IntelMiner
+
+Minimum operational chain:
+
+```text
+Fetcher → Parser → Signalizer → IOCChecker → EntityResolver → Classifier → Sealer → RoutePlanner → Courier → Ledger
+```
+
+---
+
+## 9. Technical Notes
+
+Recommended implementation style:
+
+| Component | Recommendation |
+|---|---|
+| Worker runtime | Python services, Celery, Temporal, Prefect, or lightweight queue workers |
+| Message format | JSON normalized case object |
+| Interop format | STIX 2.1 where useful |
+| Storage | PostgreSQL + object storage |
+| Search | OpenSearch or Meilisearch |
+| CTI graph | OpenCTI or MISP integration |
+| Audit | append-only ledger table |
+| Secrets | `.env`, secret manager, runtime injection only |
+| UI | Blue48 Operations Cockpit |
+
+---
+
+## 10. Summary
+
+Blue48 should operate as a worker mesh, not a monolithic AI agent.
+
+The system should use small deterministic workers where possible, small models where useful, and larger models only for judgment-heavy steps. Sensitive evidence is handled by Sealine, not casually rendered or distributed. Routing and public reporting are controlled by policy guards, human review, and immutable audit logging.