stage-29: fetch-all resilience + Mozilla-compatible UA for CISA

Two production-discovered fixes after first deploy:

- CISA's CDN was 403'ing the "psyc/0.1 (defensive CTI; hackathon
  prototype)" User-Agent from the cloud.neuronetz.ai exit IP. Switched
  to a Mozilla-compatible UA that identifies us honestly while passing
  the CDN's UA filters. Overridable via PSYC_HTTP_USER_AGENT.
- fetch-all aborted on the first HTTPStatusError, so a CISA hiccup
  killed the threatfox/malware-bazaar/otx legs that come after. The
  outer loop caught only RuntimeError; it now catches Exception per
  source, logs a skip, and moves on. Single-source failures no longer
  poison the rest of the pull.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
m17hr1l
2026-05-25 16:56:27 +02:00
parent fad7ad0d49
commit d7999150b3
2 changed files with 8 additions and 2 deletions


@@ -114,7 +114,7 @@ def fetch_all() -> None:
     for source, limit in plan:
         try:
             _ingest(source, limit)
-        except RuntimeError as exc:
+        except Exception as exc: # noqa: BLE001 — keep going if one feed misbehaves
             typer.echo(f" skip {source}: {exc}", err=True)
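The per-source isolation in this hunk can be sketched in a standalone form. This is a minimal illustration, not the project's code: `flaky_ingest`, the `ingest` parameter, the returned `skipped` list, and the limits in `plan` are all hypothetical, and `print` to stderr stands in for `typer.echo` so the sketch needs only the standard library.

```python
import sys
from typing import Callable, Iterable

# Hypothetical stand-in type for the project's ingest function; the real
# _ingest is not shown in this commit.
Ingest = Callable[[str, int], None]


def fetch_all(plan: Iterable[tuple[str, int]], ingest: Ingest) -> list[str]:
    """Run every (source, limit) pair and return the sources that were skipped."""
    skipped: list[str] = []
    for source, limit in plan:
        try:
            ingest(source, limit)
        except Exception as exc:  # deliberately broad: one bad feed must not stop the rest
            print(f"  skip {source}: {exc}", file=sys.stderr)
            skipped.append(source)
    return skipped


ingested: list[str] = []


def flaky_ingest(source: str, limit: int) -> None:
    """Simulated feed client (assumption): the first source fails, the rest succeed."""
    if source == "cisa":
        raise ConnectionError("403 Forbidden")
    ingested.append(source)


# Limits are made-up values for the demonstration.
plan = [("cisa", 50), ("threatfox", 50), ("malware-bazaar", 50), ("otx", 50)]
skipped = fetch_all(plan, flaky_ingest)
```

Because `KeyboardInterrupt` and `SystemExit` derive from `BaseException` rather than `Exception`, a handler of this shape still lets Ctrl-C and explicit exits stop the whole run.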


@@ -21,7 +21,13 @@ from psyc import log
 from psyc.models import Case, IncidentType, Observables
-USER_AGENT = "psyc/0.1 (defensive CTI; hackathon prototype)"
+# CISA's CDN 403s "exotic" UAs from some IPs; a Mozilla-compatible identifier
+# is universally accepted and still identifies us honestly. Overridable via env
+# if a feed ever wants a specific UA.
+USER_AGENT = os.environ.get(
+    "PSYC_HTTP_USER_AGENT",
+    "Mozilla/5.0 (compatible; psyc/0.1; +https://psyc.neuronetz.ai)",
+)
 HTTP_TIMEOUT = 30.0
 URLHAUS_RECENT_CSV = "https://urlhaus.abuse.ch/downloads/csv_recent/"
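A small sketch of how the environment override might be exercised. The helper names `resolve_user_agent` and `build_request`, the `env` parameter, the example URL, and the `feed-specific/1.0` value are assumptions for illustration. `urllib.request` is used only to keep the sketch stdlib-only; the project's actual HTTP client is not shown in this commit. No request is sent.

```python
import os
import urllib.request
from typing import Mapping

# Default copied from the diff above.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; psyc/0.1; +https://psyc.neuronetz.ai)"


def resolve_user_agent(env: Mapping[str, str] = os.environ) -> str:
    """Return PSYC_HTTP_USER_AGENT if it is set, otherwise the default UA."""
    return env.get("PSYC_HTTP_USER_AGENT", DEFAULT_USER_AGENT)


def build_request(url: str, env: Mapping[str, str] = os.environ) -> urllib.request.Request:
    """Build (but do not send) a request carrying the resolved User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": resolve_user_agent(env)})


# Hypothetical URL; the Request object is only constructed, never opened.
default_req = build_request("https://example.invalid/feed.json", env={})
custom_req = build_request(
    "https://example.invalid/feed.json",
    env={"PSYC_HTTP_USER_AGENT": "feed-specific/1.0"},
)
```

One behavior worth knowing about `dict.get`-style lookups: a variable that is set but empty yields an empty string rather than the default, so an empty `PSYC_HTTP_USER_AGENT` would send an empty User-Agent unless the caller guards against it.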