Stage-1 vertical slice: Pydantic Case model, SQLAlchemy Core persistence, URLhaus Scoutline fetcher, FastAPI/Jinja cockpit (cases list + detail), flat Typer CLI, Result[T, E] type module, structlog config. Architecture in docs/dossier.md; 12-fold style guide in docs/style.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
7.9 KiB
psyc Python Style Guide
Established: 2026-05-14 (Day 2 of the hackathon) Status: Live — all new code must follow this; existing code retrofitted in the same commit.
This guide is the output of a 12-fold style review. It exists so the codebase reads consistently end-to-end and there's never a question of which idiom to use.
1. Optional values — Optional[X], not X | None
from typing import Optional
def get_case(case_id: str) -> Optional[Case]:
...
def find_actor(
name: str,
country: Optional[str] = None,
) -> Actor:
...
Rationale: explicit, name carries meaning, easy to grep for nullable returns.
2. Collection generics — List[X], Dict[K, V], not list[X], dict[K, V]
from typing import List, Dict
tags: List[str] = []
quotas: Dict[str, int] = {}
def route(case: Case) -> List[Destination]:
...
Rationale: uppercase typing forms read as type hints, not as runtime constructors. Pair with rule 1 — both come from typing.
3. Pydantic mutable defaults — Field(default_factory=...), always
from pydantic import BaseModel, Field
class Case(BaseModel):
tags: List[str] = Field(default_factory=list)
routing: Routing = Field(default_factory=Routing)
observables: Dict[str, List[str]] = Field(default_factory=dict)
Rationale: intent is explicit even though Pydantic deep-copies literal defaults. Same idiom works in @dataclass, so we never have to remember which framework we're in.
4. Function signature wrapping — single line, 120-char limit
def fetch_and_signal(limit: Optional[int] = None, timeout: float = 30.0, user_agent: str = USER_AGENT) -> List[Case]:
...
If a signature still exceeds 120 chars after that, then hug-parens with trailing comma. Wrapping is the exception, not the default.
# pyproject.toml
[tool.ruff]
line-length = 120
5. Errors — Result[T, E] for expected failures, raise for genuinely exceptional ones
A "miss" or "blocked" outcome is data, not an exception. Use the Result type in psyc.result:
from psyc.result import Result, Ok, Err
def get_case(case_id: str) -> Result[Case, str]:
row = db.fetchone(case_id)
if not row:
return Err(f"case not found: {case_id}")
return Ok(Case.model_validate_json(row))
# caller:
result = get_case(cid)
if isinstance(result, Err):
raise HTTPException(404, result.reason)
case = result.value
Reserve raise for: programmer errors, invariant violations, unrecoverable I/O. Never raise from a function whose failure mode is part of normal operation (lookup miss, policy block, rate limit, etc.).
6. Closed string sets — class X(str, Enum)
from enum import Enum
class TLP(str, Enum):
RED = "RED"
AMBER = "AMBER"
GREEN = "GREEN"
CLEAR = "CLEAR"
case.classification.tlp = TLP.AMBER
if case.classification.tlp == TLP.RED:
...
Real Enum object — iterable, comparable, has .value. JSON-serializable thanks to the str mixin. No Literal aliases, no bare string constants.
7. Imports — isort blocks (stdlib / third-party / local)
import csv
from datetime import datetime
from pathlib import Path
from typing import List, Optional
import httpx
import structlog
import typer
from pydantic import BaseModel, Field
from psyc import db
from psyc.models import Case, TLP
from psyc.result import Err, Ok, Result
Blocks separated by a single blank line. Within each block: alphabetical, import x before from x import y is not required — pick whatever ruff/isort defaults to.
[tool.ruff.lint.isort]
known-first-party = ["psyc"]
8. Conditionals — early returns / guard clauses
def classify(case: Case) -> Case:
if not case.observables.urls:
return case
if case.classification.severity:
return case
case.classification.severity = Severity.MEDIUM
return case
The happy path stays flat. No "single exit point" dogma. No match/case unless you're actually dispatching on a tagged union — for if/elif chains, plain if is clearer.
9. Docstrings — module-level only, functions self-documenting
Every .py file starts with a one-line docstring describing what it is. Functions rely on naming + type signatures. Add an inline comment only when the why is non-obvious.
"""Scoutline — Fetcher + Signalizer for URLhaus."""
from __future__ import annotations
# ... imports ...
def fetch_recent(timeout: float = 30.0) -> str:
...
def parse_urlhaus_csv(text: str) -> Iterable[Dict]:
...
Forbidden: function docstrings restating the signature ("""Fetch the recent CSV. Returns: str.""" — useless).
10. Logging — structlog over stdlib logging, event names + key/value
import structlog
log = structlog.get_logger(__name__)
log.info("case.ingested", case_id=case.case_id, source="urlhaus", count=len(rows))
log.warning("route.blocked", case_id=case.case_id, dest="VirusTotal", reason="tlp_red")
log.error("submit.failed", case_id=case.case_id, dest="CERT-Bund", error=str(exc))
Event names: <area>.<action> lowercase, dot-separated. Never interpolate values into the event name — they go in the key/value payload so the ledger and audit code can index on them.
Configuration lives in psyc/log.py, imported once at process start (CLI and cockpit entrypoints).
11. SQL — SQLAlchemy Core (Tables + engine.connect()), no ORM
from sqlalchemy import Table, Column, String, MetaData, create_engine, insert, select
engine = create_engine("sqlite:///data/psyc.db", future=True)
meta = MetaData()
cases = Table(
"cases", meta,
Column("case_id", String, primary_key=True),
Column("summary", String, nullable=False),
Column("tlp", String, nullable=False),
# ...
)
with engine.begin() as conn:
conn.execute(insert(cases).values(case_id=case.case_id, summary=case.summary, tlp=case.classification.tlp.value))
with engine.connect() as conn:
row = conn.execute(select(cases).where(cases.c.case_id == case_id)).fetchone()
No ORM session, no declarative_base, no SQLModel. SQL stays visible as expressions, but parameter binding and dialect handling are SQLAlchemy's job. engine.begin() for writes (auto-commit), engine.connect() for reads.
12. CLI — flat Typer commands, hyphenated names
import typer
app = typer.Typer(add_completion=False, help="psyc — defensive CTI routing & sealing")
@app.command("init")
def init() -> None: ...
@app.command("fetch-urlhaus")
def fetch_urlhaus(limit: int = 50) -> None: ...
@app.command("seal-pack")
def seal_pack(case_id: str) -> None: ...
@app.command("route-plan")
def route_plan(case_id: str) -> None: ...
@app.command("serve")
def serve(host: str = "127.0.0.1", port: int = 8000) -> None: ...
Invocation: psyc fetch-urlhaus --limit 50. No sub-apps, no nested namespaces. Command name = function name with underscores → hyphens.
Tooling
[tool.ruff]
line-length = 120
[tool.ruff.lint]
select = ["E", "F", "I", "UP", "B"]
# UP rules that conflict with rule 1/2 are ignored:
ignore = ["UP006", "UP007", "UP035"]
[tool.ruff.lint.isort]
known-first-party = ["psyc"]
UP006 / UP007 / UP035 would auto-rewrite List[X]/Optional[X] to lowercase / pipe forms — disabled because rules 1 and 2 outrank them.
Out of scope
This guide intentionally does not cover:
- async vs sync (decide per worker line; httpx clients sync inside Typer commands, async only if the cockpit demands it)
- test framework (pytest is the default; no folding required)
- exception class hierarchies (rule 5 minimizes their need; design per line as it arrives)
- API response shapes for the cockpit (REST/HTML; JSON-only routes when stage 3+ ships)
Add new folds at the bottom of this file as they come up. Don't retrofit the guide silently — each addition is a recorded decision.