psyc/docs/style.md
m17hr1l e04c6c96d8 init: scaffold psyc — defensive CTI routing & evidence-sealing platform
Stage-1 vertical slice: Pydantic Case model, SQLAlchemy Core persistence,
URLhaus Scoutline fetcher, FastAPI/Jinja cockpit (cases list + detail),
flat Typer CLI, Result[T, E] type module, structlog config.
Architecture in docs/dossier.md; 12-fold style guide in docs/style.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 12:43:47 +02:00


psyc Python Style Guide

Established: 2026-05-14 (Day 2 of the hackathon)
Status: Live. All new code must follow this; existing code is retrofitted in the same commit.

This guide is the output of a 12-fold style review. It exists so the codebase reads consistently end-to-end and there's never a question of which idiom to use.


1. Optional values — Optional[X], not X | None

from typing import Optional

def get_case(case_id: str) -> Optional[Case]:
    ...

def find_actor(name: str, country: Optional[str] = None) -> Actor:
    ...

Rationale: the spelled-out name states the intent, and nullable returns are easy to grep for.


2. Collection generics — List[X], Dict[K, V], not list[X], dict[K, V]

from typing import List, Dict

tags: List[str] = []
quotas: Dict[str, int] = {}

def route(case: Case) -> List[Destination]:
    ...

Rationale: uppercase typing forms read as type hints, not as runtime constructors. Pair with rule 1 — both come from typing.


3. Pydantic mutable defaults — Field(default_factory=...), always

from typing import Dict, List

from pydantic import BaseModel, Field

class Case(BaseModel):
    tags: List[str] = Field(default_factory=list)
    routing: Routing = Field(default_factory=Routing)
    observables: Dict[str, List[str]] = Field(default_factory=dict)

Rationale: intent is explicit even though Pydantic deep-copies literal defaults. The same default_factory idiom applies in @dataclass (via dataclasses.field), so we never have to remember which framework we're in.
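To illustrate the dataclass side of that rationale, here is a minimal sketch using only the standard library. CaseDraft is a hypothetical name invented for this example; it is not the real psyc Case model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CaseDraft:  # hypothetical stand-in, not the real psyc Case model
    tags: List[str] = field(default_factory=list)
    observables: Dict[str, List[str]] = field(default_factory=dict)


first = CaseDraft()
second = CaseDraft()
first.tags.append("phishing")

# Each instance gets its own list, so the append does not leak into `second`.
assert second.tags == []
```

A bare `tags: List[str] = []` would be rejected by @dataclass with a ValueError, which is one more reason to use the factory form everywhere.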


4. Function signature wrapping — single line, 120-char limit

def fetch_and_signal(limit: Optional[int] = None, timeout: float = 30.0, user_agent: str = USER_AGENT) -> List[Case]:
    ...

If a signature exceeds 120 chars on one line, wrap it hug-parens style: one parameter per line, the closing parenthesis on its own line, and a trailing comma after the last parameter. Wrapping is the exception, not the default.
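For the exceptional case, a wrapped signature might look like the sketch below. The function name, parameters, and body are hypothetical, chosen only so that the one-line form would run past 120 characters.

```python
from typing import List


# Joined onto one line this signature would exceed 120 characters, so it wraps.
def submit_evidence_pack(
    case_id: str,
    destination: str,
    timeout: float = 30.0,
    retries: int = 3,
    user_agent: str = "psyc",
    dry_run: bool = False,
) -> List[str]:
    # Placeholder body: report what would be sent instead of doing any I/O.
    mode = "dry-run" if dry_run else "live"
    return [mode, case_id, destination]
```

The trailing comma keeps later parameter additions to a one-line diff.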

# pyproject.toml
[tool.ruff]
line-length = 120

5. Errors — Result[T, E] for expected failures, raise for genuinely exceptional ones

A "miss" or "blocked" outcome is data, not an exception. Use the Result type in psyc.result:

from psyc.result import Result, Ok, Err

def get_case(case_id: str) -> Result[Case, str]:
    row = db.fetchone(case_id)
    if not row:
        return Err(f"case not found: {case_id}")
    return Ok(Case.model_validate_json(row))

# caller:
result = get_case(cid)
if isinstance(result, Err):
    raise HTTPException(404, result.reason)
case = result.value

Reserve raise for: programmer errors, invariant violations, unrecoverable I/O. Never raise from a function whose failure mode is part of normal operation (lookup miss, policy block, rate limit, etc.).
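The psyc.result module itself is not reproduced in this guide. A minimal sketch that would be consistent with the usage above (Ok carrying .value, Err carrying .reason) could look like the following. The dataclass layout and the lookup helper are assumptions for illustration, not the actual module.

```python
from dataclasses import dataclass
from typing import Dict, Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    reason: E


# Result[T, E] is either a success carrying a value or a failure carrying a reason.
Result = Union[Ok[T], Err[E]]


def lookup(store: Dict[str, str], key: str) -> Result[str, str]:  # hypothetical helper
    if key not in store:
        return Err(f"not found: {key}")
    return Ok(store[key])


hit = lookup({"a": "alpha"}, "a")
miss = lookup({"a": "alpha"}, "b")
```

Because a miss comes back as a plain value, the caller decides whether it becomes a 404, a log line, or a retry.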


6. Closed string sets — class X(str, Enum)

from enum import Enum

class TLP(str, Enum):
    RED = "RED"
    AMBER = "AMBER"
    GREEN = "GREEN"
    CLEAR = "CLEAR"

case.classification.tlp = TLP.AMBER
if case.classification.tlp == TLP.RED:
    ...

Real Enum object — iterable, comparable, has .value. JSON-serializable thanks to the str mixin. No Literal aliases, no bare string constants.
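The properties listed above can be checked with a short, standard-library-only sketch. The TLP class is repeated so the block runs on its own; the payload dict is made up for the example.

```python
import json
from enum import Enum
from typing import List


class TLP(str, Enum):
    RED = "RED"
    AMBER = "AMBER"
    GREEN = "GREEN"
    CLEAR = "CLEAR"


# Iterable: members come back in definition order.
levels: List[str] = [level.value for level in TLP]

# Comparable: thanks to the str mixin a member also equals its raw string.
is_amber = TLP.AMBER == "AMBER"

# Round-trips through JSON without a custom encoder.
payload = json.dumps({"tlp": TLP.AMBER})
restored = TLP(json.loads(payload)["tlp"])
```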


7. Imports — isort blocks (stdlib / third-party / local)

import csv
from datetime import datetime
from pathlib import Path
from typing import List, Optional

import httpx
import structlog
import typer
from pydantic import BaseModel, Field

from psyc import db
from psyc.models import TLP, Case
from psyc.result import Err, Ok, Result

Blocks are separated by a single blank line. Within each block, keep whatever order ruff's isort rules produce by default; there is no project rule about whether import x comes before from x import y.

[tool.ruff.lint.isort]
known-first-party = ["psyc"]

8. Conditionals — early returns / guard clauses

def classify(case: Case) -> Case:
    if not case.observables.get("urls"):
        return case
    if case.classification.severity:
        return case
    case.classification.severity = Severity.MEDIUM
    return case

The happy path stays flat. No "single exit point" dogma. No match/case unless you're actually dispatching on a tagged union — for if/elif chains, plain if is clearer.


9. Docstrings — module-level only, functions self-documenting

Every .py file starts with a one-line docstring describing what it is. Functions rely on naming + type signatures. Add an inline comment only when the why is non-obvious.

"""Scoutline — Fetcher + Signalizer for URLhaus."""

from __future__ import annotations
# ... imports ...

def fetch_recent(timeout: float = 30.0) -> str:
    ...

def parse_urlhaus_csv(text: str) -> Iterable[Dict]:
    ...

Forbidden: function docstrings restating the signature ("""Fetch the recent CSV. Returns: str.""" — useless).


10. Logging — structlog over stdlib logging, event names + key/value

import structlog

log = structlog.get_logger(__name__)

log.info("case.ingested", case_id=case.case_id, source="urlhaus", count=len(rows))
log.warning("route.blocked", case_id=case.case_id, dest="VirusTotal", reason="tlp_red")
log.error("submit.failed", case_id=case.case_id, dest="CERT-Bund", error=str(exc))

Event names: <area>.<action> lowercase, dot-separated. Never interpolate values into the event name — they go in the key/value payload so the ledger and audit code can index on them.
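The naming convention is easy to check mechanically, for example in a unit test. The helper below is a hypothetical sketch, not part of psyc. Allowing digits and underscores inside a segment is an assumption; the rule above only requires lowercase and a single dot.

```python
import re

# Assumed pattern: two lowercase segments joined by one dot, e.g. "case.ingested".
EVENT_NAME = re.compile(r"[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*")


def is_valid_event_name(name: str) -> bool:
    return EVENT_NAME.fullmatch(name) is not None
```

A name such as "route.blocked.VirusTotal" fails the check, which is the point: the destination belongs in the key/value payload.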

Configuration lives in psyc/log.py, imported once at process start (CLI and cockpit entrypoints).


11. SQL — SQLAlchemy Core (Tables + engine.connect()), no ORM

from sqlalchemy import Table, Column, String, MetaData, create_engine, insert, select

engine = create_engine("sqlite:///data/psyc.db", future=True)
meta = MetaData()

cases = Table(
    "cases", meta,
    Column("case_id", String, primary_key=True),
    Column("summary", String, nullable=False),
    Column("tlp", String, nullable=False),
    # ...
)

with engine.begin() as conn:
    conn.execute(insert(cases).values(case_id=case.case_id, summary=case.summary, tlp=case.classification.tlp.value))

with engine.connect() as conn:
    row = conn.execute(select(cases).where(cases.c.case_id == case_id)).fetchone()

No ORM session, no declarative_base, no SQLModel. SQL stays visible as expressions, but parameter binding and dialect handling are SQLAlchemy's job. Use engine.begin() for writes (the block commits on clean exit and rolls back on an exception) and engine.connect() for reads.


12. CLI — flat Typer commands, hyphenated names

import typer

app = typer.Typer(add_completion=False, help="psyc — defensive CTI routing & sealing")

@app.command("init")
def init() -> None: ...

@app.command("fetch-urlhaus")
def fetch_urlhaus(limit: int = 50) -> None: ...

@app.command("seal-pack")
def seal_pack(case_id: str) -> None: ...

@app.command("route-plan")
def route_plan(case_id: str) -> None: ...

@app.command("serve")
def serve(host: str = "127.0.0.1", port: int = 8000) -> None: ...

Invocation: psyc fetch-urlhaus --limit 50. No sub-apps, no nested namespaces. Command name = function name with underscores → hyphens.


Tooling

[tool.ruff]
line-length = 120

[tool.ruff.lint]
select = ["E", "F", "I", "UP", "B"]
# UP rules that conflict with rule 1/2 are ignored:
ignore = ["UP006", "UP007", "UP035"]

[tool.ruff.lint.isort]
known-first-party = ["psyc"]

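UP006, UP007, and UP035 would flag or auto-rewrite the typing imports and List[X]/Optional[X] annotations into lowercase and pipe forms; they are disabled because rules 1 and 2 outrank them. Newer ruff releases report Optional[X] under a separate code, UP045, so add it to the ignore list if it starts firing.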


Out of scope

This guide intentionally does not cover:

  • async vs sync (decide per worker line; httpx clients sync inside Typer commands, async only if the cockpit demands it)
  • test framework (pytest is the default; no folding required)
  • exception class hierarchies (rule 5 minimizes their need; design per line as it arrives)
  • API response shapes for the cockpit (REST/HTML; JSON-only routes when stage 3+ ships)

Add new folds at the bottom of this file as they come up. Don't retrofit the guide silently — each addition is a recorded decision.