init: scaffold psyc — defensive CTI routing & evidence-sealing platform
Stage-1 vertical slice: Pydantic Case model, SQLAlchemy Core persistence, URLhaus Scoutline fetcher, FastAPI/Jinja cockpit (cases list + detail), flat Typer CLI, Result[T, E] type module, structlog config. Architecture in docs/dossier.md; 12-fold style guide in docs/style.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
284
docs/style.md
Normal file
@@ -0,0 +1,284 @@
# psyc Python Style Guide

**Established:** 2026-05-14 (Day 2 of the hackathon)
**Status:** Live. All new code must follow this; existing code is retrofitted in the same commit.

This guide is the output of a 12-fold style review. It exists so the codebase reads consistently end-to-end and there's never a question of which idiom to use.

---

## 1. Optional values: `Optional[X]`, not `X | None`

```python
from typing import Optional

def get_case(case_id: str) -> Optional[Case]:
    ...

# Fits on one line, so it stays on one line (rule 4).
def find_actor(name: str, country: Optional[str] = None) -> Actor:
    ...
```

**Rationale:** the form is explicit, the name carries meaning, and nullable returns are easy to grep for.

---

## 2. Collection generics: `List[X]`, `Dict[K, V]`, not `list[X]`, `dict[K, V]`

```python
from typing import Dict, List

tags: List[str] = []
quotas: Dict[str, int] = {}

def route(case: Case) -> List[Destination]:
    ...
```

**Rationale:** uppercase typing forms read as type hints, not as runtime constructors. Pair with **rule 1**: both come from `typing`.

---

## 3. Pydantic mutable defaults: `Field(default_factory=...)`, always

```python
from typing import Dict, List

from pydantic import BaseModel, Field

class Case(BaseModel):
    tags: List[str] = Field(default_factory=list)
    routing: Routing = Field(default_factory=Routing)
    observables: Dict[str, List[str]] = Field(default_factory=dict)
```

**Rationale:** intent is explicit even though Pydantic copies mutable literal defaults per instance. The analogous `field(default_factory=...)` is required for mutable defaults in `@dataclass`, so we never have to remember which framework we're in.

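To illustrate the `@dataclass` side of that rationale, here is a minimal stdlib-only sketch. `Draft` and `bare_list_default_is_rejected` are hypothetical names used only for this example; they are not part of psyc.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Draft:  # hypothetical stand-in, not a psyc model
    tags: List[str] = field(default_factory=list)
    observables: Dict[str, List[str]] = field(default_factory=dict)


first = Draft()
second = Draft()
first.tags.append("phishing")  # mutates only `first`; `second` has its own list


def bare_list_default_is_rejected() -> bool:
    # dataclasses refuses a bare mutable default at class-creation time.
    try:
        @dataclass
        class Broken:
            tags: List[str] = []
    except ValueError:
        return True
    return False
```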
---

## 4. Function signature wrapping: single line, 120-char limit

```python
def fetch_and_signal(limit: Optional[int] = None, timeout: float = 30.0, user_agent: str = USER_AGENT) -> List[Case]:
    ...
```

If a signature does not fit within 120 chars on one line, *then* hug-parens (one parameter per line) with a trailing comma. Wrapping is the exception, not the default.

```toml
# pyproject.toml
[tool.ruff]
line-length = 120
```

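For the exceptional case, a wrapped signature might look like the sketch below. The function name, the extra parameters, the placeholder body, and the `USER_AGENT` value are all hypothetical; they exist only to push the signature past 120 characters.

```python
from typing import Dict, List, Optional

USER_AGENT = "psyc-example/0.0"  # hypothetical value


def fetch_and_signal_with_retries(
    limit: Optional[int] = None,
    timeout: float = 30.0,
    user_agent: str = USER_AGENT,
    max_retries: int = 3,
    extra_headers: Optional[Dict[str, str]] = None,
) -> List[str]:
    # Placeholder body: echo the effective settings so the sketch is runnable.
    headers = {"User-Agent": user_agent, **(extra_headers or {})}
    return [f"limit={limit}", f"timeout={timeout}", f"retries={max_retries}", f"headers={len(headers)}"]
```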
---

## 5. Errors: `Result[T, E]` for expected failures, `raise` for genuinely exceptional ones

A "miss" or "blocked" outcome is data, not an exception. Use the `Result` type in `psyc.result`:

```python
from psyc.result import Err, Ok, Result

def get_case(case_id: str) -> Result[Case, str]:
    row = db.fetchone(case_id)
    if not row:
        return Err(f"case not found: {case_id}")
    return Ok(Case.model_validate_json(row))

# caller:
result = get_case(cid)
if isinstance(result, Err):
    raise HTTPException(404, result.reason)
case = result.value
```

**Reserve `raise` for:** programmer errors, invariant violations, unrecoverable I/O. Never `raise` from a function whose failure mode is part of normal operation (lookup miss, policy block, rate limit, etc.).

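The `psyc.result` module itself is not shown in this guide. As a rough sketch of a type that would support the usage above, assuming only that `Ok` exposes `.value` and `Err` exposes `.reason`, something like the following would work. The `_CASES` store and `get_summary` are hypothetical and stand in for the real database lookup.

```python
from dataclasses import dataclass
from typing import Dict, Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    reason: E


# Either a success carrying `value` or an expected failure carrying `reason`.
Result = Union[Ok[T], Err[E]]

_CASES: Dict[str, str] = {"c-1": "phishing kit"}  # hypothetical in-memory store


def get_summary(case_id: str) -> Result[str, str]:
    summary = _CASES.get(case_id)
    if summary is None:
        return Err(f"case not found: {case_id}")  # a miss is data, not an exception
    return Ok(summary)


hit = get_summary("c-1")
miss = get_summary("nope")
```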
---

## 6. Closed string sets: `class X(str, Enum)`

```python
from enum import Enum

class TLP(str, Enum):
    RED = "RED"
    AMBER = "AMBER"
    GREEN = "GREEN"
    CLEAR = "CLEAR"

case.classification.tlp = TLP.AMBER
if case.classification.tlp == TLP.RED:
    ...
```

A real Enum object: iterable, comparable, has `.value`. JSON-serializable thanks to the `str` mixin. No `Literal` aliases, no bare string constants.

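The properties claimed above can be checked with the standard library alone. This is an illustrative sketch rather than psyc code; the `payload` shape is made up for the example.

```python
import json
from enum import Enum


class TLP(str, Enum):
    RED = "RED"
    AMBER = "AMBER"
    GREEN = "GREEN"
    CLEAR = "CLEAR"


# Members are str instances, so the stdlib encoder writes the plain value.
payload = json.dumps({"tlp": TLP.AMBER})

# Round trip: look the member up by value when reading the payload back.
restored = TLP(json.loads(payload)["tlp"])

# Iteration yields members in definition order.
levels = [level.value for level in TLP]


def is_known_tlp(raw: str) -> bool:
    # Lookup by value raises ValueError for anything outside the closed set.
    try:
        TLP(raw)
    except ValueError:
        return False
    return True
```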
---

## 7. Imports: isort blocks (stdlib / third-party / local)

```python
import csv
from datetime import datetime
from pathlib import Path
from typing import List, Optional

import httpx
import structlog
import typer
from pydantic import BaseModel, Field

from psyc import db
from psyc.models import Case, TLP
from psyc.result import Err, Ok, Result
```

Blocks are separated by a single blank line. Within each block, entries are alphabetical. Putting `import x` before `from x import y` is **not** required; use whatever ruff/isort produces by default.

```toml
[tool.ruff.lint.isort]
known-first-party = ["psyc"]
```

---

## 8. Conditionals: early returns / guard clauses

```python
def classify(case: Case) -> Case:
    if not case.observables.urls:
        return case
    if case.classification.severity:
        return case
    case.classification.severity = Severity.MEDIUM
    return case
```

The happy path stays flat. No "single exit point" dogma. No `match`/`case` unless you're actually dispatching on a tagged union; for if/elif chains, plain `if` is clearer.

---

## 9. Docstrings: module-level only, functions self-documenting

Every `.py` file starts with a one-line docstring describing what it is. Functions rely on naming + type signatures. Add an inline comment only when the **why** is non-obvious.

```python
"""Scoutline: Fetcher + Signalizer for URLhaus."""

from __future__ import annotations
# ... imports ...

def fetch_recent(timeout: float = 30.0) -> str:
    ...

def parse_urlhaus_csv(text: str) -> Iterable[Dict]:
    ...
```

**Forbidden:** function docstrings restating the signature (`"""Fetch the recent CSV. Returns: str."""` is useless).

---

## 10. Logging: `structlog` over stdlib `logging`, event names + key/value

```python
import structlog

log = structlog.get_logger(__name__)

log.info("case.ingested", case_id=case.case_id, source="urlhaus", count=len(rows))
log.warning("route.blocked", case_id=case.case_id, dest="VirusTotal", reason="tlp_red")
log.error("submit.failed", case_id=case.case_id, dest="CERT-Bund", error=str(exc))
```

**Event names:** `<area>.<action>` lowercase, dot-separated. **Never** interpolate values into the event name; they go in the key/value payload so the ledger and audit code can index on them.

Configuration lives in `psyc/log.py`, imported once at process start (CLI and cockpit entrypoints).

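The event-name convention is mechanical enough to check in a test or a lint step. The helper below is a hypothetical sketch, not part of psyc, and it assumes the strictest reading of the rule: exactly two lowercase segments, `<area>.<action>`.

```python
import re

# Assumption: exactly two segments, each starting with a lowercase letter
# and containing only lowercase letters, digits, or underscores.
_EVENT_NAME = re.compile(r"[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*")


def is_valid_event_name(name: str) -> bool:
    return _EVENT_NAME.fullmatch(name) is not None


# An interpolated identifier in the event name fails the check;
# it belongs in the key/value payload instead.
interpolated = "case.ingested.{}".format("c-42")
```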
---

## 11. SQL: SQLAlchemy Core (Tables + `engine.connect()`), no ORM

```python
from sqlalchemy import Column, MetaData, String, Table, create_engine, insert, select

engine = create_engine("sqlite:///data/psyc.db", future=True)
meta = MetaData()

cases = Table(
    "cases", meta,
    Column("case_id", String, primary_key=True),
    Column("summary", String, nullable=False),
    Column("tlp", String, nullable=False),
    # ...
)

with engine.begin() as conn:
    conn.execute(insert(cases).values(case_id=case.case_id, summary=case.summary, tlp=case.classification.tlp.value))

with engine.connect() as conn:
    row = conn.execute(select(cases).where(cases.c.case_id == case_id)).fetchone()
```

**No ORM session, no `declarative_base`, no SQLModel.** SQL stays visible as expressions, but parameter binding and dialect handling are SQLAlchemy's job. Use `engine.begin()` for writes (the transaction commits when the block exits cleanly and rolls back on an exception) and `engine.connect()` for reads.

---

## 12. CLI: flat Typer commands, hyphenated names

```python
import typer

app = typer.Typer(add_completion=False, help="psyc: defensive CTI routing & sealing")

@app.command("init")
def init() -> None: ...

@app.command("fetch-urlhaus")
def fetch_urlhaus(limit: int = 50) -> None: ...

@app.command("seal-pack")
def seal_pack(case_id: str) -> None: ...

@app.command("route-plan")
def route_plan(case_id: str) -> None: ...

@app.command("serve")
def serve(host: str = "127.0.0.1", port: int = 8000) -> None: ...
```

Invocation: `psyc fetch-urlhaus --limit 50`. No sub-apps, no nested namespaces. Command name = function name with underscores → hyphens.

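Because each command name is written out explicitly, it can drift from its function name. A small consistency check could catch that. Both helpers below are hypothetical, and the `registered` mapping is built by hand here rather than read from the Typer app.

```python
from typing import Callable, Dict, List


def command_name(func: Callable[..., object]) -> str:
    # Naming rule from this section: underscores in the function name become hyphens.
    return func.__name__.replace("_", "-")


def mismatched_commands(registered: Dict[str, Callable[..., object]]) -> List[str]:
    # Return the registered names that do not follow the naming rule.
    return [name for name, func in registered.items() if name != command_name(func)]


def fetch_urlhaus(limit: int = 50) -> None: ...


def seal_pack(case_id: str) -> None: ...


registered = {"fetch-urlhaus": fetch_urlhaus, "sealpack": seal_pack}
```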
---

## Tooling

```toml
[tool.ruff]
line-length = 120

[tool.ruff.lint]
select = ["E", "F", "I", "UP", "B"]
# UP rules that conflict with rule 1/2 are ignored:
ignore = ["UP006", "UP007", "UP035"]

[tool.ruff.lint.isort]
known-first-party = ["psyc"]
```

`UP006` and `UP007` would auto-rewrite `List[X]`/`Optional[X]` to the lowercase / pipe forms, and `UP035` would flag the `typing` imports as deprecated. All three are disabled because **rules 1 and 2** outrank them.

---

## Out of scope

This guide intentionally **does not** cover:
- async vs sync (decide per worker line; httpx clients sync inside Typer commands, async only if the cockpit demands it)
- test framework (pytest is the default; no folding required)
- exception class hierarchies (rule 5 minimizes their need; design per line as it arrives)
- API response shapes for the cockpit (REST/HTML; JSON-only routes when stage 3+ ships)

Add new folds at the bottom of this file as they come up. Don't retrofit the guide silently; each addition is a recorded decision.