
# Neuronetz Finetuning Platform — Developer Manual

> **Repository:** `ssh://git@gitea.neuronetz.ai:222/Neuronetz/finetuning-plattform.git`
> **This delta:** `ssh://git@gitea.neuronetz.ai:222/m17hr1l/finetuning-plattform-setup-delta.git`
> **Generated from:** `develop` @ `70b203c`, 2026-05-14
> **Audience:** Senior backend and frontend developers joining the project.

---

## Table of Contents

### Part I — Get Running
1. [Quick Start (10 min)](#1-quick-start-10-min)
2. [Full Setup](#2-full-setup)
3. [First Login & Sanity Checks](#3-first-login--sanity-checks)
4. [Troubleshooting Setup](#4-troubleshooting-setup)

### Part II — The Platform
5. [What This Platform Does](#5-what-this-platform-does)
6. [Architecture Overview](#6-architecture-overview)
7. [The Service Stack](#7-the-service-stack)
8. [The Nibiru Framework](#8-the-nibiru-framework)
9. [Module-Model-View-Controller (MMVC)](#9-module-model-view-controller-mmvc)
10. [Database](#10-database)
11. [API Surface](#11-api-surface)
12. [Authentication & Sessions](#12-authentication--sessions)
13. [Background Work](#13-background-work)
14. [Model Serving & Inference](#14-model-serving--inference)
15. [Frontend](#15-frontend)

### Part III — Working in the Codebase
16. [Hard Rules](#16-hard-rules)
17. [Git Workflow](#17-git-workflow)
18. [Code Style](#18-code-style)
19. [Naming Conventions](#19-naming-conventions)
20. [Testing](#20-testing)
21. [Common Gotchas](#21-common-gotchas)

### Part IV — Operations
22. [Tools & Dashboards](#22-tools--dashboards)
23. [Logging & Observability](#23-logging--observability)
24. [Troubleshooting Runbooks](#24-troubleshooting-runbooks)
25. [Regenerating This Delta](#25-regenerating-this-delta)

### Part V — Context & Culture
26. [Regulatory Stance (Why No GDPR Theater)](#26-regulatory-stance-why-no-gdpr-theater)
27. [The Multi-Agent Orchestrator (Optional)](#27-the-multi-agent-orchestrator-optional)
28. [Glossary](#28-glossary)

---

# Part I — Get Running

## 1. Quick Start (10 min)

For experienced devs who want the platform running and `git log` open in 10 minutes:

```bash
# 1) Add to /etc/hosts (one line, all subdomains)
echo "127.0.0.1   local.finetune.neuronetz.ai local.websocket.finetune.neuronetz.ai local.redis-commander.neuronetz.ai local.graylog.finetune.neuronetz.ai local.kibana.finetune.neuronetz.ai" | sudo tee -a /etc/hosts

# 2) Clone both repos as siblings
mkdir -p ~/PhpstormProjects && cd ~/PhpstormProjects
git clone ssh://git@gitea.neuronetz.ai:222/Neuronetz/finetuning-plattform.git
git clone ssh://git@gitea.neuronetz.ai:222/m17hr1l/finetuning-plattform-setup-delta.git

cd finetuning-plattform && git checkout develop

# 3) Start the stack
cd local && docker compose up -d
cd ../..

# 4) Bootstrap the DB
cd finetuning-plattform-setup-delta
./bootstrap-db.sh

# 5) Done: open the platform (on macOS, use `open` instead of `xdg-open`)
xdg-open http://local.finetune.neuronetz.ai/auth/login
# login: admin@finetune.ai / admin123
```

If anything errors, jump to [§4 Troubleshooting Setup](#4-troubleshooting-setup) or [§21 Common Gotchas](#21-common-gotchas).

---

## 2. Full Setup

### 2.1 Prerequisites (on your host)

- **Docker** ≥ 24 with the **compose plugin** v2.20+. Verify: `docker compose version` must succeed (the legacy `docker-compose` script will NOT work — the platform's scripts assume the plugin form).
- **Git** with SSH access to `gitea.neuronetz.ai`. Stephan adds your public key to the `Neuronetz` org.
- **NVIDIA GPU + driver + nvidia-container-toolkit** if you want to actually run inference (Ollama or llama-server). Without one, set `OLLAMA_GPU_LAYERS=0` in `local/.env` and use CPU inference (slow but functional).
- **~30 GB free disk** for images + model downloads.
- Linux or macOS. Windows works only via WSL2 + Docker Desktop.

### 2.2 SSH key

```bash
ssh-keygen -t ed25519 -C "your@email.example"   # if you don't have one
cat ~/.ssh/id_ed25519.pub                       # send this to Stephan
```

Test:
```bash
ssh -p 222 -T git@gitea.neuronetz.ai
# expected: "Hi <username>! You've successfully authenticated, ..."
```

### 2.3 Hosts file

The platform uses subdomains routed through nginx-proxy. Add this block to `/etc/hosts`:

```
127.0.0.1   local.finetune.neuronetz.ai
127.0.0.1   local.websocket.finetune.neuronetz.ai
127.0.0.1   local.redis-commander.neuronetz.ai
127.0.0.1   local.graylog.finetune.neuronetz.ai
127.0.0.1   local.kibana.finetune.neuronetz.ai
```

If you'll touch the multi-agent orchestrator (rare, see [§27](#27-the-multi-agent-orchestrator-optional)):

```
127.0.0.1   backend.finetune.neuronetz.ai
127.0.0.1   uiux.finetune.neuronetz.ai
127.0.0.1   qa.finetune.neuronetz.ai
127.0.0.1   devops.finetune.neuronetz.ai
127.0.0.1   control.finetune.neuronetz.ai
127.0.0.1   fullstack.finetune.neuronetz.ai
```

### 2.4 Clone

```bash
mkdir -p ~/PhpstormProjects && cd ~/PhpstormProjects
git clone ssh://git@gitea.neuronetz.ai:222/Neuronetz/finetuning-plattform.git
git clone ssh://git@gitea.neuronetz.ai:222/m17hr1l/finetuning-plattform-setup-delta.git
```

The two repos must be siblings — `bootstrap-db.sh` finds the platform repo by relative path `../finetuning-plattform`. You can pass an explicit path if your layout differs:

```bash
./bootstrap-db.sh /custom/path/to/finetuning-plattform
```

### 2.5 Branch

```bash
cd finetuning-plattform
git checkout develop
git pull
```

The platform uses **gitflow**: `develop` is the live integration branch, `main` is for releases. Always start feature work from `develop`. Never merge directly into `main`.

### 2.6 Environment file

`local/.env` is committed for local-dev defaults — open and skim it. Key vars:

| Var | Purpose | Default |
|---|---|---|
| `MARIADB_USER` / `MARIADB_PASSWORD` | DB credentials | committed |
| `OLLAMA_API_URL` | Internal Ollama endpoint | `http://ollama:11434` |
| `HUGGINGFACE_API_URL` | HF Hub base | `https://huggingface.co` |
| `HUGGINGFACE_API_TOKEN` | Your personal HF token | **empty by default** — set if you need gated models (Llama, etc.) |
| `WORKER_*` | Job queue tunables | sensible defaults |
| `NFP_LLAMA_GPU_LAYERS` | GPU offload for C++ inference server | `99` (= all layers on GPU; reduce if you OOM) |

If you set a real HF token, **do not commit it**. `.gitignore` covers `.claude/credentials.json` and `.claude/youtrack-credentials.json` — extend it if you add other secret files.

### 2.7 Start the stack

```bash
cd local
docker compose up -d
```

The first run pulls ~10 GB of images and takes 5–10 minutes. Subsequent starts take seconds.

Verify:
```bash
docker compose ps
```

Expect ~14 services running. The `llama-server` service is currently disabled via `profiles: [disabled]` — that's intentional, see [§14](#14-model-serving--inference).

### 2.8 Bootstrap the database

From this delta repo:
```bash
cd ~/PhpstormProjects/finetuning-plattform-setup-delta
./bootstrap-db.sh
```

The script:
1. Sources `local/.env` from the platform repo to get DB credentials.
2. Waits for MariaDB to accept connections.
3. Loads schema, seed, default users in order.
4. Prints the login credentials.
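
Step 2 is the one that trips people up, so here is a minimal sketch of a "wait until the port accepts connections" loop. It is not the code in `bootstrap-db.sh`; the function name `wait_for_port` and its defaults are assumptions made for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connect to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open. The real
            # script may run a stricter check (for example a test query).
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Assuming MariaDB's port 3306 is reachable from where you run it, `wait_for_port("127.0.0.1", 3306)` blocks for up to a minute and tells you whether the container came up.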

If you prefer the migration path (slower, but matches how the platform self-migrates in CI/production):

```bash
cd ~/PhpstormProjects/finetuning-plattform
docker compose -f local/docker-compose.yml exec fpm ./nibiru -mi local
```

This applies each pending `app/src/application/settings/config/database/NNN-*.sql` migration in numerical order. Use it after pulling new migrations from `develop`.

---

## 3. First Login & Sanity Checks

```bash
curl -sS -o /dev/null -w "%{http_code}\n" http://local.finetune.neuronetz.ai/
# 200
```

Open `http://local.finetune.neuronetz.ai/auth/login`.

| Role | Login | Password |
|---|---|---|
| Superuser | `admin@finetune.ai` | `admin123` |
| Regular user | `testuser@example.com` | `test123` |

**These are dev-only credentials.** Change them on any deployment that isn't your laptop.

### Sanity checklist

- `/dashboard` loads → ✓
- `/models` → Pulled Models tab lists the 9-ish Ollama models the shared instance has → ✓
- `/admin` (as admin) → billing dashboard, no 500 → ✓ (this route returned a 500 until recently; [§24](#24-troubleshooting-runbooks) explains)
- `/datasets` → ✓
- `/jobs` → ✓
- `/chat` → can chat with any chat-capable model from the list → ✓

If any of these 500, check `docker compose logs fpm | tail -50`.

---

## 4. Troubleshooting Setup

| Symptom | Cause | Fix |
|---|---|---|
| `docker compose: command not found` | Legacy `docker-compose` script installed, not the plugin | Install Docker Engine ≥ 24 with `docker-compose-plugin` |
| `Permission denied` cloning | Your SSH key isn't on Gitea | Ask Stephan to add the public key |
| Stack starts but `fpm` is unhealthy | Usually means `.env` is missing a var | `docker compose logs fpm | tail -50` shows what's missing |
| `bootstrap-db.sh` says "MariaDB not running" | Container takes 10–30s to be ready after `up -d` | Wait longer, then re-run |
| Platform returns 502 from nginx | `fpm` crashed | `docker compose restart fpm`; if it crashes again, capture logs and grep for `Fatal error` |
| `llama-server` fails to build | Known: NFP-51 vs current llama.cpp API | Ignore — the service is profile-disabled. Other services work without it. |
| `composer install` fails with PHP 8.3 platform check error | Stale `composer.lock` (NFP-22 work in progress) | Use the bundled `phpunit-11.phar` for tests; ignore composer until the lock is regenerated. |

---

# Part II — The Platform

## 5. What This Platform Does

The Neuronetz Finetuning Platform is a self-hosted alternative to OpenAI fine-tuning APIs. End users:

1. **Upload datasets** (JSONL with prompt/completion or instruction format)
2. **Pick a base model** (anything on HuggingFace as GGUF, or already pulled into the local Ollama)
3. **Configure a training job** (epochs, batch size, LoRA rank, learning rate — or use presets like `quick` / `standard` / `thorough`)
4. **Run the job** — the platform spawns a training container, streams progress via WebSocket
5. **Test the result** — chat with the fine-tuned model in-browser
6. **Deploy** — serve the model on an Ollama instance accessible from a stable URL (`api.neuronetz.ai` in production)

The platform also includes user management, billing/credits, REST API for headless usage, an admin dashboard, and a multilingual UI (EN, DE, ES, FR, IT, JA, NL, PL, PT).

Eventually (NFP-51 + NFP-52) the inference engine moves from Ollama to a custom C++ llama.cpp server with multi-model handling. That is Stephan's in-flight work; you'll inherit it.

## 6. Architecture Overview

```
                        ┌───────────────────────────────────────────────┐
                        │                  USER (browser)                │
                        └───────────┬────────────────────┬──────────────┘
                                    │ HTTPS              │ WSS
                                    ▼                    ▼
                            ┌───────────────┐    ┌──────────────────┐
                            │  nginx-proxy  │    │ websocket server │
                            └───────┬───────┘    │ (Workerman)      │
                                    │            └──────────────────┘
                                    ▼                    ▲
                          ┌──────────────────┐           │
                          │   php-fpm 8.3    │───────────┘ progress msgs
                          │ Nibiru Framework │
                          └─┬──┬──┬──┬──┬──┬─┘
                            │  │  │  │  │  │
                       ┌────┘  │  │  │  │  └────┐
                       ▼       ▼  │  │  ▼       ▼
                  MariaDB   Redis │  │ Memcached  ES
                                  │  │
                                  │  └──→ container-manager (Python, Docker socket)
                                  │            │
                                  │            └──→ docker daemon
                                  │                     │
                                  └──→ HTTP /api/*      ▼
                                                  Ollama containers
                                                  (shared + per-user)
                                                  Training containers
                                                  (on-demand, GPU)

                        ┌───────────────────────────────────────────────┐
                        │                  job-worker                    │
                        │  (Workerman, polls DB for queued jobs,         │
                        │  dispatches via container-manager,             │
                        │  monitors progress, streams to websocket)      │
                        └───────────────────────────────────────────────┘
```

Two principles drive the design:

1. **PHP never calls Docker directly.** The fpm container has no `docker` CLI by design. All container operations route through the `container-manager` microservice (Python) which has the Docker socket mounted. This is the most important invariant — violating it produces silent failures or 500s.
2. **The Nibiru framework owns the dispatch.** Controllers are thin; modules are fat. Routes are configured in INI files; the framework handles all the wiring. Don't fight it.

## 7. The Service Stack

Every service in `local/docker-compose.yml`:

| Service | Image / Build | Port | Purpose |
|---|---|---|---|
| `nginx` | `nginx:alpine` | 80, 443 | Reverse proxy, routes subdomains to fpm / websocket / etc. |
| `fpm` | custom (PHP 8.3.9 + extensions) | 9000 (internal) | Application server. Runs Nibiru. |
| `mariadb` | `mariadb:10.11` | 3306 | Primary data store |
| `redis` | `redis:7-alpine` | 6379 | Sessions, queue, cache |
| `redis-commander` | `rediscommander/redis-commander` | 8081 | Redis UI at `local.redis-commander.neuronetz.ai` |
| `memcached` | `memcached:1.6` | 11211 | Secondary cache (some legacy plugins use it) |
| `elasticsearch` | `elasticsearch:7.10.2` | 9200 | Search backend |
| `kibana` | `kibana:7.10.2` | 5601 | ES log visualization at `local.kibana.finetune.neuronetz.ai` |
| `mongo` | `mongo:6` | 27017 | Graylog backend |
| `graylog` | `graylog/graylog:5.0` | 9000 | Centralized log aggregation at `local.graylog.finetune.neuronetz.ai` |
| `container-manager` | custom (Python + Flask) | 8080 | Docker socket proxy used by PHP |
| `websocket` | custom (Workerman) | 2346 | Real-time progress streaming |
| `job-worker` | custom (PHP 8.2 CLI + Workerman) | — | Background job dispatcher |
| `ollama` | `ollama/ollama:latest` | 11434 (internal) | Local LLM inference |
| `llama-server` | custom (C++ + CUDA) | 8090 | **PROFILE-DISABLED** — broken, see NFP-51 |

All services join the `ai-network` Docker network so they can address each other by service name (e.g. PHP talks to `http://container-manager:8080`).

## 8. The Nibiru Framework

Nibiru is Stephan Kasdorf's PHP MMVC framework. He built it over 8 years; it's the stable base for several products (this platform, others under `~/PhpstormProjects/tpms-*`, `~/PhpstormProjects/Nibiru-Agent`). The platform code lives in `app/src/application/`; the framework lives in `app/src/core/`.

### 8.1 What you need to know about the framework

- **It's FROZEN.** `app/src/core/` is off-limits to all contributors. Any PR touching it is auto-rejected. The only exception was NFP-18 (argon2id), explicitly approved by Stephan, and that was the **last** exception. A core refactor ticket (NFP-55) tracks moving auth logic into pluggable strategies so future work doesn't need core changes.
- **It has two autoloaders, both running:**
  - `core/c/auto.php` — Nibiru's native class autoloader, loads framework + module classes via a module registry
  - `core/l/autoload.php` — Composer PSR-4 autoloader, loads third-party packages from `core/l/` (the composer `vendor-dir`)
  - Both are bootstrapped automatically. **Never create a separate `vendor/autoload.php`** anywhere in the project. If a new PHP process (worker, cron) needs PHP packages, use the framework's existing autoloader.
- **The CLI is `./nibiru`** (a compiled binary at the project root inside the container). Common commands:
  ```bash
  docker compose exec fpm ./nibiru -c <name>    # scaffold a new controller + template
  docker compose exec fpm ./nibiru -m <name>    # scaffold a new module (4 files)
  docker compose exec fpm ./nibiru -mi local    # run pending migrations
  docker compose exec fpm ./nibiru -cache-clear # clear Smarty + framework caches
  docker compose exec fpm ./nibiru -model-rebuild # regenerate model files from DB
  ```
- **Configuration is INI-based.** `app/src/application/settings/config/settings.local.ini` (and env-specific variants) contain everything: routes, module registration, autoloader paths, database connection, security keys.

### 8.2 The module registry — gotcha

The framework's registry (`core/c/registry.php`) scans every module directory recursively looking for INI configuration files. It uses **string matching** (`strstr($path, 'settings')`) to find them. This means:

> **Any file path under `application/module/*/` that contains the literal string `settings` will be passed to `parse_ini_file()`.**

If it's actually an INI file: fine. If it's a PHP file with "settings" in its name (e.g., `settingsForm.php`): the INI parser tries to parse PHP syntax, hits an unexpected token, and **the entire platform returns HTTP 500**.

This bit us hard recently. **Never use the word "settings" in a filename under `application/module/`.** Use "prefs", "config", "options", "account" — anything else.
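
A quick pre-commit check can catch this before the platform does. The sketch below is a hypothetical helper, not part of the repo. It mimics the substring match described above and assumes that only `.ini` files are safe to hand to the INI parser.

```python
from pathlib import Path


def risky_settings_paths(module_root: str) -> list[str]:
    """List files whose path contains 'settings' but that are not .ini files."""
    root = Path(module_root)
    risky = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # Like the registry's substring match, this looks at the whole
        # relative path, not just the file name.
        if "settings" in relative and path.suffix.lower() != ".ini":
            risky.append(relative)
    return risky
```

Run it against `app/src/application/module` and treat any returned path as a blocker.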

## 9. Module-Model-View-Controller (MMVC)

Standard Nibiru module structure:

```
app/src/application/module/<name>/
├── <name>.php                  # main module class, extends Module
├── interfaces/
│   └── <name>.php              # interface defining the public contract
├── plugins/
│   ├── <name>.php              # primary plugin — most of the logic lives here
│   └── <OtherPlugin>.php       # additional plugins as needed
├── traits/
│   └── <name>Form.php          # form builders, controller-side helpers
└── settings/
    └── <name>.ini              # module config (database, behavior)
```

**Controllers stay thin.** They authenticate, validate input, call the module, assign view data, render. All real logic lives in module plugins.

Example: `app/src/application/module/finetune/` contains:
- `plugins/Finetune.php` — central DB-access plugin, all CRUD for jobs/datasets/models
- `plugins/Ollama.php` — Ollama HTTP client + deployment
- `plugins/ContainerManager.php` — talks to the container-manager microservice
- `plugins/TrainingManager.php` — orchestrates training jobs end-to-end
- `plugins/TestRunner.php` — manages chat-test sessions against the shared Ollama
- `plugins/HuggingFace.php` — HF Hub API client, GGUF download, model import
- `traits/apiKeyForm.php` — form builder for the API keys page

Pattern for adding a new feature:
1. `./nibiru -m newfeature` — scaffolds the 4-file module
2. **Register all 4 files in `settings.local.ini`** under `[AUTOLOADER]`. If you forget, the class isn't loaded.
3. Implement plugin logic
4. Add a controller (`./nibiru -c newfeature`)
5. Wire the route in `settings.local.ini` under `route[...]`
6. Build the template, link from navigation

## 10. Database

### 10.1 Engine choice

**MariaDB 10.11, not MySQL, not PostgreSQL.** Syntax differs in subtle ways — JSON functions, window functions, `EXPLAIN` output. Test your queries inside the container:

```bash
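# "$MARIADB_PASSWORD" is expanded by your host shell, so load local/.env first
# (for example: set -a; . local/.env; set +a).
docker compose -f local/docker-compose.yml exec mariadb \
    mariadb -u neuronetz -p"$MARIADB_PASSWORD" neuronetz_finetune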
```

### 10.2 Column naming convention

**Always `tablename_fieldname`:**

```
jobs_id, jobs_user_id, jobs_status, jobs_created_at
datasets_id, datasets_name, datasets_file_path
user_id, user_login, user_email, user_password_hash
```

This is not aesthetic preference — Nibiru's auto-model generator (`./nibiru -model-rebuild`) and the module registry depend on this exact format. Migrations that rename columns **must also update the corresponding model file's `const TABLE` array** in `app/src/application/model/NeuronetzFinetune/<name>.php`.
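
To make the rule concrete, here is a small sketch that flags columns breaking the `tablename_fieldname` pattern. The helper name `misnamed_columns` is an assumption; nothing like it ships with Nibiru as far as this manual documents.

```python
def misnamed_columns(table: str, columns: list[str]) -> list[str]:
    """Return the columns that do not follow the tablename_fieldname rule."""
    prefix = f"{table}_"
    # A column must start with "<table>_" and have a non-empty field part.
    return [c for c in columns if not c.startswith(prefix) or c == prefix]
```

For example, `misnamed_columns("jobs", ["jobs_id", "created_at"])` reports `created_at`, which should have been `jobs_created_at`.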

### 10.3 Migrations

Location: `app/src/application/settings/config/database/NNN-<description>.sql`

Numbered sequentially. Never edit an applied migration — always add a new numbered file. Tracked in the `migrations` table.

Apply manually:
```bash
docker compose exec fpm ./nibiru -mi local
```

Re-apply a specific file:
```bash
docker compose exec fpm ./nibiru -mi-reset-file 032-handoff.sql local
```
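
The numbering rule is easy to get wrong when two branches add migrations at once. A sketch of how ordering and "next free number" could be computed, assuming the `NNN` prefix is always three digits (as in `032-handoff.sql`); both helper names are hypothetical and this is not how `./nibiru -mi` is implemented:

```python
import re

# Assumption: three-digit prefix, a dash, a description, and a .sql suffix.
MIGRATION_NAME = re.compile(r"^(\d{3})-(.+)\.sql$")


def order_migrations(filenames: list[str]) -> list[str]:
    """Return the migration files sorted by numeric prefix; ignore other files."""
    numbered = []
    for name in filenames:
        match = MIGRATION_NAME.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def next_migration_name(filenames: list[str], description: str) -> str:
    """Build the file name for the next migration after the highest existing one."""
    ordered = order_migrations(filenames)
    highest = int(MIGRATION_NAME.match(ordered[-1]).group(1)) if ordered else 0
    return f"{highest + 1:03d}-{description}.sql"
```

After rebasing onto `develop`, re-run the "next name" step: if someone else took your number, rename your file instead of editing theirs.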

### 10.4 Foreign key renames

If a migration renames a column that has a foreign key:

1. **DROP** the foreign key first
2. **RENAME** the column
3. **RECREATE** the foreign key with the new column name

This was the lesson from NFP-27. Skipping step 1 makes the migration fail with a generic "cannot rename column" error.
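
The three steps map to three `ALTER TABLE` statements. The sketch below only generates the SQL text so you can paste it into a new migration file. The helper name and every table, column, and constraint name in the usage example are hypothetical.

```python
def fk_rename_statements(table: str, old_column: str, new_column: str,
                         fk_name: str, ref_table: str, ref_column: str) -> list[str]:
    """Return the drop / rename / recreate statements in the required order."""
    return [
        # 1) Drop the foreign key so the column is free to change.
        f"ALTER TABLE `{table}` DROP FOREIGN KEY `{fk_name}`;",
        # 2) Rename the column (RENAME COLUMN is available in MariaDB 10.11).
        f"ALTER TABLE `{table}` RENAME COLUMN `{old_column}` TO `{new_column}`;",
        # 3) Recreate the foreign key against the new column name.
        f"ALTER TABLE `{table}` ADD CONSTRAINT `{fk_name}` "
        f"FOREIGN KEY (`{new_column}`) REFERENCES `{ref_table}` (`{ref_column}`);",
    ]
```

`print("\n".join(fk_rename_statements("jobs", "jobs_user_id", "jobs_owner_id", "fk_jobs_user", "user", "user_id")))` gives you a starting point. Check the real constraint name with `SHOW CREATE TABLE` first, and remember to update the model's `const TABLE` array (see §10.2).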

### 10.5 Schema overview (30 tables, abridged)

| Group | Tables |
|---|---|
| Identity & auth | `user`, `acl`, `user_to_acl`, `account`, `user_to_account` |
| API & integration | `api_keys`, `api_registry`, `account_to_api_registry` |
| Settings | `user_settings`, `user_billing` |
| Core domain | `jobs`, `datasets`, `models`, `model_pulls` |
| Usage tracking | `usage_logs` |
| Agent system | `agent_templates`, `handoff_requests`, `handoff_messages`, `handoff_triggers` |
| Email | `email_templates`, `email_queue`, `email_notification_preferences` |
| Blog (NFP-9) | `blog_posts`, `blog_categories` |
| Time | `timeanddate`, `timeanddate_to_user`, `timeanddate_to_account` |
| Migrations | `migrations` |

Full schema in `db/01-schema.sql`.

## 11. API Surface

The REST API is OpenAI-compatible-ish. Auth via session cookie (web) or Bearer token (`api_keys` table).

### Web routes (HTML)
```
/                          landing page
/auth/login                login
/auth/register             registration
/auth/logout               logout
/dashboard                 user dashboard
/jobs, /jobs/create, /jobs/view
/datasets, /datasets/upload
/models                    pulled models + user's trained models
/agenttemplates            pre-built agent templates
/apikeys
/usage, /pricing
/settings                  user prefs (note: NOT a Nibiru module — see §8.2)
/support
/docs                      API docs page
/admin                     admin-only, billing dashboard
/chat                      chat test interface
```

### JSON API
```
GET    /api/pricing              public pricing tiers
GET    /api/auth                 auth status (requires session)
GET    /api/jobs                 list user's jobs
POST   /api/jobs                 create a job
GET    /api/datasets             list datasets
POST   /api/datasets             upload dataset
GET    /api/models               user's trained models (DB only)
POST   /api/models?subaction=deploy   deploy a model
GET    /api/usage                usage stats
GET    /api/apikeys
POST   /api/apikeys              create + revoke
GET    /api/wstoken              issue a websocket auth token
POST   /api/training/start       start a training job (also via job-worker)
POST   /api/ollama?subaction=list-pulled       list models on shared Ollama
POST   /api/ollama?subaction=chat-pulled       chat with a pulled model
POST   /api/ollama?subaction=pull              pull a new model from registry
POST   /api/ollama?subaction=hf-search         search HF Hub
POST   /api/ollama?subaction=hf-download       download GGUF from HF
POST   /api/ollama?subaction=hf-import         register downloaded GGUF with Ollama
POST   /api/test?subaction=start               start a chat test session
POST   /api/test?subaction=chat                send chat message in session
POST   /api/test?subaction=stop                end session, unload model
```

Public production API (when deployed) sits at `https://api.neuronetz.ai/v1/*` — OpenAI-compatible chat completions endpoint, served by the production Ollama (or eventually the C++ inference server). The platform displays code snippets to users using this URL on the model detail page.

## 12. Authentication & Sessions

### 12.1 Password storage

As of NFP-18 (merged 2026-04-13), passwords use **argon2id** via PHP's `password_hash($password, PASSWORD_ARGON2ID)`. The `user.user_password_hash` column holds the modern hash.

For migration from the legacy AES-encrypted column (`user.user_pass`), the auth flow tries argon2id first. If that fails and a legacy AES password matches, the user is transparently rehashed into argon2id on that login. Eventually `user_pass` can be dropped — schedule for a release after most users have logged in once.
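
The control flow is easier to see in code. This is a language-neutral sketch in Python, not the PHP in `core/c/auth.php`. The three callables (`verify_modern`, `verify_legacy`, `hash_modern`) are placeholders for the argon2id and legacy AES routines, and `UserRow` is a stand-in for a row of the `user` table.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UserRow:
    user_password_hash: Optional[str]  # argon2id hash, empty until migrated
    user_pass: Optional[str]           # legacy AES value, read-only bridge


def authenticate(user: UserRow, password: str,
                 verify_modern: Callable[[str, str], bool],
                 verify_legacy: Callable[[str, str], bool],
                 hash_modern: Callable[[str], str]) -> bool:
    """Try the modern hash first, then fall back to legacy and rehash."""
    if user.user_password_hash and verify_modern(password, user.user_password_hash):
        return True
    if user.user_pass and verify_legacy(password, user.user_pass):
        # Legacy match: store a modern hash. user_pass itself is never written.
        user.user_password_hash = hash_modern(password)
        return True
    return False
```

Note that the legacy column is only ever read. The rehash writes `user_password_hash` and nothing else.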

NFP-55 tracks refactoring `core/c/auth.php` to make password strategies pluggable so future auth work (2FA, passkeys, etc.) doesn't require touching core.

### 12.2 Session storage

Sessions are stored in **Redis** (`redis:6379`). The session ID travels in the standard PHP session cookie. The ACL role is loaded from `user_to_acl` on login and cached in `$_SESSION['auth']`:

```php
$_SESSION['auth'] = [
    'user_id' => 1,
    'user_login' => 'admin@finetune.ai',
    'user_email' => 'admin@finetune.ai',
    'user_name' => 'Admin',
    'role' => 'superuser',     // from acl.acl_role
];
```

### 12.3 CSRF

CSRF tokens are mandatory on all POST forms (NFP-19). The CSRF module generates per-session tokens; forms include them as hidden fields; the controller validates before processing.

The framework's form factory (`Nibiru\Factory\Form`) automatically injects CSRF tokens. If you build a form by hand, include the token manually:

```smarty
<input type="hidden" name="csrf_token" value="{$csrf_token}">
```

### 12.4 API key auth

API endpoints accept `Authorization: Bearer <key>` as an alternative to session auth. Keys are stored as bcrypt hashes in `api_keys` (per-user, revocable, with scopes).
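
For headless callers, a request only needs the Bearer header. A minimal sketch using the standard library; the helper name `build_api_request` is an assumption, and the response format is not documented here, so the sketch stops at building the request.

```python
import urllib.request


def build_api_request(base_url: str, path: str, api_key: str,
                      method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the JSON API."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    request = urllib.request.Request(url, method=method)
    # The key goes in the Authorization header, never in the query string.
    request.add_header("Authorization", f"Bearer {api_key}")
    request.add_header("Accept", "application/json")
    return request
```

Against the local stack that would be `build_api_request("http://local.finetune.neuronetz.ai", "/api/jobs", key)`, passed to `urllib.request.urlopen` to actually send it.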

## 13. Background Work

### 13.1 job-worker

A Workerman-based PHP CLI process (separate container, PHP 8.2-alpine) that polls the `jobs` table every 5 seconds, picks queued training jobs, dispatches them via `ContainerManager::createContainer()` for the actual training run, and monitors progress.

The worker has **no direct DB models**: it runs in its own container and uses raw `PDO` queries. It does NOT bootstrap the Nibiru framework, which would be overkill for the worker's narrow job. It does, however, use the framework's `composer` vendor directory via the WebSocket server's autoloader at `app/src/application/server/loader/vendor/autoload.php`.

If you need to debug:
```bash
docker compose logs job-worker -f
```

### 13.2 WebSocket server

Workerman process on TCP 2346, served behind nginx at `local.websocket.finetune.neuronetz.ai`. Used for:
- Job progress streaming (NFP-13)
- Future: real-time chat, collaborative editing

Auth: the platform issues short-lived (5-minute) tokens via `/api/wstoken`; the client connects with `?token=...`. The WS server validates against the database.
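
The token lifecycle can be sketched as follows. A plain dict stands in for the database table the WS server checks, and the function names and token length are assumptions rather than the platform's actual implementation.

```python
import secrets
import time
from typing import Optional

TOKEN_TTL_SECONDS = 5 * 60  # the 5-minute lifetime mentioned above


def issue_token(store: dict, user_id: int, now: Optional[float] = None) -> str:
    """Create a random token and remember who owns it and when it expires."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    store[token] = (user_id, now + TOKEN_TTL_SECONDS)
    return token


def validate_token(store: dict, token: str, now: Optional[float] = None) -> Optional[int]:
    """Return the owning user id, or None if the token is unknown or expired."""
    now = time.time() if now is None else now
    entry = store.get(token)
    if entry is None:
        return None
    user_id, expires_at = entry
    return user_id if now <= expires_at else None
```

The practical consequence for frontend code: fetch a fresh token right before opening the socket, and fetch another one on reconnect instead of reusing the old value.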

### 13.3 container-manager

Python (Flask) microservice with the host Docker socket mounted. Exposes a small HTTP API on `:8080`:

```
GET    /health
GET    /api/gpu                  GPU info
GET    /api/containers           list managed containers
GET    /api/containers/{userId}
POST   /api/containers/{userId}              create+start a user's Ollama
POST   /api/containers/{userId}/start
POST   /api/containers/{userId}/stop
DELETE /api/containers/{userId}
POST   /api/containers/{userId}/exec         { command: "ollama list" }
POST   /api/containers/{userId}/write        { content: "...", path: "/Modelfile" }
GET    /api/containers/{userId}/logs?lines=100
```

The PHP client is `Ollama\ContainerManager`. **All container operations from PHP go through it.** Never `exec("docker ...")` from PHP.
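
To show what "goes through it" means on the wire, here is a sketch of the `exec` call as a plain HTTP request. It is written in Python for illustration only; the real caller is the PHP client, and the helper name `exec_request` is hypothetical.

```python
import json
import urllib.request

# Service name and port as listed in the service stack table.
BASE_URL = "http://container-manager:8080"


def exec_request(user_id: int, command: str,
                 base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST that runs a command in a user's container."""
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/containers/{user_id}/exec",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

`exec_request(7, "ollama list")` is the HTTP equivalent of what you might be tempted to do with `exec("docker ...")`. It works from inside the `ai-network` because it never needs the Docker CLI.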

## 14. Model Serving & Inference

### 14.1 Current: Ollama (`ollama` service)

The shared Ollama instance (`finetune-ollama-shared`) runs at `http://finetune-ollama-shared:11434` inside the Docker network. It currently hosts users' trained models plus any models pulled from HF. At most two models are loaded concurrently (`OLLAMA_MAX_LOADED_MODELS=2`).

For chat testing, the platform's `TestRunner` plugin:
1. Picks an endpoint — shared Ollama if loaded models < 2, otherwise spins up an overflow per-user container via ContainerManager.
2. Loads the model if not already registered (via `ollama pull` for HF models, or `ollama create <name> -f Modelfile` for platform-trained GGUFs once NFP-52 ships).
3. Routes chat HTTP to that endpoint's `/api/chat`.
4. Unloads when the session ends.

A model is "chat-capable" if its Ollama metadata reports `completion` capability AND its template contains the `.Messages` marker. Pure-completion templates like `{{ .Prompt }}` are not chat-capable; the UI filters their chat buttons (commit `1947e34`).
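
Both decisions, which endpoint to use and whether a model gets a chat button, reduce to small predicates. The sketch below assumes the model metadata provides a list of capability names and the template text. The function names and the `"shared"` / `"overflow"` labels are made up for illustration.

```python
def pick_endpoint(loaded_model_count: int, max_loaded: int = 2) -> str:
    """Prefer the shared Ollama while it has a free slot, else overflow."""
    return "shared" if loaded_model_count < max_loaded else "overflow"


def is_chat_capable(capabilities: list[str], template: str) -> bool:
    """Chat-capable means: completion capability AND a .Messages-based template."""
    return "completion" in capabilities and ".Messages" in template
```

A pure-completion template such as `{{ .Prompt }}` fails the second check even when the model reports `completion`, which is why its chat button is hidden.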

### 14.2 Next: C++ llama.cpp server (NFP-51, in progress)

Stephan is building a multi-model C++ inference server using `llama.cpp` directly, exposed via `cpp-httplib`. Goals:
- Replace Ollama (third-party Go binary) with code we fully control
- Handle multi-model inference natively (no per-model container lifecycle)
- Eliminate Docker from the inference path entirely
- Open-source-friendly, GDPR-friendly (no upstream telemetry concerns)

Status: scaffold landed (`local/llama-server/`) but currently uncompilable on develop because `main.cpp` references the removed `llama_batch_add` helper from older llama.cpp. The service is disabled via `profiles: [disabled]` in docker-compose. Stephan is writing the working version himself.

### 14.3 HuggingFace integration

The `HuggingFace` plugin handles:
- Search Hub for models (`/api/models?search=`)
- List files in a repo (`/api/models/{repo}/tree/main`)
- Download specific files (typically GGUFs) to `app/src/downloads/huggingface/`
- Async downloads with progress reporting via a separate script that writes to a temp file

Honors `HUGGINGFACE_API_TOKEN` env var for authenticated downloads (required for gated models like Meta Llama).
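
The two Hub calls can be sketched as request builders. This is not the `HuggingFace` plugin's code. The helper names are assumptions; the environment variable names come from `local/.env` as described in §2.6.

```python
import os
import urllib.parse
import urllib.request
from typing import Optional


def hf_request(path: str, query: Optional[dict] = None,
               base_url: Optional[str] = None,
               token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a Hub API request, authenticated if a token is set."""
    if base_url is None:
        base_url = os.environ.get("HUGGINGFACE_API_URL", "https://huggingface.co")
    if token is None:
        token = os.environ.get("HUGGINGFACE_API_TOKEN", "")
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    request = urllib.request.Request(url)
    if token:
        # Only gated models need this; anonymous requests omit the header.
        request.add_header("Authorization", f"Bearer {token}")
    return request


def search_request(term: str, **kwargs) -> urllib.request.Request:
    return hf_request("/api/models", {"search": term}, **kwargs)


def tree_request(repo: str, **kwargs) -> urllib.request.Request:
    return hf_request(f"/api/models/{repo}/tree/main", **kwargs)
```

If a gated download fails with an auth error, checking whether the `Authorization` header is present on the outgoing request is usually the fastest first step.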

## 15. Frontend

### 15.1 Templates

**Smarty 3.1** at `app/src/application/view/templates/`. Shared layout fragments under `templates/shared/v5/` (header, footer, navbar, sidebar).

**No inline JS or CSS.** This is enforced by QA — any inline `<script>` or `<style>` block in a template is an auto-reject. Use external files under `app/src/public/{js,css}/v5/`.
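
You can catch most violations before QA does. The check below is a rough, line-based heuristic and not the QA tooling itself. It assumes that `<script src="...">` references to external files are allowed and that opening tags fit on one line.

```python
import re

INLINE_STYLE = re.compile(r"<style\b", re.IGNORECASE)
SCRIPT_TAG = re.compile(r"<script\b([^>]*)>", re.IGNORECASE)


def inline_violations(template_source: str) -> list[tuple[int, str]]:
    """Return (line number, kind) for each inline <style> or <script> tag."""
    violations = []
    for lineno, line in enumerate(template_source.splitlines(), start=1):
        if INLINE_STYLE.search(line):
            violations.append((lineno, "style"))
        for match in SCRIPT_TAG.finditer(line):
            # A script tag without a src attribute carries inline code.
            if "src=" not in match.group(1).lower():
                violations.append((lineno, "script"))
    return violations
```

Feed it the contents of a `.tpl` file; an empty list means the template passes this particular check.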

### 15.2 CSS

**Bootstrap 5.3** loaded from CDN in `header.tpl`. Custom CSS in `app/src/public/css/v5/` per feature (e.g. `admin-billing.css`, `chat.css`, `agent-templates.css`).

### 15.3 JavaScript

**Vanilla JS + HTMX** for interactivity. Custom helpers in `app/src/public/js/v5/`:
- `finetune-api.js` — central API client (handles auth, CSRF, fetch)
- `theme-switcher.js` — dark/light mode
- `onboarding.js` — first-time user tour
- Feature-specific JS per module

HTMX is loaded globally. Use it for partial page updates, modal loads, dynamic dropdowns. No React, no Vue, no SPA framework. The platform is intentionally classic.

### 15.4 Forms

Use the framework's `Nibiru\Factory\Form` builder:

```php
use Nibiru\Factory\Form;

Form::create('myForm');
Form::addInputTypeText('name', $value, ['placeholder' => 'Name', 'required' => true]);
Form::addInputTypeEmail('email', $value);
Form::addSelectOption(['Option 1', 'value1']);
Form::addSelectOption(['Option 2', 'value2']);
Form::addSelect('country', ['id' => 'country-select']);
Form::addTypeButton('Submit');
$html = Form::addForm(['action' => '/my-endpoint', 'method' => 'POST']);
```

**Known framework bug:** `addOpenAny`/`addCloseAny` are broken (missing `FORM_ATTRIBUTE_ROLE` constant). Use `addOpenDiv` with raw HTML in `value` as a workaround until the core refactor.

### 15.5 i18n

Translation files in `app/src/application/settings/lang/{en,de,es,fr,it,ja,nl,pl,pt}.json`. Loaded by the `I18n` module. Pass to templates as `$t`:

```smarty
{$t.settings.title|default:'Settings'}
```

The `|default:` modifier supplies the English fallback when a translation is missing.
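
The lookup that the template expression performs can be sketched like this. It assumes the language JSON is a nested object keyed the same way as the dotted path in the template; the function name `translate` is hypothetical.

```python
def translate(t: dict, key: str, default: str) -> str:
    """Resolve a dotted key in the translation tree, else return the default."""
    node = t
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    # Like the template modifier, treat an empty string as missing.
    return node if isinstance(node, str) and node else default
```

So `translate(de, "settings.title", "Settings")` returns the German string when it exists and `"Settings"` otherwise, which is why every template lookup should carry an English default.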

---

# Part III — Working in the Codebase

## 16. Hard Rules

Read these once. Internalize them. They will save you from re-doing work.

| # | Rule | Why |
|---|---|---|
| 1 | **`app/src/core/` is FROZEN.** No edits, no exceptions. | The Nibiru Framework is 8+ years of stable base code. Core changes destabilize every project that uses it. NFP-18 (argon2id) was the final exception, explicitly approved. |
| 2 | **Use `docker compose` (space), never `docker-compose` (hyphen).** | The legacy script behaves differently and is explicitly blocked. |
| 3 | **MariaDB syntax only.** No PostgreSQL idioms. | Test queries against the actual DB before committing. |
| 4 | **No inline `<script>` or `<style>` in templates.** | QA auto-rejects. Style in `public/css/v5/`, JS in `public/js/v5/`. |
| 5 | **Column naming: `tablename_fieldname` always.** | The auto-model generator and registry depend on it. Migrations renaming columns must update model `const TABLE` arrays. |
| 6 | **No file with "settings" in its name under `application/module/`.** | The registry's INI parser will eat it and the platform 500s. Use "prefs", "config", "options". |
| 7 | **No `exec("docker ...")` calls from PHP.** | fpm has no docker CLI. Route through `ContainerManager::*` methods. |
| 8 | **No separate `vendor/autoload.php`** anywhere. | Use the framework's existing autoloader. Adding parallel autoloaders breaks classloading. |
| 9 | **Never modify the legacy `user_pass` column.** | It's the bridge for users not yet migrated to argon2id. Read it for fallback auth only. |
| 10 | **Branch off `develop`, PR to `develop`.** Never merge directly into `main`. | Gitflow. `main` is for release cuts. |
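
Rules 1 and 6 are mechanical enough to check from a list of changed paths. The sketch below is one possible pre-commit style helper, not something that ships with the repository: `check_paths` is a hypothetical name, and matching "settings" case-insensitively in `.php` file names is an assumption about how the registry behaves.

```bash
# check_paths: reads repo-relative paths on stdin, one per line.
# Prints a message per violation and returns 1 if any were found.
check_paths() {
  status=0
  while IFS= read -r path; do
    # Rule 1: nothing under app/src/core/ may change.
    case "$path" in
      app/src/core/*)
        echo "rule 1: core is frozen: $path"
        status=1
        ;;
    esac
    # Rule 6: no PHP file with "settings" in its name under application/module/.
    case "$path" in
      *application/module/*)
        name=$(basename "$path" | tr '[:upper:]' '[:lower:]')
        case "$name" in
          *settings*.php)
            echo "rule 6: 'settings' in module file name: $path"
            status=1
            ;;
        esac
        ;;
    esac
  done
  return "$status"
}
```

A typical call would be `git diff --cached --name-only | check_paths` before committing.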

## 17. Git Workflow

### 17.1 Branch lifecycle

```bash
git checkout develop && git pull origin develop
git checkout -b NFP-<num>/<short-title-with-dashes>
# ... do work ...
git add <specific-files>          # never `git add -A` — picks up secrets and runtime junk
git commit -m "NFP-<num>: <description>"
git push -u origin NFP-<num>/<short-title-with-dashes>
# open PR in Gitea targeting develop
```
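
If you would rather not type branch names by hand, a small helper can derive them from the ticket number and title. This is only a sketch: `branch_name` is a hypothetical function, and lowercasing the title is an assumption, since the convention above only asks for dashes.

```bash
# branch_name TICKET_NUMBER TITLE...
# Prints NFP-<num>/<short-title-with-dashes>.
branch_name() {
  ticket="$1"
  shift
  # Lowercase, collapse every run of non-alphanumerics into one dash,
  # then trim dashes from both ends.
  slug=$(printf '%s' "$*" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  printf 'NFP-%s/%s\n' "$ticket" "$slug"
}
```

For example, `git checkout -b "$(branch_name 27 "Add column renaming migrations")"` creates the branch in one step.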

### 17.2 Commit messages

```
NFP-<num>: <imperative description>

[optional body with details, why-not-what, references to other tickets]
```

Examples from recent history:
- `NFP-1: Convert login form to Nibiru Factory\Form`
- `NFP-27: Add column renaming migrations for consistent naming`
- `NFP-50: Replace exec('docker') calls with ContainerManager in Ollama plugin`
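
The subject format is easy to verify mechanically. A minimal sketch, assuming the only hard requirements are the `NFP-<num>: ` prefix and a non-empty description; `check_subject` is a hypothetical helper, not an existing hook:

```bash
# check_subject SUBJECT
# Returns 0 if the subject starts with "NFP-<num>: " followed by text.
check_subject() {
  printf '%s\n' "$1" | grep -Eq '^NFP-[0-9]+: [^[:space:]]'
}
```

It could be wired into a `commit-msg` hook, or run ad hoc as `check_subject "$(git log -1 --format=%s)"`.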

### 17.3 Pull request

- **Title:** the commit subject. Keep under 70 chars.
- **Body:** what changed and why, test plan, screenshots if UI.
- **Base:** `develop`.
- **Reviewer:** QA agent (Marie) reviews automatically if the orchestrator is running. Otherwise tag Stephan.

### 17.4 Rebase, don't merge develop into your branch

If develop has moved since you branched:
```bash
git fetch origin
git rebase origin/develop
git push --force-with-lease       # only after rebase
```

## 18. Code Style

```php
<?php
namespace Nibiru\Module\Finetune\Plugins;

use Nibiru\Module\Finetune\Interfaces\Finetune as IFinetune;

class MyPlugin implements IFinetune
{
    private static ?MyPlugin $instance = null;

    private function __construct()
    {
        // …
    }

    public static function init(): MyPlugin
    {
        if (self::$instance === null)
        {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function doThing(int $userId, array $options = []): array
    {
        if ($options['fast'] ?? false)
        {           // opening brace on NEW line for if/foreach/while/etc
            return $this->fastPath($userId);
        } else {    // else on SAME line as closing brace
            return $this->slowPath($userId);
        }
    }
}
```

- **Type hints on every method signature** (params + return).
- **Opening brace on a new line** for control flow (`if`, `foreach`, `while`) and for class and function declarations.
- **`} else {` and `} catch (...) {` on the same line** as the closing brace.
- **Keep a condition on one line where it fits.** If it is too long, break it across lines with `&&` / `||` at the start of each continuation line.
- **No commented-out code in commits.** Delete it; git remembers.
- **Default to no comments.** Add one only when the WHY is non-obvious. Don't explain WHAT — names should do that.

## 19. Naming Conventions

### Files

- Controllers: `<name>Controller.php` (lowercase first letter even for class). E.g., `jobsController.php`.
- Module main: `<name>.php` (lowercase). E.g., `finetune.php`.
- Plugins: `<Name>.php` (PascalCase). E.g., `Ollama.php`, `ContainerManager.php`.
- Traits: `<name>Form.php` or `<name>Trait.php`. E.g., `authForm.php`, `apiKeyForm.php`.
- Migrations: `NNN-<description>.sql`. E.g., `029-handoff_requests.sql`.
- Templates: `<route>.tpl`. E.g., `dashboard.tpl`. Shared: under `templates/shared/v5/`.

### Database

- Tables: `lowercase_snake_case`. Singular for single-row tables (`user`, `account`), plural for collections (`jobs`, `datasets`).
- Columns: `tablename_fieldname` (e.g. `user.user_id`, `jobs.jobs_status`).
- Foreign keys: `<table>_<other_table>_id` (e.g. `jobs.jobs_user_id`).
- Indexes: `idx_<table>_<columns>`.
- Constraints: `fk_<table>_<other_table>`.
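
The column rule can be checked against a list of table/column pairs. A minimal sketch with a hypothetical helper name, `check_columns`; it only tests the `tablename_` prefix and does not look at foreign-key, index, or constraint names:

```bash
# check_columns: reads "table column" pairs on stdin (whitespace-separated).
# Prints every column that lacks the "<table>_" prefix; returns 1 if any do.
check_columns() {
  status=0
  while read -r table column; do
    # Skip blank lines.
    [ -n "$table" ] || continue
    case "$column" in
      "${table}_"?*) ;;  # prefixed with the table name: fine
      *)
        echo "bad column name: $table.$column"
        status=1
        ;;
    esac
  done
  return "$status"
}
```

The pairs could come from a query against `information_schema.COLUMNS` (selecting `TABLE_NAME` and `COLUMN_NAME`), piped into the function.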

### Classes

- Modules: `Nibiru\Module\<Name>\<Name>` for the main, plugins under `Nibiru\Module\<Name>\Plugins\<Plugin>`.
- Models: `Nibiru\Model\NeuronetzFinetune\<Table>`.
- Controllers: `Nibiru\<name>Controller` (note: no module namespace).

### Routes

In `settings.local.ini`:
```
route[my-feature]            = "/my-feature"
route[my-feature/create]     = "/my-feature/create"
route[api/my-feature]        = "/api/my-feature"
```

These map to `myfeatureController::pageAction()`, `myfeatureController::createAction()`, and so on. The framework strips `-` and lowercases the name when locating the controller.
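
To predict which controller a route will hit, the name transformation can be reproduced outside the framework. This is a sketch of the rule as described here, not the framework's resolver: `controller_for` is a hypothetical helper, and it assumes the first path segment names the controller. How `api/...` routes resolve is not covered, so the sketch does not attempt them.

```bash
# controller_for ROUTE
# Takes the first path segment, removes dashes, lowercases it,
# and appends "Controller".
controller_for() {
  segment=$(printf '%s' "$1" | cut -d/ -f1)
  name=$(printf '%s' "$segment" | sed 's/-//g' | tr '[:upper:]' '[:lower:]')
  printf '%sController\n' "$name"
}
```

Both `controller_for my-feature` and `controller_for my-feature/create` name the same controller, which matches the mapping above.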

## 20. Testing

### 20.1 Stack

**PHPUnit 11.** Tests under `app/src/tests/`:
- `tests/Unit/` — pure unit tests, no DB
- `tests/Integration/` — DB-backed, currently skipped because the test bootstrap doesn't initialize the Nibiru DB connection (separate ticket)
- `tests/Fixtures/DatabaseSeeder.php` — fixture data for integration tests

### 20.2 Run

```bash
docker compose exec fpm vendor/bin/phpunit
# or the bundled phar if composer.lock is unresolved:
docker compose exec fpm ./phpunit-11.phar
```

Expected baseline (from 2026-04-13): **192 tests, 160 passing, 0 failing, 32 skipped.** The skips are all in `FinetunePluginIntegrationTest` waiting on the bootstrap fix.

### 20.3 Write tests

For new modules, add `tests/Unit/Module/<Module>/<Plugin>Test.php`. Pattern:

```php
<?php
declare(strict_types=1);
namespace Tests\Unit\Module\MyModule;

use Nibiru\Module\MyModule\Plugins\MyPlugin;   // placeholder: import the plugin under test
use PHPUnit\Framework\TestCase;
use PHPUnit\Framework\Attributes\Test;

class MyPluginTest extends TestCase
{
    #[Test]
    public function it_does_the_thing(): void
    {
        $result = MyPlugin::init()->doThing(1, ['fast' => true]);
        $this->assertTrue($result['success']);
    }
}
```

### 20.4 Coverage philosophy

We don't enforce a coverage percentage. **Test what's hard to verify by clicking.** Pure functions, edge cases, regression fixes. Don't test framework code; don't test trivial getters.

## 21. Common Gotchas

| Symptom | Cause | Fix |
|---|---|---|
| Trait class not found | New file not in autoloader cache | `docker compose exec fpm composer dump-autoload` |
| Module not loading | Forgot to register all 4 files in `settings.local.ini` `[AUTOLOADER]` | Add `class.pos[]`, `iface.pos[]`, `trait.pos[]`, `class.plugin.pos[]` entries |
| 500 on every page after merge | Probably a file with "settings" in its name | grep `application/module/*/` for `settings` |
| `Failed to load model` in chat | Model has no chat template (completion-only) | UI should filter; if you see it, the filter is broken |
| `docker: not found` | Someone called `exec("docker ...")` from PHP | Replace with ContainerManager method |
| `mkdir(): Permission denied` | Named docker volume created as root | `docker exec -u root <container> chown -R www-data:www-data <path>` |
| `column does not exist` | Forgot to update model `const TABLE` after rename | Update the model file, run `composer dump-autoload` |
| PR #N is `mergeable: false` | Conflicts with develop | Rebase your branch onto develop, force-push |
| Login form posts but returns 200 (login page again) | CSRF token missing or mismatched | Inspect the form HTML for `name="csrf_token"`, check session is being created |

---

# Part IV — Operations

## 22. Tools & Dashboards

| Tool | URL | Purpose |
|---|---|---|
| Platform | `http://local.finetune.neuronetz.ai/` | The app |
| Graylog | `http://local.graylog.finetune.neuronetz.ai/` | Centralized log search |
| Kibana | `http://local.kibana.finetune.neuronetz.ai/` | ES log visualization |
| Redis Commander | `http://local.redis-commander.neuronetz.ai/` | Redis inspector |
| Ollama API | `http://finetune-ollama-shared:11434/api/tags` (internal only) | List loaded models |
| Container Manager | `http://container-manager:8080/health` (internal only) | Docker proxy used by PHP |
| Gitea | `https://gitea.neuronetz.ai/Neuronetz/finetuning-plattform` | Source, PRs |
| YouTrack | `https://yt.neuronetz.ai/projects/NFP` | Tickets, board |

## 23. Logging & Observability

### 23.1 Structured logging via Graylog

The `graylog` module exposes traits any plugin can use:

```php
use Nibiru\Module\Graylog\Traits\Log;

class MyPlugin
{
    use Log;

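    public function doThing(): void
    {
        $this->logInfo('Did the thing', ['user_id' => 42, 'duration_ms' => 123]);
        $this->logWarning('Something off', ['detail' => 'x']);
        try
        {
            // Stand-in for real work that can fail.
            throw new \RuntimeException('example failure');
        } catch (\RuntimeException $e) {
            $this->logError('It broke', ['exception' => $e->getMessage()]);
        }
    }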
}
```

Logs ship to Graylog over GELF. Search at `local.graylog.finetune.neuronetz.ai/search`.

### 23.2 Application logs (Smarty errors, PHP warnings)

```bash
docker compose logs fpm -f
docker compose logs nginx -f
docker compose logs job-worker -f
```

PHP errors are logged at NOTICE, WARNING, or ERROR level by default.

### 23.3 Database queries

If you need to inspect what queries are running:

```sql
-- Set the target file first so nothing is written to the default location.
SET GLOBAL general_log_file = '/var/lib/mysql/general.log';
SET GLOBAL general_log = 'ON';
-- ... do the thing ...
-- then:
SET GLOBAL general_log = 'OFF';
```

Then `docker exec -it neuro-finetuning-platform-mariadb-1 cat /var/lib/mysql/general.log`. **Don't leave it on** — it's a perf hit and a privacy concern.

## 24. Troubleshooting Runbooks

### 24.1 Platform returns 500 on every page

Most common cause: a PHP file with "settings" in its name under `application/module/`, being eaten by the registry's INI parser.

```bash
docker compose logs fpm | tail -50 | grep -A 5 "syntax error"
# look for: "syntax error, unexpected '...' in /var/www/html/application/module/.../settings*.php"
```

Fix: rename the file.

### 24.2 `/admin` returns 500 after merge

Likely a stale column reference. The `usage_logs_type` → `usage_logs_resource_type` rename in NFP-27 bit us; check for any code that references columns by their pre-rename name.

```bash
grep -rn "usage_logs_type" app/src/application/
```

Replace with the current column name; commit; deploy.

### 24.3 Stack went down mid-session

Usually someone (or an agent) ran `docker compose down`. To resume:

```bash
cd local
docker compose up -d
```

Wait ~30s, then verify with:
```bash
curl -sS -o /dev/null -w "%{http_code}\n" http://local.finetune.neuronetz.ai/
```

If anything fails to start, check the per-service logs.
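
Instead of waiting a fixed 30 seconds, you can poll until the platform answers. A minimal sketch, assuming an HTTP 200 on the front page means the stack is up; `probe` and `wait_for_200` are hypothetical helper names:

```bash
# probe URL: prints the HTTP status code, or 000 if the connection fails.
probe() {
  curl -sS -o /dev/null -w '%{http_code}' "$1" 2>/dev/null || true
}

# wait_for_200 URL [TRIES]
# Polls once per second until the URL returns 200 or TRIES (default 30) run out.
wait_for_200() {
  url="$1"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(probe "$url")" = "200" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Usage: `wait_for_200 http://local.finetune.neuronetz.ai/ 60 || docker compose ps` shows the service states if the platform never comes up.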

### 24.4 Chat with a model fails silently

The model probably has no chat template (completion-only). The UI filter is supposed to show a chat button only for models whose `template` contains `.Messages`. If a button appears on a model without one, the filter has regressed: commit `1947e34` added it, so a recent merge most likely undid it.

To check directly:
```bash
docker exec neuro-finetuning-platform-fpm-1 sh -c \
  'curl -sS http://finetune-ollama-shared:11434/api/show -d "{\"name\":\"<model-name>\"}" | jq .template'
```

Empty or `"{{ .Prompt }}"` → not chat-capable.
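
The same test can be scripted. A minimal sketch that mirrors the rule described above (a template counts as chat-capable if it contains `.Messages`); `is_chat_capable` is a hypothetical helper, not the platform's actual filter code:

```bash
# is_chat_capable TEMPLATE
# Returns 0 if the template string references .Messages, 1 otherwise.
is_chat_capable() {
  case "$1" in
    *.Messages*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Feed it the template from the command above, using `jq -r .template` so the value arrives without surrounding quotes.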

### 24.5 Composer install fails on PHP 8.3

```
laminas/laminas-diactoros requires php (~8.0.0 || ~8.1.0 || ~8.2.0) failed
```

The lock file has a stale constraint. Workaround:
```bash
docker compose exec fpm composer update laminas/laminas-diactoros --with-dependencies
docker compose exec fpm composer install
```

Proper fix is in flight (NFP-22 area).

## 25. Regenerating This Delta

When the schema drifts, regenerate from a live develop instance:

```bash
cd ~/PhpstormProjects/finetuning-plattform-setup-delta

# 1. Schema
docker exec neuro-finetuning-platform-mariadb-1 sh -c \
  'mariadb-dump -u neuronetz -p"$MARIADB_PASSWORD" --no-data --skip-add-drop-table --skip-comments neuronetz_finetune' \
  > db/01-schema.sql

# 2. Seed data
docker exec neuro-finetuning-platform-mariadb-1 sh -c \
  'mariadb-dump -u neuronetz -p"$MARIADB_PASSWORD" --no-create-info --skip-comments --complete-insert neuronetz_finetune acl email_templates api_registry' \
  > db/02-seed.sql

# 3. Default users — passwords are argon2id, so regenerate hashes
docker compose exec fpm php -r 'echo password_hash("admin123", PASSWORD_ARGON2ID).PHP_EOL;'
docker compose exec fpm php -r 'echo password_hash("test123",  PASSWORD_ARGON2ID).PHP_EOL;'
# Paste into db/03-default-users.sql.

# 4. Update the generated-at marker at the top of this MANUAL.md.
git commit -am "Regenerate delta from develop @ $(cd ../finetuning-plattform && git rev-parse --short HEAD)"
git push
```

The hashes change on every regeneration because argon2id uses a random salt — that's expected and correct.

---

# Part V — Context & Culture

## 26. Regulatory Stance (Why No GDPR Theater)

The platform takes an explicit activist position: **regulatory capture by big tech is real, and most compliance ritual exists to make startups uncompetitive rather than to protect users.** Concretely:

- **Production hosting is outside the EU.** No GDPR jurisdictional hook. No mandatory cookie banner, no Art. 30 records of processing, no DPA boilerplate.
- **The user-facing privacy notice is honest.** Roughly: "We don't do compliance theater. Server's not in the EU. We don't log what we don't need, we don't sell what we know, GDPR is a rubber stamp big tech pays for and we don't. If you're building something real, you get it. Don't steal, don't abuse, everything else is between you and your model."
- **What does NOT change:** The German Impressum requirement for the company (§ 5 Digitale-Dienste-Gesetz, which replaced § 5 Telemediengesetz in 2024) is a separate legal obligation and stays real. Don't conflate Impressum with GDPR. The same goes for tax and accounting compliance.
- **What this means for you as a developer:** Don't propose "just in case" cookie consent flows, consent banners, GDPR-style audit logs, or DPA templates as part of feature work. If a customer explicitly asks for one, escalate to Stephan — there may be a business reason (e.g. enterprise contract). Otherwise default to honest, minimal, no theater.

NFP-36 (GDPR compliance implementation) is permanently Blocked. The notice is being drafted; placement (footer, dedicated page, banner) is a pending product decision.

## 27. The Multi-Agent Orchestrator (Optional)

There's a sibling project at `~/PhpstormProjects/orchstrator-agent-setup/` that runs 6 named AI agents (Bruno, Luna, Marie, Otto, Klara, Felix) in parallel via a tmux session and a workflow engine. They pick tickets from YouTrack, work in isolated git worktrees, open PRs in Gitea, review each other's code, and merge.

You probably won't use it — you're the experienced devs. But:
- Dashboard at `http://127.0.0.1:7400` (slide-up iframe in the platform footer when running)
- Marie (QA agent) runs on Claude Opus and auto-rejects any PR that touches `app/src/core/`. If you see her reject one of yours, it's the core rule (§16#1).
- Start: `cd ~/PhpstormProjects/orchstrator-agent-setup/orchestrator && ./orchestrator.sh start`
- Stop: `./orchestrator.sh stop`
- Status: `./orchestrator.sh status`

The orchestrator will eventually be replaced by platform-hosted fine-tuned models (the "fire test") — at which point this whole subsystem gets deleted.

## 28. Glossary

| Term | Meaning |
|---|---|
| **NFP** | Neuronetz Finetuning Platform. Also the YouTrack project key. |
| **Nibiru** | The PHP framework underneath the platform. Stephan's. Frozen. |
| **MMVC** | Module-Model-View-Controller — Nibiru's variant of MVC with explicit module separation. |
| **Module** | A self-contained feature area: main class + interface + traits + plugins + INI settings. Lives under `application/module/<name>/`. |
| **Plugin** | A concrete implementation class within a module. Plugins are where logic lives. |
| **Trait** | PHP trait used by controllers, typically for form construction. |
| **Modelfile** | An Ollama configuration that defines a model's template, parameters, base. Like a Dockerfile for LLMs. |
| **GGUF** | The file format llama.cpp / Ollama uses for quantized models. Single binary file, runs on CPU or GPU. |
| **LoRA** | Low-Rank Adaptation. Fine-tuning method that produces small adapter weights rather than retraining the whole model. The platform uses this. |
| **Fire test** | The milestone where the platform's own fine-tuned models replace the Claude agents and the multi-agent orchestrator becomes self-hosted. |
| **Develop** | The integration branch. All feature work merges here. |
| **Main** | The release branch. `develop` → `main` happens periodically; never push to it directly. |
| **Container Manager** | The Python microservice that holds the Docker socket so PHP doesn't need to. |
| **Workerman** | PHP library for long-running daemons. Used for the websocket and job-worker. |
| **OpenAI-compatible** | Refers to the production inference API at `api.neuronetz.ai/v1/*` that mimics OpenAI's chat completions shape. |

---

*Welcome aboard. If anything in here is wrong, file an issue — or just fix it and PR the manual.*