Strip api.neuronetz.ai from documentation; chat config stays in env

The Ollama URL was leaking via: - prose in /en/, /de/, /ja/, /es/, /fr/ docs (oracle, deployment, local-testing, ai/module/{overview,embed,training}) - code blocks teaching users to curl the host directly - .env.example, Dockerfile, docker-compose.yml defaults - providers.mjs, translate-docs.mjs, build-oracle-index.mjs defaults - LandingScripts.astro comment - lora-runbook.md prose + SSH host - the GET handler at /api/oracle which echoed `ollamaUrl` back to public callers - the "Oracle is silent" fallback message at /api/oracle POST Replacements: - prose: "neuronetz.ai" → "your Ollama instance" - example URLs in code blocks: https://api.neuronetz.ai → https://your-ollama-host.example - code-level defaults: → http://localhost:11434 (Ollama's standard local port) - GET /api/oracle: dropped the `ollamaUrl` field; provider + model still exposed - runbook SSH host: neuronetz@cloud.neuronetz.ai → <gpu-user>@<gpu-host> Production chat is unaffected: docs/.env (gitignored) on the production host still pins OLLAMA_BASE_URL=https://api.neuronetz.ai. The only change in the running container is that the GET handler no longer echoes the URL. analytics.neuronetz.ai (Umami tracking) is intentionally left intact — it's a public, brand-owned subdomain meant to be visible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:14:17 +02:00
parent 9b7fd15ca1
commit f4ccc45a3b
44 changed files with 1386 additions and 292 deletions
--- a/application/module/ai/training/lora-runbook.md
+++ b/application/module/ai/training/lora-runbook.md
@@ -0,0 +1,324 @@
+# LoRA training runbook — Nibiru
+
+Step-by-step for fine-tuning a base model on the Nibiru training corpus, with
+live metrics in the Claude Sessions GUI Training tab.
+
+The corpus we ship at https://nibiru-framework.com/corpus/ is built by
+`docs/scripts/build-corpus.mjs` (see `docs/src/content/docs/en/ai/corpus.md`).
+This runbook trains a LoRA on top of `qwen2.5-coder:14b` and registers the
+result on the same Ollama at `your-ollama-host.example`.
+
+---
+
+## 1. Pre-flight on the GPU box
+
+```sh
+ssh <gpu-user>@<gpu-host>
+nvidia-smi                  # GPU visible, CUDA driver matches torch
+docker --version            # 24+ recommended
+mkdir -p ~/training/nibiru && cd ~/training/nibiru
+```
+
+Pick the hardware-appropriate base model:
+
+| GPU                | Base model              | Quant      | Effective memory |
+|--------------------|-------------------------|------------|------------------|
+| 24 GB (4090, 3090) | `Qwen2.5-Coder-7B`      | 4-bit      | ~14 GB           |
+| 48 GB (A6000)      | `Qwen2.5-Coder-14B`     | 4-bit      | ~22 GB           |
+| 80 GB (A100, H100) | `Qwen2.5-Coder-32B`     | 4-bit      | ~38 GB           |
+
+Pull the base model checkpoint to a host directory (one-time):
+
+```sh
+mkdir -p ~/training/models
+huggingface-cli download Qwen/Qwen2.5-Coder-14B-Instruct \
+  --local-dir ~/training/models/qwen25-coder-14b
+```
+
+(Login first with `huggingface-cli login` if the model requires it.)
+
+---
+
+## 2. Pull the corpus
+
+The docs site exposes the corpus at `/corpus/*.jsonl`. We pull the chat
+format because the trainer config below uses HF's `chat_template`.
+
+```sh
+cd ~/training/nibiru
+mkdir -p data
+curl -fLo data/chat.jsonl     https://nibiru-framework.com/corpus/chat-en.jsonl
+curl -fLo data/manifest.json  https://nibiru-framework.com/corpus/manifest.json
+
+# Verify the file integrity
+SHA=$(jq -r '.files[] | select(.filename=="chat-en.jsonl") | .sha256' data/manifest.json)
+test "$(sha256sum data/chat.jsonl | cut -d' ' -f1)" = "$SHA" && echo "ok" || echo "MISMATCH"
+```
+
+For a multilingual LoRA pull `chat-all.jsonl` instead. Keep one language per
+training run unless you have a reason to mix — it makes the loss curve
+cleaner and the resulting model less prone to language-leakage.
+
+---
+
+## 3. Training container — `Dockerfile` + `compose.yml`
+
+The Sessions GUI dashboard polls the box via SSH and runs `docker logs` on
+the container, parsing the tqdm progress + `'loss'` lines. So the container
+needs to run inside Docker (not bare-metal), under a stable name, with logs
+streamed to stdout.
+
+```sh
+cd ~/training/nibiru
+```
+
+**`Dockerfile`** (copy verbatim):
+
+```dockerfile
+# syntax=docker/dockerfile:1.6
+FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04
+
+ENV DEBIAN_FRONTEND=noninteractive PYTHONUNBUFFERED=1 PIP_NO_CACHE_DIR=1
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    python3 python3-pip git ca-certificates && rm -rf /var/lib/apt/lists/*
+
+RUN pip3 install --upgrade pip && pip3 install \
+    "torch==2.4.*" --index-url https://download.pytorch.org/whl/cu124
+RUN pip3 install \
+    "unsloth[cu124-torch240]==2024.10.7" \
+    "transformers==4.46.*" \
+    "datasets==3.0.*" \
+    "trl==0.11.*" \
+    "peft==0.13.*" \
+    "accelerate==1.0.*" \
+    "bitsandbytes==0.44.*"
+
+WORKDIR /workspace
+COPY train.py ./train.py
+CMD ["python3", "-u", "train.py"]
+```
+
+**`compose.yml`**:
+
+```yaml
+services:
+  trainer:
+    build: .
+    container_name: nibiru-trainer
+    restart: "no"
+    runtime: nvidia
+    environment:
+      NVIDIA_VISIBLE_DEVICES: all
+      NVIDIA_DRIVER_CAPABILITIES: compute,utility
+    volumes:
+      - ./data:/workspace/data:ro
+      - ./out:/workspace/out
+      - ~/training/models:/workspace/models:ro
+    shm_size: '8gb'
+```
+
+---
+
+## 4. Training script — `train.py`
+
+```python
+"""Nibiru LoRA — single-GPU unsloth, chat-template, HF Trainer.
+
+Outputs:
+  out/nibiru-lora/
+    adapter_hf/                 # the LoRA adapter (PEFT format)
+    trainer_state.json          # epochs, losses, eval — what the dashboard reads
+    checkpoint-*/               # periodic checkpoints
+    logs/                       # tensorboard event files
+"""
+import os, json
+from datasets import load_dataset
+from trl import SFTConfig, SFTTrainer
+from unsloth import FastLanguageModel
+from unsloth.chat_templates import get_chat_template
+
+BASE_MODEL  = os.environ.get("BASE_MODEL",  "/workspace/models/qwen25-coder-14b")
+TRAIN_FILE  = os.environ.get("TRAIN_FILE",  "/workspace/data/chat.jsonl")
+OUTPUT_DIR  = os.environ.get("OUTPUT_DIR",  "/workspace/out/nibiru-lora")
+MAX_SEQ_LEN = int(os.environ.get("MAX_SEQ_LEN", "4096"))
+
+# 1. Load model + 4-bit quant + LoRA adapter on top
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name        = BASE_MODEL,
+    max_seq_length    = MAX_SEQ_LEN,
+    load_in_4bit      = True,
+)
+model = FastLanguageModel.get_peft_model(
+    model,
+    r                          = 16,
+    target_modules             = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
+    lora_alpha                 = 32,
+    lora_dropout               = 0.05,
+    bias                       = "none",
+    use_gradient_checkpointing = "unsloth",
+)
+tokenizer = get_chat_template(tokenizer, chat_template="qwen-2.5")
+
+# 2. Load corpus, format as chat with the tokenizer's template
+ds = load_dataset("json", data_files=TRAIN_FILE, split="train")
+def fmt(rec):
+    msgs = rec["messages"]
+    return {"text": tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=False)}
+ds = ds.map(fmt, num_proc=4, remove_columns=ds.column_names)
+
+# 3. Train
+trainer = SFTTrainer(
+    model         = model,
+    tokenizer     = tokenizer,
+    train_dataset = ds,
+    args = SFTConfig(
+        output_dir                  = OUTPUT_DIR,
+        max_seq_length              = MAX_SEQ_LEN,
+        per_device_train_batch_size = 2,
+        gradient_accumulation_steps = 4,
+        warmup_ratio                = 0.03,
+        num_train_epochs            = 3,
+        learning_rate               = 2e-4,
+        bf16                        = True,
+        logging_steps               = 5,
+        save_steps                  = 200,
+        save_total_limit            = 3,
+        report_to                   = "none",
+        dataset_text_field          = "text",
+        packing                     = True,
+        # === trainer_state ===  ← marker the Sessions GUI dashboard greps for
+    ),
+)
+trainer.train()
+
+# 4. Save adapter in HF (PEFT) format — what the merge step expects
+model.save_pretrained(os.path.join(OUTPUT_DIR, "adapter_hf"))
+tokenizer.save_pretrained(os.path.join(OUTPUT_DIR, "adapter_hf"))
+print("\\n=== training complete ===")
+```
+
+The literal comment `=== trainer_state ===` near the SFTConfig is the marker
+the dashboard uses to fall back to reading `trainer_state.json` after the
+container exits — see `dashboard/server.py:_extract_trainer_state`. Don't
+remove it.
+
+---
+
+## 5. Run it
+
+```sh
+cd ~/training/nibiru
+docker compose build
+docker compose up -d
+docker logs -f nibiru-trainer
+```
+
+Expected log shape (this is what the Sessions GUI parses):
+
+```
+{'loss': '1.2345', 'grad_norm': '...', 'learning_rate': ..., 'epoch': '0.05'}
+ 12%|##         | 60/500 [01:23<10:14,  1.40s/it]
+```
+
+---
+
+## 6. Open the Sessions GUI Training tab
+
+In the GUI, click **Training**. Fill the strip at the top of the tab:
+
+| Field           | Value                                        |
+|-----------------|----------------------------------------------|
+| host            | `<gpu-user>@<gpu-host>`               |
+| container       | `nibiru-trainer`                             |
+| project_dir     | `/home/<gpu-user>/training/nibiru`            |
+| output_subdir   | `out/nibiru-lora`                            |
+| adapter_filename| `adapter_hf/adapter_model.safetensors`       |
+
+The dashboard remembers the last config in `~/.config/training-monitor/last.json`,
+so after the first save you just open the tab and the live metrics appear.
+
+While the container is running you'll see:
+- progress bar (% / step / total / ETA — parsed from tqdm)
+- last 30 loss values + epoch
+- the live `docker ps` line for the container
+
+After training ends and the container exits, the dashboard falls back to
+`trainer_state.json` in `out/nibiru-lora/` so the final losses + step count
+stay visible.
+
+---
+
+## 7. Merge LoRA into the base + register on Ollama
+
+```sh
+# On the GPU box, still inside ~/training/nibiru/
+docker run --rm --runtime=nvidia \
+  -v $PWD/out/nibiru-lora:/lora:ro \
+  -v ~/training/models/qwen25-coder-14b:/base:ro \
+  -v $PWD/out/merged:/out \
+  nibiru-trainer \
+  python3 -c "
+from unsloth import FastLanguageModel
+m, tok = FastLanguageModel.from_pretrained('/base', load_in_4bit=False)
+m.load_adapter('/lora/adapter_hf', adapter_name='nibiru')
+m.merge_and_unload()
+m.save_pretrained_gguf('/out', tok, quantization_method='q4_k_m')
+"
+```
+
+This writes a single `*-q4_k_m.gguf` in `out/merged/`. Push it to your Ollama:
+
+```sh
+# === build a Modelfile referencing the merged GGUF ===
+GGUF=$(ls out/merged/*.gguf | head -1)
+cat > out/merged/Modelfile <<EOF
+FROM $GGUF
+PARAMETER temperature 0.4
+PARAMETER top_p 0.9
+SYSTEM "You are nibiru-coder, a senior PHP architect specialised in the Nibiru framework. Always cite file:line where relevant. Never use 'presumably', 'likely', 'appears to'."
+EOF
+
+# === register on Ollama at your-ollama-host.example ===
+OLLAMA_BASE_URL=https://your-ollama-host.example
+TAG="nibiru-coder:lora-1.0"
+jq -n --arg name "$TAG" --rawfile mf out/merged/Modelfile \
+   '{name: $name, modelfile: $mf, stream: false}' \
+  | curl -sS -X POST "$OLLAMA_BASE_URL/api/create" \
+      -H 'Content-Type: application/json' --data-binary @-
+```
+
+Verify:
+
+```sh
+curl -sS "$OLLAMA_BASE_URL/api/tags" | jq '.models[] | select(.name|startswith("nibiru-coder"))'
+```
+
+---
+
+## 8. Switch the live docs Oracle to the new model
+
+On the production host (`bittomine@…`), `docs/.env` controls which model
+the Oracle uses:
+
+```sh
+sed -i 's|^OLLAMA_CHAT_MODEL=.*|OLLAMA_CHAT_MODEL=nibiru-coder:lora-1.0|' docs/.env
+docker compose up -d              # recreates the docs container with the new env
+```
+
+Hard-refresh https://nibiru-framework.com/en/ and ask the Oracle a Nibiru
+question — it should answer with file:line citations and stop guessing.
+
+---
+
+## 9. Iterate
+
+If the model regresses or hallucinates on a topic:
+
+1. Find the missing pattern in the framework reference.
+2. Add it to `docs/scripts/extraction/lora-augmentation.jsonl` (one or two
+   high-quality Q/A pairs).
+3. Rebuild the corpus: `cd docs && node scripts/build-corpus.mjs`.
+4. Bump the tag (`nibiru-coder:lora-1.1`) and re-run from step 5.
+
+Keep the previous tag live until the new one is verified — Ollama keeps
+both, so flipping `OLLAMA_CHAT_MODEL` back is one env-var change away.