Strip api.neuronetz.ai from documentation; chat config stays in env
The Ollama URL was leaking via:
- prose in /en/, /de/, /ja/, /es/, /fr/ docs (oracle, deployment,
local-testing, ai/module/{overview,embed,training})
- code blocks teaching users to curl the host directly
- .env.example, Dockerfile, docker-compose.yml defaults
- providers.mjs, translate-docs.mjs, build-oracle-index.mjs defaults
- LandingScripts.astro comment
- lora-runbook.md prose + SSH host
- the GET handler at /api/oracle which echoed `ollamaUrl` back to public callers
- the "Oracle is silent" fallback message at /api/oracle POST
Replacements:
- prose: "neuronetz.ai" → "your Ollama instance"
- example URLs in code blocks: https://api.neuronetz.ai → https://your-ollama-host.example
- code-level defaults: → http://localhost:11434 (Ollama's standard local port)
- GET /api/oracle: dropped the `ollamaUrl` field; provider + model still exposed
- runbook SSH host: neuronetz@cloud.neuronetz.ai → <gpu-user>@<gpu-host>
Production chat is unaffected: docs/.env (gitignored) on the production
host still pins OLLAMA_BASE_URL=https://api.neuronetz.ai. The only
change in the running container is that the GET handler no longer
echoes the URL.
analytics.neuronetz.ai (Umami tracking) is intentionally left intact —
it's a public, brand-owned subdomain meant to be visible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
324
application/module/ai/training/lora-runbook.md
Normal file
@@ -0,0 +1,324 @@
# LoRA training runbook — Nibiru

Step-by-step for fine-tuning a base model on the Nibiru training corpus, with
live metrics in the Claude Sessions GUI Training tab.

The corpus we ship at https://nibiru-framework.com/corpus/ is built by
`docs/scripts/build-corpus.mjs` (see `docs/src/content/docs/en/ai/corpus.md`).
This runbook trains a LoRA on top of `qwen2.5-coder:14b` and registers the
result on the same Ollama at `your-ollama-host.example`.

---

## 1. Pre-flight on the GPU box

```sh
ssh <gpu-user>@<gpu-host>
nvidia-smi                 # GPU visible, CUDA driver matches torch
docker --version           # 24+ recommended
mkdir -p ~/training/nibiru && cd ~/training/nibiru
```

Pick the hardware-appropriate base model:

| GPU                | Base model              | Quant      | Effective memory |
|--------------------|-------------------------|------------|------------------|
| 24 GB (4090, 3090) | `Qwen2.5-Coder-7B`      | 4-bit      | ~14 GB           |
| 48 GB (A6000)      | `Qwen2.5-Coder-14B`     | 4-bit      | ~22 GB           |
| 80 GB (A100, H100) | `Qwen2.5-Coder-32B`     | 4-bit      | ~38 GB           |

Pull the base model checkpoint to a host directory (one-time):

```sh
mkdir -p ~/training/models
huggingface-cli download Qwen/Qwen2.5-Coder-14B-Instruct \
  --local-dir ~/training/models/qwen25-coder-14b
```

(Login first with `huggingface-cli login` if the model requires it.)

---

## 2. Pull the corpus

The docs site exposes the corpus at `/corpus/*.jsonl`. We pull the chat
format because the trainer config below uses HF's `chat_template`.

```sh
cd ~/training/nibiru
mkdir -p data
curl -fLo data/chat.jsonl     https://nibiru-framework.com/corpus/chat-en.jsonl
curl -fLo data/manifest.json  https://nibiru-framework.com/corpus/manifest.json

# Verify the file integrity
SHA=$(jq -r '.files[] | select(.filename=="chat-en.jsonl") | .sha256' data/manifest.json)
test "$(sha256sum data/chat.jsonl | cut -d' ' -f1)" = "$SHA" && echo "ok" || echo "MISMATCH"
```
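
The same check can be scripted so it also catches malformed records before a GPU run starts. This is a minimal sketch, not part of the shipped tooling. It assumes the manifest layout implied by the `jq` filter above (a `files` array of objects with `filename` and `sha256`) and that every corpus line is a JSON object with a non-empty `messages` list, which is what `train.py` reads. All helper names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def expected_sha256(manifest: dict, filename: str) -> str:
    """Look up the published digest for one corpus file."""
    for entry in manifest["files"]:
        if entry["filename"] == filename:
            return entry["sha256"]
    raise KeyError(f"{filename} not listed in manifest")


def file_sha256(path: Path) -> str:
    """Hash the file in chunks so a large corpus need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_chat_records(path: Path) -> int:
    """Count records, failing on any line without a non-empty `messages` list."""
    count = 0
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                raise ValueError(f"line {lineno}: missing or empty 'messages'")
            count += 1
    return count


def verify_corpus(data_dir: Path, local_name: str = "chat.jsonl",
                  remote_name: str = "chat-en.jsonl") -> int:
    """Return the record count if the digest matches the manifest, else raise."""
    manifest = json.loads((data_dir / "manifest.json").read_text(encoding="utf-8"))
    corpus = data_dir / local_name
    if file_sha256(corpus) != expected_sha256(manifest, remote_name):
        raise ValueError("sha256 MISMATCH")
    return count_chat_records(corpus)
```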

For a multilingual LoRA pull `chat-all.jsonl` instead. Keep one language per
training run unless you have a reason to mix: a single-language run gives a
cleaner loss curve and a model less prone to language leakage.

---

## 3. Training container — `Dockerfile` + `compose.yml`

The Sessions GUI dashboard polls the box via SSH and runs `docker logs` on
the container, parsing the tqdm progress + `'loss'` lines. So the container
needs to run inside Docker (not bare-metal), under a stable name, with logs
streamed to stdout.

```sh
cd ~/training/nibiru
```

**`Dockerfile`** (copy verbatim):

```dockerfile
# syntax=docker/dockerfile:1.6
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive PYTHONUNBUFFERED=1 PIP_NO_CACHE_DIR=1
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip git ca-certificates && rm -rf /var/lib/apt/lists/*

RUN pip3 install --upgrade pip && pip3 install \
    "torch==2.4.*" --index-url https://download.pytorch.org/whl/cu124
RUN pip3 install \
    "unsloth[cu124-torch240]==2024.10.7" \
    "transformers==4.46.*" \
    "datasets==3.0.*" \
    "trl==0.11.*" \
    "peft==0.13.*" \
    "accelerate==1.0.*" \
    "bitsandbytes==0.44.*"

WORKDIR /workspace
COPY train.py ./train.py
CMD ["python3", "-u", "train.py"]
```

**`compose.yml`**:

```yaml
services:
  trainer:
    build: .
    container_name: nibiru-trainer
    restart: "no"
    runtime: nvidia
    environment:
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,utility
    volumes:
      - ./data:/workspace/data:ro
      - ./out:/workspace/out
      - ~/training/models:/workspace/models:ro
    shm_size: '8gb'
```

---

## 4. Training script — `train.py`

```python
"""Nibiru LoRA — single-GPU unsloth, chat-template, HF Trainer.

Outputs:
  out/nibiru-lora/
    adapter_hf/           # the LoRA adapter (PEFT format)
    trainer_state.json    # epochs, losses, eval — what the dashboard reads
    checkpoint-*/         # periodic checkpoints
"""
import os

# Import unsloth first so its patches are applied before trl is loaded.
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

BASE_MODEL  = os.environ.get("BASE_MODEL",  "/workspace/models/qwen25-coder-14b")
TRAIN_FILE  = os.environ.get("TRAIN_FILE",  "/workspace/data/chat.jsonl")
OUTPUT_DIR  = os.environ.get("OUTPUT_DIR",  "/workspace/out/nibiru-lora")
MAX_SEQ_LEN = int(os.environ.get("MAX_SEQ_LEN", "4096"))

# 1. Load model + 4-bit quant + LoRA adapter on top
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = BASE_MODEL,
    max_seq_length = MAX_SEQ_LEN,
    load_in_4bit   = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    lora_alpha = 32,
    lora_dropout = 0.05,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
)
tokenizer = get_chat_template(tokenizer, chat_template="qwen-2.5")

# 2. Load corpus, format as chat with the tokenizer's template
ds = load_dataset("json", data_files=TRAIN_FILE, split="train")
def fmt(rec):
    msgs = rec["messages"]
    return {"text": tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=False)}
ds = ds.map(fmt, num_proc=4, remove_columns=ds.column_names)

# 3. Train
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = ds,
    args = SFTConfig(
        output_dir = OUTPUT_DIR,
        max_seq_length = MAX_SEQ_LEN,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_ratio = 0.03,
        num_train_epochs = 3,
        learning_rate = 2e-4,
        bf16 = True,
        logging_steps = 5,
        save_steps = 200,
        save_total_limit = 3,
        report_to = "none",
        dataset_text_field = "text",
        packing = True,
        # === trainer_state === ← marker the Sessions GUI dashboard greps for
    ),
)
trainer.train()
# Write trainer_state.json to OUTPUT_DIR itself; by default the Trainer only
# stores it inside each checkpoint-*/ directory.
trainer.save_state()

# 4. Save adapter in HF (PEFT) format — what the merge step expects
model.save_pretrained(os.path.join(OUTPUT_DIR, "adapter_hf"))
tokenizer.save_pretrained(os.path.join(OUTPUT_DIR, "adapter_hf"))
print("\n=== training complete ===")
```

The literal comment `=== trainer_state ===` near the SFTConfig is the marker
the dashboard uses to fall back to reading `trainer_state.json` after the
container exits — see `dashboard/server.py:_extract_trainer_state`. Don't
remove it.
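
The step total shown in the tqdm bar follows from the batch settings in `SFTConfig`. The sketch below works through that arithmetic for a single GPU. It assumes `n_sequences` is the number of training sequences after packing, a value this runbook does not state, so the figure used here is purely illustrative. The rounding is intended to mirror how the Hugging Face `Trainer` derives its step count, so treat the result as an estimate rather than a guarantee.

```python
import math


def estimate_optimizer_steps(n_sequences: int,
                             per_device_batch_size: int = 2,
                             gradient_accumulation_steps: int = 4,
                             num_train_epochs: float = 3) -> int:
    """Estimate the total number of optimizer steps for a single-GPU run."""
    # Micro-batches per epoch: the last partial batch still counts.
    batches_per_epoch = math.ceil(n_sequences / per_device_batch_size)
    # One optimizer step per `gradient_accumulation_steps` micro-batches.
    updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    return math.ceil(num_train_epochs * updates_per_epoch)


# Effective batch size with the runbook defaults: 2 * 4 = 8 sequences per step.
# Hypothetical corpus of 1,336 packed sequences: 668 micro-batches per epoch,
# 167 optimizer steps per epoch, 501 steps over 3 epochs.
print(estimate_optimizer_steps(1336))
```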

---

## 5. Run it

```sh
cd ~/training/nibiru
docker compose build
docker compose up -d
docker logs -f nibiru-trainer
```

Expected log shape (this is what the Sessions GUI parses):

```
{'loss': '1.2345', 'grad_norm': '...', 'learning_rate': ..., 'epoch': '0.05'}
12%|## | 60/500 [01:23<10:14, 1.40s/it]
```
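
To make the two line shapes concrete, here is a small sketch of how such lines could be parsed. It is not the Sessions GUI's actual parser, which is not shown in this runbook. The regular expressions and the `parse_log_line` helper are assumptions for illustration, written to accept loss values with or without quotes.

```python
import re
from typing import Optional

# tqdm progress, e.g. "12%|## | 60/500 [01:23<10:14, 1.40s/it]"
TQDM_RE = re.compile(
    r"(?P<pct>\d{1,3})%\|.*?\|\s*(?P<step>\d+)/(?P<total>\d+)\s*"
    r"\[(?P<elapsed>[\d:]+)<(?P<eta>[\d:?]+)"
)
# Trainer log dict, e.g. "{'loss': '1.2345', ..., 'epoch': '0.05'}"
NUMBER = r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"
LOSS_RE = re.compile(rf"'loss':\s*'?(?P<loss>{NUMBER})'?")
EPOCH_RE = re.compile(rf"'epoch':\s*'?(?P<epoch>{NUMBER})'?")


def parse_log_line(line: str) -> Optional[dict]:
    """Return a progress or loss record for a known line shape, else None."""
    m = TQDM_RE.search(line)
    if m:
        return {"kind": "progress", "percent": int(m["pct"]),
                "step": int(m["step"]), "total": int(m["total"]),
                "eta": m["eta"]}
    m = LOSS_RE.search(line)
    if m:
        record = {"kind": "loss", "loss": float(m["loss"])}
        e = EPOCH_RE.search(line)
        if e:
            record["epoch"] = float(e["epoch"])
        return record
    return None  # anything else (warnings, banners) is ignored
```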

---

## 6. Open the Sessions GUI Training tab

In the GUI, click **Training**. Fill the strip at the top of the tab:

| Field            | Value                                        |
|------------------|----------------------------------------------|
| host             | `<gpu-user>@<gpu-host>`                      |
| container        | `nibiru-trainer`                             |
| project_dir      | `/home/<gpu-user>/training/nibiru`           |
| output_subdir    | `out/nibiru-lora`                            |
| adapter_filename | `adapter_hf/adapter_model.safetensors`       |

The dashboard remembers the last config in `~/.config/training-monitor/last.json`,
so after the first save you just open the tab and the live metrics appear.

While the container is running you'll see:
- progress bar (% / step / total / ETA — parsed from tqdm)
- last 30 loss values + epoch
- the live `docker ps` line for the container

After training ends and the container exits, the dashboard falls back to
`trainer_state.json` in `out/nibiru-lora/` so the final losses + step count
stay visible.
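
As an illustration of that fallback, the sketch below reads `trainer_state.json` and extracts the recent losses. It assumes the standard Hugging Face `Trainer` state layout (`global_step`, `max_steps`, and a `log_history` list whose training entries carry `loss`, `step`, and `epoch`). The `summarize_trainer_state` helper is hypothetical and is not the dashboard's own `_extract_trainer_state`.

```python
import json
from pathlib import Path


def summarize_trainer_state(path: Path, last_n: int = 30) -> dict:
    """Return the final step count and the last `last_n` training losses."""
    state = json.loads(path.read_text(encoding="utf-8"))
    # log_history mixes per-step training logs with a final summary entry;
    # only the entries that carry a "loss" key are training logs.
    losses = [
        {"step": entry.get("step"), "epoch": entry.get("epoch"), "loss": entry["loss"]}
        for entry in state.get("log_history", [])
        if "loss" in entry
    ]
    return {
        "global_step": state.get("global_step"),
        "max_steps": state.get("max_steps"),
        "losses": losses[-last_n:],
    }
```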

---

## 7. Merge LoRA into the base + register on Ollama

```sh
# On the GPU box, still inside ~/training/nibiru/
mkdir -p out/merged   # create it as your user so the Modelfile step can write here
docker run --rm --runtime=nvidia \
  -v "$PWD/out/nibiru-lora:/lora:ro" \
  -v ~/training/models/qwen25-coder-14b:/base:ro \
  -v "$PWD/out/merged:/out" \
  nibiru-trainer \
  python3 -c "
from unsloth import FastLanguageModel
m, tok = FastLanguageModel.from_pretrained('/base', load_in_4bit=False)
m.load_adapter('/lora/adapter_hf', adapter_name='nibiru')
m.merge_and_unload()
m.save_pretrained_gguf('/out', tok, quantization_method='q4_k_m')
"
```

This writes a single `*-q4_k_m.gguf` in `out/merged/`. Push it to your Ollama:

```sh
# === build a Modelfile referencing the merged GGUF ===
# Absolute path, so FROM does not depend on anyone's working directory.
GGUF=$(realpath "$(ls out/merged/*.gguf | head -1)")
cat > out/merged/Modelfile <<EOF
FROM $GGUF
PARAMETER temperature 0.4
PARAMETER top_p 0.9
SYSTEM "You are nibiru-coder, a senior PHP architect specialised in the Nibiru framework. Always cite file:line where relevant. Never use 'presumably', 'likely', 'appears to'."
EOF

# === register on Ollama at your-ollama-host.example ===
OLLAMA_BASE_URL=https://your-ollama-host.example
TAG="nibiru-coder:lora-1.0"
jq -n --arg name "$TAG" --rawfile mf out/merged/Modelfile \
   '{name: $name, modelfile: $mf, stream: false}' \
 | curl -sS -X POST "$OLLAMA_BASE_URL/api/create" \
     -H 'Content-Type: application/json' --data-binary @-
```

Verify:

```sh
curl -sS "$OLLAMA_BASE_URL/api/tags" | jq '.models[] | select(.name|startswith("nibiru-coder"))'
```
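
The same check can be done from Python when `jq` is not available. This is a minimal sketch using only the standard library. It assumes the `/api/tags` response shape that the `jq` filter above relies on (a `models` array of objects with a `name` field), and both function names are hypothetical.

```python
import json
import urllib.request


def matching_models(tags_payload: dict, prefix: str = "nibiru-coder") -> list:
    """Return the names of registered models that start with `prefix`."""
    return [
        model["name"]
        for model in tags_payload.get("models", [])
        if model.get("name", "").startswith(prefix)
    ]


def fetch_tags(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/api/tags and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage, with the placeholder host from this runbook:
#   matching_models(fetch_tags("https://your-ollama-host.example"))
```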

---

## 8. Switch the live docs Oracle to the new model

On the production host (`bittomine@…`), `docs/.env` controls which model
the Oracle uses:

```sh
sed -i 's|^OLLAMA_CHAT_MODEL=.*|OLLAMA_CHAT_MODEL=nibiru-coder:lora-1.0|' docs/.env
docker compose up -d   # recreates the docs container with the new env
```
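
Where the `.env` edit has to run from a deploy script instead of an interactive shell, the same substitution can be sketched in Python. The `set_env_var` helper is hypothetical. Unlike the `sed` one-liner, it appends the key when no existing line matches, so a missing `OLLAMA_CHAT_MODEL` entry does not pass silently.

```python
import re


def set_env_var(env_text: str, key: str, value: str) -> str:
    """Replace `KEY=...` in .env-style text, or append it if absent."""
    pattern = re.compile(rf"^{re.escape(key)}=.*$", flags=re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(env_text):
        # A function replacement keeps any backslashes in `value` literal.
        return pattern.sub(lambda _match: line, env_text)
    # Make sure the appended line starts on its own line.
    separator = "" if env_text.endswith("\n") or not env_text else "\n"
    return f"{env_text}{separator}{line}\n"
```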

Hard-refresh https://nibiru-framework.com/en/ and ask the Oracle a Nibiru
question — it should answer with file:line citations and stop guessing.

---

## 9. Iterate

If the model regresses or hallucinates on a topic:

1. Find the missing pattern in the framework reference.
2. Add it to `docs/scripts/extraction/lora-augmentation.jsonl` (one or two
   high-quality Q/A pairs).
3. Rebuild the corpus: `cd docs && node scripts/build-corpus.mjs`.
4. Once the rebuilt corpus is published, re-pull it (step 2), bump the tag
   (`nibiru-coder:lora-1.1`), and re-run from step 5.

Keep the previous tag live until the new one is verified — Ollama keeps
both, so flipping `OLLAMA_CHAT_MODEL` back is one env-var change away.
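
Bumping the tag by hand is easy to get wrong across several iterations. A small sketch of the increment, assuming tags keep the `name:lora-MAJOR.MINOR` shape used in this runbook; `bump_minor` is a hypothetical helper.

```python
import re


def bump_minor(tag: str) -> str:
    """Increment the minor version of a `name:lora-MAJOR.MINOR` tag."""
    match = re.fullmatch(r"(?P<name>[^:]+):lora-(?P<major>\d+)\.(?P<minor>\d+)", tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    next_minor = int(match["minor"]) + 1
    return f"{match['name']}:lora-{match['major']}.{next_minor}"
```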