
stephan f4ccc45a3b Strip api.neuronetz.ai from documentation; chat config stays in env

The Ollama URL was leaking via:
  - prose in /en/, /de/, /ja/, /es/, /fr/ docs (oracle, deployment,
    local-testing, ai/module/{overview,embed,training})
  - code blocks teaching users to curl the host directly
  - .env.example, Dockerfile, docker-compose.yml defaults
  - providers.mjs, translate-docs.mjs, build-oracle-index.mjs defaults
  - LandingScripts.astro comment
  - lora-runbook.md prose + SSH host
  - the GET handler at /api/oracle which echoed `ollamaUrl` back to public callers
  - the "Oracle is silent" fallback message at /api/oracle POST

Replacements:
  - prose: "neuronetz.ai" → "your Ollama instance"
  - example URLs in code blocks: https://api.neuronetz.ai → https://your-ollama-host.example
  - code-level defaults: → http://localhost:11434 (Ollama's standard local port)
  - GET /api/oracle: dropped the `ollamaUrl` field; provider + model still exposed
  - runbook SSH host: neuronetz@cloud.neuronetz.ai → <gpu-user>@<gpu-host>

Production chat is unaffected: docs/.env (gitignored) on the production
host still pins OLLAMA_BASE_URL=https://api.neuronetz.ai. The only
change in the running container is that the GET handler no longer
echoes the URL.

analytics.neuronetz.ai (Umami tracking) is intentionally left intact —
it's a public, brand-owned subdomain meant to be visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-08 17:14:17 +02:00


LoRA training runbook — Nibiru

Step-by-step for fine-tuning a base model on the Nibiru training corpus, with live metrics in the Claude Sessions GUI Training tab.

The corpus we ship at https://nibiru-framework.com/corpus/ is built by docs/scripts/build-corpus.mjs (see docs/src/content/docs/en/ai/corpus.md). This runbook trains a LoRA on top of qwen2.5-coder:14b and registers the result on your Ollama instance at your-ollama-host.example.

1. Pre-flight on the GPU box

ssh <gpu-user>@<gpu-host>
nvidia-smi                  # GPU visible, CUDA driver matches torch
docker --version            # 24+ recommended
mkdir -p ~/training/nibiru && cd ~/training/nibiru

Pick the hardware-appropriate base model:

| GPU | Base model | Quant | Effective memory |
| --- | --- | --- | --- |
| 24 GB (4090, 3090) | `Qwen2.5-Coder-7B` | 4-bit | ~14 GB |
| 48 GB (A6000) | `Qwen2.5-Coder-14B` | 4-bit | ~22 GB |
| 80 GB (A100, H100) | `Qwen2.5-Coder-32B` | 4-bit | ~38 GB |
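
If a provisioning script has to choose the checkpoint, the table can be expressed as a small lookup. This is only a sketch: the name `pick_base_model` and the idea of treating the GPU sizes as minimum thresholds are illustrative assumptions; the model names and memory figures are copied from the table.

```python
# Hypothetical helper: maps GPU memory (GB) to the base model from the table above.
BASE_MODELS = [
    # (minimum VRAM in GB, base model, approx. effective memory in GB at 4-bit)
    (80, "Qwen2.5-Coder-32B", 38),
    (48, "Qwen2.5-Coder-14B", 22),
    (24, "Qwen2.5-Coder-7B", 14),
]


def pick_base_model(vram_gb: float) -> str:
    """Return the largest base model whose table row fits the given GPU memory."""
    for min_vram, name, _effective_gb in BASE_MODELS:
        if vram_gb >= min_vram:
            return name
    raise ValueError(f"{vram_gb} GB is below the smallest row in the table (24 GB)")
```

A 40 GB card falls between rows, so this sketch conservatively returns the 24 GB row's model.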

Pull the base model checkpoint to a host directory (one-time):

mkdir -p ~/training/models
huggingface-cli download Qwen/Qwen2.5-Coder-14B-Instruct \
  --local-dir ~/training/models/qwen25-coder-14b

(Login first with huggingface-cli login if the model requires it.)

2. Pull the corpus

The docs site exposes the corpus at /corpus/*.jsonl. We pull the chat format because the trainer config below uses HF's chat_template.

cd ~/training/nibiru
mkdir -p data
curl -fLo data/chat.jsonl     https://nibiru-framework.com/corpus/chat-en.jsonl
curl -fLo data/manifest.json  https://nibiru-framework.com/corpus/manifest.json

# Verify the file integrity
SHA=$(jq -r '.files[] | select(.filename=="chat-en.jsonl") | .sha256' data/manifest.json)
test "$(sha256sum data/chat.jsonl | cut -d' ' -f1)" = "$SHA" && echo "ok" || echo "MISMATCH"
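
On a box without jq, the same integrity check could be done in Python. The sketch below assumes only what the shell snippet already relies on: a manifest with a `files` array whose entries carry `filename` and `sha256`. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large corpus never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(data_file: Path, manifest_file: Path, filename: str) -> bool:
    """Compare the local file's SHA-256 with the manifest entry for `filename`."""
    manifest = json.loads(manifest_file.read_text(encoding="utf-8"))
    expected = next(e["sha256"] for e in manifest["files"] if e["filename"] == filename)
    return sha256_of(data_file) == expected
```

For the download above, the call would be `verify_against_manifest(Path("data/chat.jsonl"), Path("data/manifest.json"), "chat-en.jsonl")`.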

For a multilingual LoRA, pull chat-all.jsonl instead. Keep one language per training run unless you have a reason to mix: a single language gives a cleaner loss curve and a model less prone to language leakage.
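
Before training, it can be worth confirming that every line of the downloaded file has the shape the chat template expects. The sketch below is a hypothetical pre-check, not part of the corpus tooling. The `messages` key is what the training script reads; the per-message `role`/`content` fields and the allowed role names are assumptions based on the usual chat format.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: standard chat roles


def invalid_records(lines):
    """Yield (line_number, reason) for each record that would likely break formatting."""
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            rec = json.loads(line)
        except json.JSONDecodeError as exc:
            yield n, f"not valid JSON: {exc.msg}"
            continue
        msgs = rec.get("messages") if isinstance(rec, dict) else None
        if not isinstance(msgs, list) or not msgs:
            yield n, "missing or empty 'messages' list"
            continue
        for msg in msgs:
            ok = (
                isinstance(msg, dict)
                and msg.get("role") in ALLOWED_ROLES
                and isinstance(msg.get("content"), str)
            )
            if not ok:
                yield n, "message without a known role and string content"
                break
```

Running `list(invalid_records(open("data/chat.jsonl", encoding="utf-8")))` should return an empty list for a clean file.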

3. Training container — `Dockerfile` + `compose.yml`

The Sessions GUI dashboard polls the box over SSH and runs docker logs on the container, parsing the tqdm progress and 'loss' lines. The training job therefore has to run inside Docker (not bare-metal), under a stable container name, with logs streamed to stdout.

cd ~/training/nibiru

Dockerfile (copy verbatim):

# syntax=docker/dockerfile:1.6
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive PYTHONUNBUFFERED=1 PIP_NO_CACHE_DIR=1
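# build-essential and python3-dev are included because Triton (pulled in by unsloth)
# compiles small C helpers at runtime and needs a compiler plus the Python headers.
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip python3-dev build-essential git ca-certificates \
    && rm -rf /var/lib/apt/lists/*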

RUN pip3 install --upgrade pip && pip3 install \
    "torch==2.4.*" --index-url https://download.pytorch.org/whl/cu124
RUN pip3 install \
    "unsloth[cu124-torch240]==2024.10.7" \
    "transformers==4.46.*" \
    "datasets==3.0.*" \
    "trl==0.11.*" \
    "peft==0.13.*" \
    "accelerate==1.0.*" \
    "bitsandbytes==0.44.*"

WORKDIR /workspace
COPY train.py ./train.py
CMD ["python3", "-u", "train.py"]

compose.yml:

services:
  trainer:
    build: .
    container_name: nibiru-trainer
    restart: "no"
    runtime: nvidia
    environment:
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,utility
    volumes:
      - ./data:/workspace/data:ro
      - ./out:/workspace/out
      - ~/training/models:/workspace/models:ro
    shm_size: '8gb'

4. Training script — `train.py`

"""Nibiru LoRA — single-GPU unsloth, chat-template, HF Trainer.

Outputs:
  out/nibiru-lora/
    adapter_hf/                 # the LoRA adapter (PEFT format)
    trainer_state.json          # epochs, losses, eval — what the dashboard reads
    checkpoint-*/               # periodic checkpoints
"""
import os

# Import unsloth before trl so its patches are applied to the libraries it wraps.
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

BASE_MODEL  = os.environ.get("BASE_MODEL",  "/workspace/models/qwen25-coder-14b")
TRAIN_FILE  = os.environ.get("TRAIN_FILE",  "/workspace/data/chat.jsonl")
OUTPUT_DIR  = os.environ.get("OUTPUT_DIR",  "/workspace/out/nibiru-lora")
MAX_SEQ_LEN = int(os.environ.get("MAX_SEQ_LEN", "4096"))

# 1. Load model + 4-bit quant + LoRA adapter on top
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name        = BASE_MODEL,
    max_seq_length    = MAX_SEQ_LEN,
    load_in_4bit      = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r                          = 16,
    target_modules             = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    lora_alpha                 = 32,
    lora_dropout               = 0.05,
    bias                       = "none",
    use_gradient_checkpointing = "unsloth",
)
tokenizer = get_chat_template(tokenizer, chat_template="qwen-2.5")

# 2. Load corpus, format as chat with the tokenizer's template
ds = load_dataset("json", data_files=TRAIN_FILE, split="train")
def fmt(rec):
    msgs = rec["messages"]
    return {"text": tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=False)}
ds = ds.map(fmt, num_proc=4, remove_columns=ds.column_names)

# 3. Train
trainer = SFTTrainer(
    model         = model,
    tokenizer     = tokenizer,
    train_dataset = ds,
    args = SFTConfig(
        output_dir                  = OUTPUT_DIR,
        max_seq_length              = MAX_SEQ_LEN,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_ratio                = 0.03,
        num_train_epochs            = 3,
        learning_rate               = 2e-4,
        bf16                        = True,
        logging_steps               = 5,
        save_steps                  = 200,
        save_total_limit            = 3,
        report_to                   = "none",
        dataset_text_field          = "text",
        packing                     = True,
        # === trainer_state ===  ← marker the Sessions GUI dashboard greps for
    ),
)
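trainer.train()
# Trainer writes trainer_state.json only inside checkpoint-*/ by default;
# save_state() also writes it to OUTPUT_DIR, where the dashboard looks for it.
trainer.save_state()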

# 4. Save adapter in HF (PEFT) format — what the merge step expects
model.save_pretrained(os.path.join(OUTPUT_DIR, "adapter_hf"))
tokenizer.save_pretrained(os.path.join(OUTPUT_DIR, "adapter_hf"))
print("\n=== training complete ===")

The literal comment === trainer_state === near the SFTConfig is the marker the dashboard uses to fall back to reading trainer_state.json after the container exits — see dashboard/server.py:_extract_trainer_state. Don't remove it.
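
To get a feel for how long a run will take and how many checkpoints it writes, the SFTConfig values above can be turned into a rough step count. The sketch below approximates the usual Hugging Face Trainer arithmetic and should not be read as its exact implementation. The function name is hypothetical, and `num_sequences` means the number of training sequences after packing, which is smaller than the number of lines in chat.jsonl.

```python
import math


def estimate_schedule(num_sequences, per_device_batch=2, grad_accum=4,
                      epochs=3, warmup_ratio=0.03, save_steps=200):
    """Approximate optimizer steps, warmup steps and checkpoint count for one GPU."""
    batches_per_epoch = math.ceil(num_sequences / per_device_batch)
    # One optimizer update happens every `grad_accum` batches.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = math.ceil(epochs * updates_per_epoch)
    return {
        "effective_batch": per_device_batch * grad_accum,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
        "checkpoints_written": total_steps // save_steps,
    }
```

With the defaults from train.py, 1000 packed sequences give an effective batch of 8 and about 375 optimizer steps, so only one periodic checkpoint (at step 200) would be written before the final adapter save.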

5. Run it

cd ~/training/nibiru
docker compose build
docker compose up -d
docker logs -f nibiru-trainer

Expected log shape (this is what the Sessions GUI parses):

{'loss': '1.2345', 'grad_norm': '...', 'learning_rate': ..., 'epoch': '0.05'}
 12%|##         | 60/500 [01:23<10:14,  1.40s/it]
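
For anyone writing their own monitor, the two line shapes above can be picked apart with regular expressions. This is a sketch of one possible parser, not the dashboard's actual code (that lives in dashboard/server.py, which is not shown here). The patterns and the function name are assumptions, and the loss pattern accepts values with or without surrounding quotes.

```python
import re

# Matches "'loss': 1.2345" as well as "'loss': '1.2345'".
LOSS_RE = re.compile(r"'loss':\s*'?([0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?)'?")
EPOCH_RE = re.compile(r"'epoch':\s*'?([0-9]+(?:\.[0-9]+)?)'?")
# Matches " 12%|##   | 60/500 [01:23<10:14, ...": percent, step, total, elapsed, ETA.
TQDM_RE = re.compile(r"(\d+)%\|[^|]*\|\s*(\d+)/(\d+)\s*\[([\d:]+)<([\d:?]+)")


def parse_log_line(line):
    """Return a dict describing a progress or loss line, or None for anything else."""
    m = TQDM_RE.search(line)
    if m:
        pct, step, total, elapsed, eta = m.groups()
        return {"kind": "progress", "percent": int(pct), "step": int(step),
                "total": int(total), "elapsed": elapsed, "eta": eta}
    m = LOSS_RE.search(line)
    if m:
        out = {"kind": "loss", "loss": float(m.group(1))}
        e = EPOCH_RE.search(line)
        if e:
            out["epoch"] = float(e.group(1))
        return out
    return None
```

Feeding it the output of `docker logs nibiru-trainer` line by line yields the step, total and ETA from progress lines and the loss and epoch from metric lines.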

6. Open the Sessions GUI Training tab

In the GUI, click Training. Fill the strip at the top of the tab:

| Field | Value |
| --- | --- |
| host | `<gpu-user>@<gpu-host>` |
| container | `nibiru-trainer` |
| project_dir | `/home/<gpu-user>/training/nibiru` |
| output_subdir | `out/nibiru-lora` |
| adapter_filename | `adapter_hf/adapter_model.safetensors` |

The dashboard remembers the last config in ~/.config/training-monitor/last.json, so after the first save you just open the tab and the live metrics appear.

While the container is running you'll see:

  - progress bar (% / step / total / ETA, parsed from tqdm)
  - last 30 loss values + epoch
  - the live docker ps line for the container

After training ends and the container exits, the dashboard falls back to trainer_state.json in out/nibiru-lora/ so the final losses + step count stay visible.
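
The same fallback can be reproduced by hand when the GUI is not available. The sketch below reads the fields that Hugging Face's trainer state file normally contains (`global_step`, `max_steps`, `epoch`, and a `log_history` list whose training entries carry `loss`). The function name and the summary shape are hypothetical and do not reflect how the dashboard itself stores the data.

```python
import json
from pathlib import Path


def summarize_trainer_state(path, last_n=30):
    """Return step counters and the last `last_n` training losses from trainer_state.json."""
    state = json.loads(Path(path).read_text(encoding="utf-8"))
    # Evaluation entries use keys such as eval_loss, so only training logs match here.
    losses = [entry["loss"] for entry in state.get("log_history", []) if "loss" in entry]
    return {
        "global_step": state.get("global_step"),
        "max_steps": state.get("max_steps"),
        "epoch": state.get("epoch"),
        "last_losses": losses[-last_n:],
    }
```

For this runbook the call would be `summarize_trainer_state("out/nibiru-lora/trainer_state.json")`, run from ~/training/nibiru.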

7. Merge LoRA into the base + register on Ollama

# On the GPU box, still inside ~/training/nibiru/
docker run --rm --runtime=nvidia \
  -v "$PWD/out/nibiru-lora:/lora:ro" \
  -v ~/training/models/qwen25-coder-14b:/base:ro \
  -v "$PWD/out/merged:/out" \
  nibiru-trainer \
  python3 -c "
from unsloth import FastLanguageModel
from peft import PeftModel
# Load the base in 16-bit so the LoRA weights can be folded into it
m, tok = FastLanguageModel.from_pretrained('/base', load_in_4bit=False)
# Attach the trained adapter, then keep the merged model that merge_and_unload returns
m = PeftModel.from_pretrained(m, '/lora/adapter_hf')
m = m.merge_and_unload()
m.save_pretrained_gguf('/out', tok, quantization_method='q4_k_m')
"

This writes a single q4_k_m-quantized .gguf file to out/merged/. Register it on your Ollama instance:

# === build a Modelfile referencing the merged GGUF ===
GGUF=$(ls out/merged/*.gguf | head -1)
cat > out/merged/Modelfile <<EOF
FROM $GGUF
PARAMETER temperature 0.4
PARAMETER top_p 0.9
SYSTEM "You are nibiru-coder, a senior PHP architect specialised in the Nibiru framework. Always cite file:line where relevant. Never use 'presumably', 'likely', 'appears to'."
EOF

# === register on Ollama at your-ollama-host.example ===
OLLAMA_BASE_URL=https://your-ollama-host.example
TAG="nibiru-coder:lora-1.0"
jq -n --arg name "$TAG" --rawfile mf out/merged/Modelfile \
   '{name: $name, modelfile: $mf, stream: false}' \
  | curl -sS -X POST "$OLLAMA_BASE_URL/api/create" \
      -H 'Content-Type: application/json' --data-binary @-

Verify:

curl -sS "$OLLAMA_BASE_URL/api/tags" | jq '.models[] | select(.name|startswith("nibiru-coder"))'
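
The same verification could be scripted in Python, for example inside a deploy check. The sketch below uses only the standard library and assumes the response shape the jq filter already relies on (a `models` array whose entries have a `name`). The function names are hypothetical.

```python
import json
import urllib.request


def models_with_prefix(tags_payload, prefix="nibiru-coder"):
    """Return the names from an /api/tags response that start with `prefix`."""
    return [
        m["name"]
        for m in tags_payload.get("models", [])
        if m.get("name", "").startswith(prefix)
    ]


def fetch_tags(base_url, timeout=10.0):
    """GET <base_url>/api/tags and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

`models_with_prefix(fetch_tags("https://your-ollama-host.example"))` should then include the tag that was just registered.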

8. Switch the live docs Oracle to the new model

On the production host (bittomine@…), docs/.env controls which model the Oracle uses:

sed -i 's|^OLLAMA_CHAT_MODEL=.*|OLLAMA_CHAT_MODEL=nibiru-coder:lora-1.0|' docs/.env
docker compose up -d              # recreates the docs container with the new env

Hard-refresh https://nibiru-framework.com/en/ and ask the Oracle a Nibiru question — it should answer with file:line citations and stop guessing.
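
If the model switch is driven from a script rather than by hand, the sed edit can be mirrored in Python. This is a sketch with a hypothetical function name. Unlike the sed command, it appends the variable when the file does not define it yet, which is a design choice of this example and not behaviour described above.

```python
import re


def set_env_var(env_text, key, value):
    """Return env-file text with `key` set to `value`, replacing or appending the line."""
    pattern = re.compile(rf"^{re.escape(key)}=.*$", flags=re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(env_text):
        # A callable replacement avoids backslash interpretation in `value`.
        return pattern.sub(lambda _match: line, env_text)
    separator = "" if (not env_text or env_text.endswith("\n")) else "\n"
    return f"{env_text}{separator}{line}\n"
```

Applied to the contents of docs/.env with `key="OLLAMA_CHAT_MODEL"` and `value="nibiru-coder:lora-1.0"`, the result can be written back before running docker compose up -d.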

9. Iterate

If the model regresses or hallucinates on a topic:

1. Find the missing pattern in the framework reference.
2. Add it to docs/scripts/extraction/lora-augmentation.jsonl (one or two high-quality Q/A pairs).
3. Rebuild the corpus: cd docs && node scripts/build-corpus.mjs.
4. Bump the tag (nibiru-coder:lora-1.1) and re-run from step 5.

Keep the previous tag live until the new one is verified — Ollama keeps both, so flipping OLLAMA_CHAT_MODEL back is one env-var change away.
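
Bumping the tag can be automated so that iterations never overwrite a verified model. The helper below is a hypothetical sketch that assumes tags follow the `name:label-major.minor` pattern used in this runbook (for example nibiru-coder:lora-1.0).

```python
import re

# Assumed tag layout: <name>:<label>-<major>.<minor>
TAG_RE = re.compile(r"^(?P<name>[^:]+):(?P<label>.+)-(?P<major>\d+)\.(?P<minor>\d+)$")


def bump_minor(tag):
    """Return the tag with its minor version incremented by one."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"tag does not look like name:label-major.minor: {tag!r}")
    return f"{m['name']}:{m['label']}-{m['major']}.{int(m['minor']) + 1}"
```

Because the previous tag is left untouched, rolling back remains a matter of pointing OLLAMA_CHAT_MODEL at the older name.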
