# LoRA training runbook — Nibiru

Step-by-step guide to fine-tuning a base model on the Nibiru training corpus, with live metrics in the Claude Sessions GUI Training tab.

The corpus we ship at https://nibiru-framework.com/corpus/ is built by `docs/scripts/build-corpus.mjs` (see `docs/src/content/docs/en/ai/corpus.md`). This runbook trains a LoRA on top of `qwen2.5-coder:14b` and registers the result on the same Ollama instance at `your-ollama-host.example`.

---

## 1. Pre-flight on the GPU box

```sh
ssh @
nvidia-smi                      # GPU visible; driver recent enough for the CUDA 12.4 image below
docker --version                # 24+ recommended
docker info | grep -i runtime   # should list "nvidia" (NVIDIA Container Toolkit, used by compose.yml)
mkdir -p ~/training/nibiru && cd ~/training/nibiru
```

Pick the base model that fits your hardware:

| GPU                | Base model          | Quant | Effective memory |
|--------------------|---------------------|-------|------------------|
| 24 GB (4090, 3090) | `Qwen2.5-Coder-7B`  | 4-bit | ~14 GB           |
| 48 GB (A6000)      | `Qwen2.5-Coder-14B` | 4-bit | ~22 GB           |
| 80 GB (A100, H100) | `Qwen2.5-Coder-32B` | 4-bit | ~38 GB           |

Pull the base model checkpoint to a host directory (one-time):

```sh
mkdir -p ~/training/models
huggingface-cli download Qwen/Qwen2.5-Coder-14B-Instruct \
  --local-dir ~/training/models/qwen25-coder-14b
```

(Log in first with `huggingface-cli login` if the model requires it.)

---

## 2. Pull the corpus

The docs site exposes the corpus at `/corpus/*.jsonl`. We pull the chat format because the trainer below renders each record through HF's chat template (`apply_chat_template`).

```sh
cd ~/training/nibiru
mkdir -p data
curl -fLo data/chat.jsonl    https://nibiru-framework.com/corpus/chat-en.jsonl
curl -fLo data/manifest.json https://nibiru-framework.com/corpus/manifest.json

# Verify the file integrity
SHA=$(jq -r '.files[] | select(.filename=="chat-en.jsonl") | .sha256' data/manifest.json)
test "$(sha256sum data/chat.jsonl | cut -d' ' -f1)" = "$SHA" && echo "ok" || echo "MISMATCH"
```

For a multilingual LoRA, pull `chat-all.jsonl` instead (and use that filename in the `jq` filter).
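The integrity check can also be done in Python. That route also lets you confirm the records have the shape `train.py` consumes before you spend GPU hours. The sketch below is a hypothetical `verify_corpus.py` and is not part of the Nibiru tooling. It repeats the SHA-256 comparison and checks that every line parses as a record with a non-empty `messages` list. It assumes two things: the manifest layout implied by the `jq` filter above (`files[]` entries with `filename` and `sha256`), and that each message carries `role` and `content` keys, as chat templates generally expect.

```python
"""verify_corpus.py: hypothetical pre-flight check for data/chat.jsonl (sketch).

Assumes the manifest shape implied by the jq filter in this section,
{"files": [{"filename": ..., "sha256": ...}, ...]}, and chat records of the
form {"messages": [{"role": ..., "content": ...}, ...]} (assumption).
"""
import hashlib
import json
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, streamed in chunks so large corpora stay out of RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def expected_sha(manifest_path: str, filename: str) -> str:
    """Look up the published digest for `filename` in manifest.json."""
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)
    for entry in manifest["files"]:
        if entry["filename"] == filename:
            return entry["sha256"]
    raise KeyError(f"{filename} not listed in {manifest_path}")


def check_records(path: str) -> int:
    """Count records; raise ValueError on the first one train.py could not format."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            rec = json.loads(line)  # JSONDecodeError is a ValueError subclass
            msgs = rec.get("messages") if isinstance(rec, dict) else None
            if not isinstance(msgs, list) or not msgs:
                raise ValueError(f"line {lineno}: missing or empty 'messages'")
            for msg in msgs:
                if not isinstance(msg, dict) or not {"role", "content"} <= set(msg):
                    raise ValueError(f"line {lineno}: message without role/content")
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python3 verify_corpus.py data/chat.jsonl data/manifest.json chat-en.jsonl
    data_path, manifest_path, published_name = sys.argv[1:4]
    if sha256_of(data_path) != expected_sha(manifest_path, published_name):
        sys.exit("MISMATCH")
    print(f"ok: {check_records(data_path)} records")
```

The third argument is the published filename, because the local copy is saved as `chat.jsonl`. Pass `chat-all.jsonl` there when checking the multilingual corpus.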
Keep one language per training run unless you have a reason to mix. A single language makes the loss curve cleaner and leaves the resulting model less prone to language leakage.

---

## 3. Training container — `Dockerfile` + `compose.yml`

The Sessions GUI dashboard polls the box over SSH and runs `docker logs` on the container, parsing the tqdm progress and `'loss'` lines. The trainer therefore has to run inside Docker (not bare-metal), under a stable container name, with its logs streamed to stdout.

```sh
cd ~/training/nibiru
```

**`Dockerfile`** (copy verbatim):

```dockerfile
# syntax=docker/dockerfile:1.6
FROM nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive PYTHONUNBUFFERED=1 PIP_NO_CACHE_DIR=1

# build-essential + python3-dev: Triton (used by torch/unsloth kernels) compiles a
# small C launcher at runtime and needs a C compiler and the Python headers.
RUN apt-get update && apt-get install -y --no-install-recommends \
      python3 python3-pip python3-dev build-essential git ca-certificates \
    && rm -rf /var/lib/apt/lists/*

RUN pip3 install --upgrade pip && pip3 install \
      "torch==2.4.*" --index-url https://download.pytorch.org/whl/cu124

RUN pip3 install \
      "unsloth[cu124-torch240]==2024.10.7" \
      "transformers==4.46.*" \
      "datasets==3.0.*" \
      "trl==0.11.*" \
      "peft==0.13.*" \
      "accelerate==1.0.*" \
      "bitsandbytes==0.44.*"

WORKDIR /workspace
COPY train.py ./train.py
CMD ["python3", "-u", "train.py"]
```

**`compose.yml`**:

```yaml
services:
  trainer:
    build: .
    image: nibiru-trainer            # explicit image tag, reused by the merge step in section 7
    container_name: nibiru-trainer
    restart: "no"
    runtime: nvidia
    environment:
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,utility
    volumes:
      - ./data:/workspace/data:ro
      - ./out:/workspace/out
      - ~/training/models:/workspace/models:ro
    shm_size: '8gb'
```

---

## 4. Training script — `train.py`

```python
"""Nibiru LoRA — single-GPU unsloth, chat-template, HF Trainer.

Outputs:
  out/nibiru-lora/
    adapter_hf/          # the LoRA adapter (PEFT format)
    trainer_state.json   # epochs, losses, eval — what the dashboard reads
    checkpoint-*/        # periodic checkpoints
"""
import os

# Import unsloth before transformers / trl / peft / datasets so its patches apply.
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

BASE_MODEL  = os.environ.get("BASE_MODEL",  "/workspace/models/qwen25-coder-14b")
TRAIN_FILE  = os.environ.get("TRAIN_FILE",  "/workspace/data/chat.jsonl")
OUTPUT_DIR  = os.environ.get("OUTPUT_DIR",  "/workspace/out/nibiru-lora")
MAX_SEQ_LEN = int(os.environ.get("MAX_SEQ_LEN", "4096"))

# 1. Load model + 4-bit quant + LoRA adapter on top
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = BASE_MODEL,
    max_seq_length = MAX_SEQ_LEN,
    load_in_4bit   = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r              = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha     = 32,
    lora_dropout   = 0.05,
    bias           = "none",
    use_gradient_checkpointing = "unsloth",
)
tokenizer = get_chat_template(tokenizer, chat_template="qwen-2.5")

# 2. Load corpus, format as chat with the tokenizer's template
ds = load_dataset("json", data_files=TRAIN_FILE, split="train")

def fmt(rec):
    msgs = rec["messages"]
    return {"text": tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=False)}

ds = ds.map(fmt, num_proc=4, remove_columns=ds.column_names)

# 3. Train
trainer = SFTTrainer(
    model         = model,
    tokenizer     = tokenizer,
    train_dataset = ds,
    args = SFTConfig(
        output_dir                  = OUTPUT_DIR,
        max_seq_length              = MAX_SEQ_LEN,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_ratio                = 0.03,
        num_train_epochs            = 3,
        learning_rate               = 2e-4,
        bf16                        = True,
        logging_steps               = 5,
        save_steps                  = 200,
        save_total_limit            = 3,
        report_to                   = "none",
        dataset_text_field          = "text",
        packing                     = True,
        # === trainer_state ===   ← marker the Sessions GUI dashboard greps for
    ),
)
trainer.train()
# Write trainer_state.json into OUTPUT_DIR itself (otherwise it only exists inside
# checkpoint-*/), so the dashboard's post-exit fallback can find it.
trainer.save_state()

# 4. Save adapter in HF (PEFT) format — what the merge step expects
model.save_pretrained(os.path.join(OUTPUT_DIR, "adapter_hf"))
tokenizer.save_pretrained(os.path.join(OUTPUT_DIR, "adapter_hf"))
print("\n=== training complete ===")
```

The literal comment `=== trainer_state ===` near the `SFTConfig` is the marker the dashboard uses to fall back to reading `trainer_state.json` after the container exits (see `dashboard/server.py:_extract_trainer_state`). Don't remove it.

---

## 5. Run it

```sh
cd ~/training/nibiru
docker compose build
docker compose up -d
docker logs -f nibiru-trainer
```

Expected log shape (this is what the Sessions GUI parses):

```
{'loss': '1.2345', 'grad_norm': '...', 'learning_rate': ..., 'epoch': '0.05'}
 12%|##        | 60/500 [01:23<10:14, 1.40s/it]
```

---

## 6. Open the Sessions GUI Training tab

In the GUI, click **Training**. Fill in the strip at the top of the tab:

| Field            | Value                                  |
|------------------|----------------------------------------|
| host             | `@`                                    |
| container        | `nibiru-trainer`                       |
| project_dir      | `/home//training/nibiru`               |
| output_subdir    | `out/nibiru-lora`                      |
| adapter_filename | `adapter_hf/adapter_model.safetensors` |

The dashboard remembers the last config in `~/.config/training-monitor/last.json`, so after the first save you just open the tab and the live metrics appear.
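The parsing contract becomes concrete with a rough model of what the Training tab extracts. Live, it reads the section 5 log lines. After the container exits, it reads `trainer_state.json`. This sketch is hypothetical: the real logic lives in `dashboard/server.py`, which this runbook does not show. The regular expressions and function names are assumptions modeled on the log shape above. The 30-entry window mirrors the "last 30 loss values" the tab displays. The fallback reads HF's `TrainerState` fields (`global_step`, `max_steps`, `log_history`). tqdm redraws its bar with carriage returns, so the text is split on `\r` as well as newlines.

```python
"""Rough model of what the Training tab extracts (hypothetical sketch).

Not the real dashboard/server.py: regexes and function names are illustrative.
"""
import ast
import json
import re

# tqdm bar: "<pct>%|<bar>| <step>/<total> [<elapsed><<eta>, <rate>]"
TQDM_RE = re.compile(r"(\d+)%\|[^|]*\|\s*(\d+)/(\d+)\s*\[([^<\]]+)<([^,\]]+)")
# HF Trainer log dict containing a 'loss' key (flat, no nested braces);
# 'train_loss' in the final summary does not match because of the leading quote.
LOSS_RE = re.compile(r"\{[^{}]*'loss'[^{}]*\}")


def _loss_entry(rec: dict) -> dict:
    # float() accepts both 1.23 and '1.23', so quoted and unquoted values work
    return {"loss": float(rec["loss"]), "epoch": float(rec.get("epoch", "nan"))}


def parse_logs(text: str, keep: int = 30) -> dict:
    """Latest tqdm progress plus the last `keep` loss entries from `docker logs` text."""
    progress, losses = None, []
    # tqdm redraws the bar with '\r', so treat carriage returns as line breaks too
    for chunk in re.split(r"[\r\n]+", text):
        for m in TQDM_RE.finditer(chunk):
            pct, step, total, elapsed, eta = m.groups()
            progress = {"percent": int(pct), "step": int(step), "total": int(total),
                        "elapsed": elapsed.strip(), "eta": eta.strip()}
        for m in LOSS_RE.finditer(chunk):
            try:
                losses.append(_loss_entry(ast.literal_eval(m.group(0))))
            except (ValueError, SyntaxError, TypeError, KeyError):
                continue  # skip text that is not a literal dict with a numeric loss
    return {"progress": progress, "losses": losses[-keep:]}


def parse_trainer_state(path: str, keep: int = 30) -> dict:
    """Post-exit fallback: read the trainer_state.json written by trainer.save_state()."""
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    # Per-step entries carry "loss"; the final summary uses "train_loss" instead.
    losses = [_loss_entry(e) for e in state.get("log_history", []) if "loss" in e]
    return {"global_step": state.get("global_step"),
            "max_steps": state.get("max_steps"),
            "losses": losses[-keep:]}
```

Parsing with `ast.literal_eval` rather than `json.loads` is deliberate. The Trainer prints Python dict reprs (single quotes), which are not valid JSON.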
While the container is running you'll see:

- a progress bar (percent / step / total / ETA, parsed from tqdm)
- the last 30 loss values and the current epoch
- the live `docker ps` line for the container

After training ends and the container exits, the dashboard falls back to `trainer_state.json` in `out/nibiru-lora/`, so the final losses and step count stay visible.

---

## 7. Merge LoRA into the base + register on Ollama

```sh
# On the GPU box, still inside ~/training/nibiru/
# adapter_config.json records the base-model path used during training
# (/workspace/models/qwen25-coder-14b), so mount models and adapter at the same paths.
docker run --rm --runtime=nvidia \
  -v "$PWD/out/nibiru-lora:/workspace/out/nibiru-lora:ro" \
  -v ~/training/models:/workspace/models:ro \
  -v "$PWD/out/merged:/out" \
  nibiru-trainer \
  python3 -c "
from unsloth import FastLanguageModel
# Loading the adapter directory loads the base model and attaches the LoRA to it.
m, tok = FastLanguageModel.from_pretrained('/workspace/out/nibiru-lora/adapter_hf', load_in_4bit=False)
# save_pretrained_gguf merges the LoRA into 16-bit weights, then quantizes to GGUF.
m.save_pretrained_gguf('/out', tok, quantization_method='q4_k_m')
"
```

This writes the `q4_k_m`-quantized GGUF into `out/merged/`. The exact filename varies between unsloth versions, so the snippet below matches on the quantization name. Push it to your Ollama:

```sh
# === build a Modelfile referencing the merged GGUF ===
GGUF=$(ls out/merged/*.gguf | grep -i 'q4_k_m' | head -1)
cat > out/merged/Modelfile <