nibiru-framework.com/docs/src/content/docs/en/ai/module/rag.md
stephan 48c839d927 Initial public push: docs cosmos v4 + AI module + framework groundwork
2026-05-08 15:22:18 +02:00


title: RAG plugin
description: Ingest text, embed it, retrieve top-K, and answer grounded questions, all in one PHP class.

The RAG plugin is the AI module's killer feature for product builders. It turns any pile of text (your help docs, your error logs, your Stripe invoices, your customer-support tickets) into a queryable knowledge base in a handful of lines of PHP.

Three minutes, end-to-end

use Nibiru\Module\Ai\Ai;

$ai  = new Ai();
$rag = $ai->rag('product-help');     // a named collection

$rag->ingestDir(__DIR__ . '/help/'); // walks .md/.txt/.php under help/
$rag->ingestText('FAQ entry…', ['source' => 'faq-12']);

echo $rag->ask('How do I cancel my subscription?');
// → grounded answer, citing chunks like [1] [2] [3]

That's it. No vector DB. No SDK. No Python sidecar.

How it works

ingestText / ingestFile / ingestDir
        ↓
   chunk → embed (Ollama nomic-embed-text)
        ↓
   pack vectors → JSON file at cache/rag/<collection>.json
        ↓
ask(question) → embed question → cosine top-K → chat with chunks as context
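
The "cosine top-K" step in the flow above can be sketched in plain PHP. This is a minimal illustration, not the plugin's actual code: the function names `cosineSimilarity` and `topK` and the `['text' => …, 'vector' => …]` chunk shape are assumptions made for the example, and the two-dimensional vectors stand in for real embeddings.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity between two equal-length vectors.
 * Returns 0.0 if either vector has zero magnitude.
 */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same dimension.');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0;
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Rank chunks against a query vector and keep the best $k.
 * Assumed chunk shape (hypothetical): ['text' => string, 'vector' => float[]].
 */
function topK(array $queryVector, array $chunks, int $k): array
{
    $hits = [];
    foreach ($chunks as $chunk) {
        $hits[] = [
            'score' => cosineSimilarity($queryVector, $chunk['vector']),
            'text'  => $chunk['text'],
        ];
    }
    // Highest score first.
    usort($hits, fn(array $x, array $y): int => $y['score'] <=> $x['score']);
    return array_slice($hits, 0, $k);
}

// Toy 2-dimensional "embeddings" so the ranking is easy to follow by hand.
$chunks = [
    ['text' => 'Cancel from the billing page.', 'vector' => [1.0, 0.0]],
    ['text' => 'Reset your password.',          'vector' => [0.0, 1.0]],
    ['text' => 'Refunds take five days.',       'vector' => [0.7, 0.7]],
];
$best = topK([0.9, 0.1], $chunks, 2);

foreach ($best as $hit) {
    printf("%.3f  %s\n", $hit['score'], $hit['text']);
}
```

A linear scan like this is the simplest thing that works when the whole collection sits in memory, which is the situation the JSON-file storage implies.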

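Storage is one JSON file per collection. Each chunk is an object with text and metadata; its vector is stored as a base64-encoded string of packed 32-bit floats (a Float32Array layout). For the 768-dimensional vectors that nomic-embed-text produces, that is about 3 KB of raw floats per chunk (roughly 4 KB once base64-encoded), so ~10k chunks fit comfortably in memory.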
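
To make the size estimate concrete, here is a small sketch of packing a vector into base64 and back using PHP's built-in `pack()` and `unpack()`. The helper names `packVector` and `unpackVector` are hypothetical, and little-endian byte order (format code `g`) is an assumption; the page only says the vectors are base64-packed 32-bit floats.

```php
<?php
declare(strict_types=1);

/** Pack a float vector as little-endian 32-bit floats, then base64-encode it. */
function packVector(array $vector): string
{
    // 'g' = 32-bit float, little-endian (assumed byte order).
    return base64_encode(pack('g*', ...$vector));
}

/** Reverse of packVector(): base64 string back to a list of floats. */
function unpackVector(string $encoded): array
{
    $binary = base64_decode($encoded, true);
    if ($binary === false) {
        throw new InvalidArgumentException('Not valid base64.');
    }
    // unpack() returns a 1-indexed array; array_values() makes it a plain list.
    return array_values(unpack('g*', $binary));
}

$vector  = array_fill(0, 768, 0.25);   // 768 dimensions, the nomic-embed-text size
$encoded = packVector($vector);

echo strlen(base64_decode($encoded)), " raw bytes\n";   // 768 * 4 = 3072
echo strlen($encoded), " base64 characters\n";          // 4 * (3072 / 3) = 4096
```

Values are rounded to 32-bit precision on the way in, which is normally harmless for similarity ranking and halves the size compared with storing 64-bit floats.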

Multiple collections

You can have any number of collections in the same app. Each has its own JSON file. They all share the embedding model and chat model from the [AI] config section.

$docs    = $ai->rag('docs');
$tickets = $ai->rag('support-tickets');
$logs    = $ai->rag('error-logs');

$docs->ingestDir(__DIR__ . '/help/');
$tickets->ingestText($ticket->body, ['ticket_id' => $ticket->id]);
$logs->ingestText($exception->__toString(), ['ts' => time()]);

API reference

$rag = $ai->rag('name');                    // get/create a named collection

// --- Ingestion ---
$rag->ingestText($text, $metadata = []);    // single chunk
$count = $rag->ingestFile('path');          // returns chunks added
$count = $rag->ingestDir('dir', ['md','txt','php']); // recursive

// --- Querying ---
$hits = $rag->search('query', $k = null);   // [{score, text, metadata}, …]
$answer = $rag->ask('question', $k = null); // top-K → chat call

// --- Maintenance ---
$rag->reset();                              // forget everything (deletes file)
$n = $rag->size();                          // number of chunks
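
The comment on `ask()` says it feeds the top-K hits into a chat call. One plausible way to turn `search()`-style hits into a grounded prompt with numbered citations is sketched below. `buildGroundedPrompt` is a hypothetical helper and the prompt wording is invented for illustration; the plugin's real template is not shown on this page. The hit shape follows the `{score, text, metadata}` structure listed above.

```php
<?php
declare(strict_types=1);

/**
 * Build a grounded prompt from search hits.
 * Hypothetical helper: not part of the documented plugin API.
 */
function buildGroundedPrompt(string $question, array $hits): string
{
    $context = [];
    foreach (array_values($hits) as $i => $hit) {
        // Number each chunk so the model can cite it as [1], [2], ...
        $source = $hit['metadata']['source'] ?? 'unknown';
        $context[] = sprintf('[%d] (source: %s) %s', $i + 1, $source, trim($hit['text']));
    }

    return "Answer using only the numbered context below and cite it like [1].\n"
        . "If the context does not contain the answer, say so.\n\n"
        . "Context:\n" . implode("\n", $context) . "\n\n"
        . "Question: " . $question;
}

// Sample hits in the {score, text, metadata} shape; the values are made up.
$hits = [
    ['score' => 0.91, 'text' => 'Open Billing and click Cancel plan.', 'metadata' => ['source' => 'faq-12']],
    ['score' => 0.84, 'text' => 'Cancellation takes effect at period end.', 'metadata' => []],
];

$prompt = buildGroundedPrompt('How do I cancel my subscription?', $hits);
echo $prompt, "\n";
```

If you need a different answer style, calling `search()` yourself and assembling the prompt this way gives you that control, while `ask()` stays the one-liner for the default case.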

Tuning knobs

In application/module/ai/settings/ai.ini:

[AI]
embed.model        = "nomic-embed-text"   ; or mxbai-embed-large for higher quality
rag.top_k          = 6                    ; chunks injected into the chat call
rag.chunk_target   = 600                  ; tokens per chunk (target)
rag.chunk_min      = 120                  ; smaller chunks merged
rag.chunk_max      = 900                  ; larger paragraphs split on sentences
rag.storage_path   = "/../../application/module/ai/cache/rag/"
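
To show how the three chunk settings interact, here is a rough sketch of a chunker that follows the comments above: oversized paragraphs are split on sentence ends, and pieces are merged until the target is reached. It is not the plugin's implementation. `estimateTokens` and `chunkText` are hypothetical names, and tokens are approximated by whitespace-separated words, which a real tokenizer will not match exactly.

```php
<?php
declare(strict_types=1);

/** Rough token estimate: whitespace-separated words (an assumption). */
function estimateTokens(string $text): int
{
    return count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
}

/**
 * Paragraphs above $max are split on sentence ends. Pieces are then merged
 * until adding the next one would exceed $target, but a chunk is only closed
 * once it has reached $min. A trailing chunk below $min joins its predecessor.
 */
function chunkText(string $text, int $target = 600, int $min = 120, int $max = 900): array
{
    $pieces = [];
    foreach (preg_split('/\n\s*\n/', trim($text), -1, PREG_SPLIT_NO_EMPTY) as $paragraph) {
        if (estimateTokens($paragraph) > $max) {
            // Oversized paragraph: fall back to sentence boundaries.
            $sentences = preg_split('/(?<=[.!?])\s+/', $paragraph, -1, PREG_SPLIT_NO_EMPTY);
            array_push($pieces, ...$sentences);
        } else {
            $pieces[] = $paragraph;
        }
    }

    $chunks = [];
    $current = '';
    foreach ($pieces as $piece) {
        $candidate = $current === '' ? $piece : $current . "\n\n" . $piece;
        $bigEnough = $current !== '' && estimateTokens($current) >= $min;
        if ($bigEnough && estimateTokens($candidate) > $target) {
            $chunks[] = $current;   // close the current chunk
            $current = $piece;      // and start a new one
        } else {
            $current = $candidate;  // keep merging
        }
    }
    if ($current !== '') {
        if ($chunks !== [] && estimateTokens($current) < $min) {
            $chunks[count($chunks) - 1] .= "\n\n" . $current;
        } else {
            $chunks[] = $current;
        }
    }
    return $chunks;
}

// Tiny limits so the behaviour is visible on a three-paragraph sample.
$sample = "one two three.\n\nfour five.\n\nsix seven eight nine ten eleven.";
foreach (chunkText($sample, 5, 2, 100) as $i => $chunk) {
    printf("chunk %d: %d tokens\n", $i + 1, estimateTokens($chunk));
}
```

In this sketch a merged chunk can overshoot the target when a small piece precedes a large one, which is why the target is treated as a goal rather than a hard limit.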

When to use it

  • Help / FAQ chat — ingest your help articles, expose a /ask endpoint.
  • In-app code search — ingest application/module/, ask "where do we calculate VAT?"
  • Internal docs assistant — ingest your team's wiki dump.
  • Customer-history lookups — ingest tickets, ask "have we seen this error before?"

When NOT to use it

  • Real-time, write-heavy data — RAG is a snapshot. For live data, write a Tool the agent can call.
  • Massive corpora (> 100k chunks) — JSON-file storage starts to creak. Move to Qdrant / pgvector / Weaviate; we'll publish an adapter once we need one ourselves.
  • Anything where you need exact answers, not probable ones. RAG is probabilistic. Don't use it as a database query layer.

Common pitfalls

  • nomic-embed-text not pulled. The first ingestText call will fail with a clear error pointing you at the pull command (with Ollama, that is ollama pull nomic-embed-text).
  • Embedding model mismatch. Don't mix nomic-embed-text chunks with mxbai-embed-large queries: they live in different vector spaces, so the similarity scores are meaningless. If you change embed.model, run $rag->reset() and re-ingest before querying again.
  • Stale collections. Re-running ingestDir doesn't dedupe. Use reset() then re-ingest, or maintain a content-hash check yourself.
  • Tiny chunks. Below ~80 tokens, embeddings get noisy. The default rag.chunk_min = 120 merges small adjacent chunks.
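
For the stale-collections pitfall, a content-hash check can be as small as the sketch below. It assumes you keep your own manifest of SHA-256 hashes next to the collection; `filterUnseen` and the manifest shape are hypothetical, not part of the plugin.

```php
<?php
declare(strict_types=1);

/**
 * Return only the documents whose content hash is not in the manifest yet,
 * and record their hashes so a second run skips them.
 *
 * @param array<string,string> $documents source id => text
 * @param array<string,bool>   $manifest  sha256 hex => true, updated in place
 * @return array<string,string> documents that still need ingesting
 */
function filterUnseen(array $documents, array &$manifest): array
{
    $fresh = [];
    foreach ($documents as $source => $text) {
        $hash = hash('sha256', $text);
        if (isset($manifest[$hash])) {
            continue; // identical content was ingested before
        }
        $manifest[$hash] = true;
        $fresh[$source] = $text;
    }
    return $fresh;
}

$manifest  = [];
$documents = [
    'faq-12' => 'Open Billing and click Cancel plan.',
    'faq-13' => 'Refunds take five days.',
];

$firstRun  = filterUnseen($documents, $manifest);  // both documents are new
$secondRun = filterUnseen($documents, $manifest);  // nothing left to ingest

foreach ($firstRun as $source => $text) {
    // With the plugin, a call like
    // $rag->ingestText($text, ['source' => $source]) would go here.
    echo "ingest {$source}\n";
}
```

Persisting the manifest (for example as a small JSON file) keeps the check working across requests. Note that this only skips unchanged content; an edited document is ingested again while its old chunks remain, so reset() is still the way to drop outdated text.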

What's next