---
title: RAG plugin
description: Ingest text, embed it, retrieve top-K, and answer grounded questions — all in one PHP class.
---

The RAG plugin is the AI module's killer feature for product builders. It turns any pile of text — your help docs, your error logs, your Stripe invoices, your customer-support tickets — into a queryable knowledge base in roughly four lines of PHP.

## Three minutes, end-to-end

```php
use Nibiru\Module\Ai\Ai;

$ai  = new Ai();
$rag = $ai->rag('product-help');     // a named collection

$rag->ingestDir(__DIR__ . '/help/'); // walks .md/.txt/.php under help/
$rag->ingestText('FAQ entry…', ['source' => 'faq-12']);

echo $rag->ask('How do I cancel my subscription?');
// → grounded answer, citing chunks like [1] [2] [3]
```

That's it. No vector DB. No SDK. No Python sidecar.

## How it works

```
ingestText / ingestFile / ingestDir
        ↓
   chunk → embed (Ollama nomic-embed-text)
        ↓
   pack vectors → JSON file at cache/rag/<collection>.json
        ↓
ask(question) → embed question → cosine top-K → chat with chunks as context
```
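
The retrieval step in the last line of the diagram can be sketched in plain PHP. This is a minimal illustration of cosine top-K ranking, not the plugin's actual source: the function names `cosineSimilarity` and `topK`, the chunk array shape, and the toy 2-dimensional vectors are all assumptions made for the example.

```php
/**
 * Cosine similarity between two equal-length float vectors.
 * Returns 0.0 when either vector has zero magnitude.
 */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $y = $b[$i];
        $dot   += $x * $y;
        $normA += $x * $x;
        $normB += $y * $y;
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Score every chunk against the query vector and keep the best $k.
 * Each chunk is assumed to look like ['text' => string, 'vector' => float[]].
 */
function topK(array $queryVector, array $chunks, int $k): array
{
    $hits = [];
    foreach ($chunks as $chunk) {
        $hits[] = [
            'score' => cosineSimilarity($queryVector, $chunk['vector']),
            'text'  => $chunk['text'],
        ];
    }
    // Highest score first.
    usort($hits, fn(array $l, array $r): int => $r['score'] <=> $l['score']);
    return array_slice($hits, 0, $k);
}

// Toy vectors stand in for real embeddings.
$chunks = [
    ['text' => 'Cancel from the billing page.', 'vector' => [1.0, 0.0]],
    ['text' => 'Reset your password by email.', 'vector' => [0.0, 1.0]],
    ['text' => 'Refunds take five days.',       'vector' => [1.0, 1.0]],
];
$hits = topK([1.0, 0.1], $chunks, 2);
```

A brute-force scan like this is linear in the number of chunks, which is why it stays practical only while the whole collection fits in memory.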

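Storage is one JSON file per collection. Each chunk is an object with `text` and `metadata`; its vector is stored as base64-packed 32-bit floats, about 3 KB of raw data per chunk at the 768 dimensions `nomic-embed-text` produces (roughly 4 KB once base64-encoded). A collection of ~10k chunks fits comfortably in memory.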
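
To make the storage format concrete, here is a hedged sketch of how a vector could be packed into a base64 string and read back. The helper names `packVector` and `unpackVector` are hypothetical, and the little-endian `g` format code is an assumption; the plugin's real on-disk byte order is not documented here.

```php
/** Pack floats as little-endian float32 and base64-encode them for JSON storage. */
function packVector(array $vector): string
{
    return base64_encode(pack('g*', ...$vector));
}

/** Reverse of packVector(): base64 string back to a 0-indexed list of floats. */
function unpackVector(string $encoded): array
{
    $binary = base64_decode($encoded, true);
    if ($binary === false) {
        throw new InvalidArgumentException('Vector is not valid base64.');
    }
    // unpack() returns a 1-indexed array, so re-index from 0.
    return array_values(unpack('g*', $binary));
}

$vector  = array_fill(0, 768, 0.5);   // 768 dimensions, as nomic-embed-text produces
$encoded = packVector($vector);

echo strlen(base64_decode($encoded)), " raw bytes\n"; // 3072
echo strlen($encoded), " base64 chars\n";             // 4096
```

Packing to float32 halves the size compared with PHP's native 64-bit floats, at the cost of some precision that rarely matters for similarity ranking.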

## Multiple collections

You can have any number of collections in the same app. Each has its own JSON file. They all share the embedding model and chat model from the `[AI]` config.

```php
$docs    = $ai->rag('docs');
$tickets = $ai->rag('support-tickets');
$logs    = $ai->rag('error-logs');

$docs->ingestDir(__DIR__ . '/help/');
$tickets->ingestText($ticket->body, ['ticket_id' => $ticket->id]);
$logs->ingestText($exception->__toString(), ['ts' => time()]);
```

## API reference

```php
$rag = $ai->rag('name');                    // get/create a named collection

// --- Ingestion ---
$rag->ingestText($text, $metadata = []);    // single chunk
$count = $rag->ingestFile('path');          // returns chunks added
$count = $rag->ingestDir('dir', ['md','txt','php']); // recursive

// --- Querying ---
$hits = $rag->search('query', $k = null);   // [{score, text, metadata}, …]
$answer = $rag->ask('question', $k = null); // top-K → chat call

// --- Maintenance ---
$rag->reset();                              // forget everything (deletes file)
$n = $rag->size();                          // number of chunks
```
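
If you call `search()` yourself and want to run your own chat call, the hits can be turned into a numbered context block. The sketch below is an illustration only: `buildGroundedPrompt` is a hypothetical helper, the prompt wording is invented, and it assumes hits are associative arrays with `score`, `text`, and `metadata` keys. It is not the prompt `ask()` uses internally.

```php
/**
 * Turn search hits into a numbered context block followed by the question.
 * Hits are assumed to be arrays with 'score', 'text' and 'metadata' keys.
 */
function buildGroundedPrompt(string $question, array $hits): string
{
    $lines = [];
    foreach (array_values($hits) as $i => $hit) {
        $source  = $hit['metadata']['source'] ?? 'unknown';
        $lines[] = sprintf('[%d] (%s) %s', $i + 1, $source, trim($hit['text']));
    }

    return "Answer using only the numbered context. Cite chunks like [1].\n\n"
        . "Context:\n" . implode("\n", $lines) . "\n\n"
        . "Question: " . $question;
}

// Hand-written hits standing in for the result of $rag->search(...).
$hits = [
    ['score' => 0.91, 'text' => 'Open Billing and click Cancel plan.',      'metadata' => ['source' => 'faq-12']],
    ['score' => 0.74, 'text' => 'Cancellation takes effect at period end.', 'metadata' => []],
];

echo buildGroundedPrompt('How do I cancel my subscription?', $hits);
```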

## Tuning knobs

In `application/module/ai/settings/ai.ini`:

```ini
[AI]
embed.model        = "nomic-embed-text"   ; or mxbai-embed-large for higher quality
rag.top_k          = 6                    ; chunks injected into the chat call
rag.chunk_target   = 600                  ; tokens per chunk (target)
rag.chunk_min      = 120                  ; smaller chunks merged
rag.chunk_max      = 900                  ; larger paragraphs split on sentences
rag.storage_path   = "/../../application/module/ai/cache/rag/"
```
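
To see how the three chunk settings interact, here is a simplified chunker sketch. It is an assumption-laden illustration rather than the plugin's algorithm: `estimateTokens` and `chunkText` are hypothetical names, tokens are approximated by whitespace-separated words, and a single sentence longer than the maximum is left unsplit.

```php
/** Rough token estimate: whitespace-separated words (an assumption, not a real tokenizer). */
function estimateTokens(string $text): int
{
    return count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
}

/**
 * Split text into chunks of roughly $target tokens.
 * Paragraphs above $max are broken on sentence boundaries, pieces are packed
 * together until $target is reached, and a tail below $min joins the previous chunk.
 */
function chunkText(string $text, int $target = 600, int $min = 120, int $max = 900): array
{
    // 1. Paragraphs (separated by blank lines); oversized ones become sentences.
    $pieces = [];
    foreach (preg_split('/\R\s*\R/', trim($text), -1, PREG_SPLIT_NO_EMPTY) as $paragraph) {
        if (estimateTokens($paragraph) > $max) {
            $sentences = preg_split('/(?<=[.!?])\s+/', $paragraph, -1, PREG_SPLIT_NO_EMPTY);
            array_push($pieces, ...$sentences);
        } else {
            $pieces[] = $paragraph;
        }
    }

    // 2. Pack pieces greedily: flush at $target, and before a merge would pass $max.
    $chunks = [];
    $buffer = '';
    foreach ($pieces as $piece) {
        $candidate = $buffer === '' ? $piece : $buffer . "\n" . $piece;
        if ($buffer !== '' && estimateTokens($candidate) > $max) {
            $chunks[] = $buffer;
            $buffer   = $piece;
        } else {
            $buffer = $candidate;
        }
        if (estimateTokens($buffer) >= $target) {
            $chunks[] = $buffer;
            $buffer   = '';
        }
    }

    // 3. Fold a short tail into the previous chunk instead of emitting it alone.
    if ($buffer !== '') {
        if ($chunks !== [] && estimateTokens($buffer) < $min) {
            $chunks[count($chunks) - 1] .= "\n" . $buffer;
        } else {
            $chunks[] = $buffer;
        }
    }
    return $chunks;
}

// Tiny limits so the behaviour is visible on a short string.
$chunks = chunkText("First paragraph.\n\nSecond paragraph.\n\nThird.", 5, 2, 8);
echo count($chunks), " chunk(s)\n";
```

In this sketch a short piece that sits just before a near-maximum piece can still be emitted below the minimum, so treat it as a way to reason about the settings, not as a drop-in replacement.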

## When to use it

- **Help / FAQ chat** — ingest your help articles, expose a `/ask` endpoint.
- **In-app code search** — ingest `application/module/`, ask "where do we calculate VAT?"
- **Internal docs assistant** — ingest your team's wiki dump.
- **Customer-history lookups** — ingest tickets, ask "have we seen this error before?"

## When NOT to use it

- **Real-time, write-heavy data** — RAG is a snapshot. For live data, write a [Tool](/en/ai/module/agent/) the agent can call.
- **Massive corpora (> 100k chunks)** — JSON-file storage starts to creak. Move to Qdrant / pgvector / Weaviate; we'll publish an adapter once we need one ourselves.
- **Anything where you need *exact* answers, not *probable* ones.** RAG is probabilistic. Don't use it as a database query layer.

## Common pitfalls

- **`nomic-embed-text` not pulled.** The first `ingestText` call will fail with a clear error pointing you at the pull command (`ollama pull nomic-embed-text`).
- **Embedding model mismatch.** Don't mix `nomic-embed-text` chunks with `mxbai-embed-large` queries: the two models produce different vector spaces. If you change `embed.model`, run `$rag->reset()` and re-ingest the collection.
- **Stale collections.** Re-running `ingestDir` doesn't dedupe, so every run appends the same chunks again. Use `reset()` then re-ingest, or maintain a content-hash check yourself.
- **Tiny chunks.** Below ~80 tokens, embeddings get noisy. The default `rag.chunk_min = 120` merges small adjacent chunks.
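
For the stale-collections pitfall, a content-hash check can be as small as the sketch below. Everything in it is an assumption for illustration: `filterUnseen` is a hypothetical helper, the sidecar hash file and its location are invented, and the `ingestText` call is left commented out so the snippet runs on its own.

```php
/**
 * Return only the texts whose SHA-256 hash is not yet in $seen,
 * recording each new hash in $seen (passed by reference).
 */
function filterUnseen(array $texts, array &$seen): array
{
    $fresh = [];
    foreach ($texts as $key => $text) {
        $hash = hash('sha256', $text);
        if (!isset($seen[$hash])) {
            $seen[$hash] = true;
            $fresh[$key] = $text;
        }
    }
    return $fresh;
}

// Hypothetical sidecar file holding the hashes that were already ingested.
$hashFile = sys_get_temp_dir() . '/rag-product-help.hashes.json';
$seen = is_file($hashFile)
    ? (json_decode((string) file_get_contents($hashFile), true) ?: [])
    : [];

$texts = [
    'faq-12' => 'To cancel, open Billing and click Cancel plan.',
    'faq-13' => 'Refunds are issued within five business days.',
];

foreach (filterUnseen($texts, $seen) as $source => $text) {
    // Only new content reaches the collection:
    // $rag->ingestText($text, ['source' => $source]);
}

// Persist the updated hash set for the next run.
file_put_contents($hashFile, json_encode($seen));
```

This only skips exact duplicates. An edited document still adds new chunks next to the old ones, so a changed file is better handled with `reset()` and a fresh ingest.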

## What's next

- [Agent plugin →](/en/ai/module/agent/) for tools, not retrieval.
- [Training nibiru-coder →](/en/ai/module/training/) to make the chat half answer in the framework's voice.