nibiru-framework.com/docs/public/robots.txt
stephan 47bd4bf53f Expose /sitemap.xml for Bing + IndexNow
Astro's sitemap integration emits /sitemap-index.xml + /sitemap-0.xml. Bing
Webmaster Tools and IndexNow probe /sitemap.xml literally, so a request
for the canonical name was 404'ing. Two changes:

- astro.config.mjs: add a 301 redirect /sitemap.xml → /sitemap-index.xml
  (alongside the existing / → /en/ redirect)
- public/robots.txt: list both Sitemap URLs so any crawler that reads
  robots.txt finds an entry it can use directly
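
A minimal sketch of what the astro.config.mjs change might look like. The
real config is not shown in this commit view, so the `site` value, the
`sitemap()` integration call, and the form of the existing / → /en/ entry
are assumptions; only the two redirect targets come from the message above.

```javascript
// astro.config.mjs (illustrative sketch, not the actual file)
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  // Assumed: the sitemap integration needs the canonical site URL.
  site: 'https://nibiru-framework.com',
  integrations: [sitemap()],
  redirects: {
    // Existing redirect mentioned in the commit message (exact form assumed).
    '/': '/en/',
    // New: serve the conventional name by redirecting to the emitted index.
    '/sitemap.xml': { status: 301, destination: '/sitemap-index.xml' },
  },
});
```

The object form states the 301 explicitly instead of relying on the
default status of the string shorthand.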

After production redeploys (`docker compose up -d --build`), submit
https://nibiru-framework.com/sitemap.xml in Bing Webmaster Tools — it'll
follow the 301 and ingest the index.
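
One way to confirm the redirect after the redeploy, sketched for Node 18+
(built-in `fetch`). The helper names `isPermanentRedirectTo` and
`checkSitemapRedirect` are hypothetical and not part of this repo.

```javascript
// Hypothetical helper: true if (status, location) is a 301 to expectedPath
// on the same origin. Location may be relative or absolute.
function isPermanentRedirectTo(status, location, expectedPath, origin) {
  if (status !== 301 || !location) return false;
  const target = new URL(location, origin);
  return target.origin === origin && target.pathname === expectedPath;
}

// Hypothetical check against the live site; needs network access.
// redirect: 'manual' keeps fetch from following the 301 so it can be inspected.
async function checkSitemapRedirect(origin = 'https://nibiru-framework.com') {
  const res = await fetch(`${origin}/sitemap.xml`, { redirect: 'manual' });
  return isPermanentRedirectTo(
    res.status,
    res.headers.get('location'),
    '/sitemap-index.xml',
    origin,
  );
}

// Usage (not run here): checkSitemapRedirect().then((ok) => console.log(ok));
```

The pure helper is split out so the status/Location logic can be checked
without touching the network.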

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 19:42:22 +02:00


# =============================================================================
# robots.txt for nibiru-framework.com
#
# Policy: open. We want every search engine, every AI training crawler,
# every retrieval/RAG agent to be able to read these docs. The whole point
# of publishing this site is so that humans AND models can learn Nibiru.
#
# Wildcard rule below allows everything except /api/; AI-specific bots are
# listed explicitly so their operators can verify they are welcome here.
# =============================================================================
# ── Search engines ──────────────────────────────────────────────────────────
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: DuckDuckBot
Allow: /
User-agent: Yandexbot
Allow: /
User-agent: Baiduspider
Allow: /
# ── AI training / search crawlers — explicitly welcomed ─────────────────────
# OpenAI
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: OAI-SearchBot
Allow: /
# Anthropic
User-agent: ClaudeBot
Allow: /
User-agent: Claude-Web
Allow: /
User-agent: anthropic-ai
Allow: /
# Google AI training
User-agent: Google-Extended
Allow: /
# Apple AI training
User-agent: Applebot-Extended
Allow: /
User-agent: Applebot
Allow: /
# Meta
User-agent: meta-externalagent
Allow: /
User-agent: FacebookBot
Allow: /
# Perplexity
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
# Other AI / LLM crawlers
User-agent: YouBot
Allow: /
User-agent: Bytespider
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: Diffbot
Allow: /
User-agent: cohere-ai
Allow: /
User-agent: cohere-training-data-crawler
Allow: /
User-agent: Mistral-AI-User
Allow: /
User-agent: omgili
Allow: /
User-agent: omgilibot
Allow: /
# Common Crawl: a dataset widely used for LLM training
User-agent: CCBot
Allow: /
# Internet Archive
User-agent: ia_archiver
Allow: /
# ── Default policy: allow everything ───────────────────────────────────────
User-agent: *
Allow: /
# Don't crawl the SSR API endpoint; it's not content. Crawlers that match a
# named group above obey only that group, so this rule binds unlisted bots.
Disallow: /api/
# ── Sitemaps ───────────────────────────────────────────────────────────────
# /sitemap.xml is a 301 to /sitemap-index.xml (Astro emits the index
# automatically + one child sitemap-0.xml). Both URLs are listed so any
# crawler that probes either path lands on the same content. Bing's
# Webmaster Tools and IndexNow tend to look for /sitemap.xml literally.
Sitemap: https://nibiru-framework.com/sitemap.xml
Sitemap: https://nibiru-framework.com/sitemap-index.xml