A complete RAG system for the open web: search engine + site scraper in one platform. Use context.txt when available for deterministic answers, and fall back to scraping when it is not.
A single text file gives AI agents everything they need. No scraping, no parsing, no wasted tokens.
Add /context.txt to your domain or upload structured context.txt from the dashboard. URL hosting is optional.
yourdomain.com/context.txt
Submit your domain and we index every section. Sites without context.txt get auto-scraped via AI.
POST /submit
AI agents query structured context in <30ms. MCP server, REST API, or n8n node.
GET /search?q=...
Response time: <30ms (vs ~200-500ms traditional RAG)
Tokens consumed: ~2K (vs high token use in full pipelines)
Signal quality: High (publisher-curated content)
Indexing: Any site (context.txt or AI fallback)
ProtoContext is better when knowledge is structured and deterministic: products, prices, policies, docs, schedules. You get low-latency, exact answers from publisher-authored context. Traditional RAG still wins for massive unstructured corpora and deep semantic discovery.
ProtoContext does not replace RAG; it makes it optional. Use ProtoContext as the deterministic fast path, then enable semantic search only when you actually need it. This keeps latency low while preserving a hybrid path for complex queries.
Structured first. Semantic when needed.
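The "fast path first, semantic only when needed" idea can be sketched as a small router. This is an illustration under assumptions, not ProtoContext's implementation: `structured_search` and `semantic_search` are hypothetical callables standing in for a ProtoContext query and a conventional RAG retriever, and falling back only when the fast path returns nothing is an assumed rule.

```python
from typing import Callable, List

# Hypothetical signature: a search function takes a query and returns text hits.
Search = Callable[[str], List[str]]

def answer(query: str, structured_search: Search, semantic_search: Search) -> List[str]:
    """Try the deterministic fast path; use semantic search only if it finds nothing."""
    hits = structured_search(query)
    if hits:
        return hits
    # Assumed fallback rule: an empty result triggers the slower semantic path.
    return semantic_search(query)
```

A stricter router might also fall back on low-confidence hits; that policy is left out here because the source does not describe one.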
Publish one context file per language (for example /context.en.txt and /context.it.txt), while /context.txt keeps working exactly as-is for backward compatibility.
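A client that wants a specific language could build its lookup list along these lines. The function name `context_paths` and the lookup order (language file first, then /context.txt) are assumptions for illustration; the source only states that both kinds of file can coexist.

```python
from typing import List, Optional

def context_paths(lang: Optional[str] = None) -> List[str]:
    """Return candidate paths in lookup order (assumed: language file, then default)."""
    paths: List[str] = []
    if lang:
        paths.append(f"/context.{lang.lower()}.txt")
    # /context.txt is always kept as the backward-compatible default.
    paths.append("/context.txt")
    return paths

print(context_paths("it"))
```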
Initial language support (extra): en, es, fr, it, de, pt, pl, zh, fi, sv, no, da, ja.
PCE extends context.txt without breaking compatibility, so agents can understand what a site sells, answer FAQs consistently, and trigger safe actions.
PRODUCT_ID: sku_9876
NAME: Espresso Machine Pro
CATEGORY: kitchen_appliances
DETAILS_URL: https://shop.com/products/espresso-machine-pro
PURCHASE_URL: https://shop.com/checkout?product=sku_9876

ROOM_TYPE: superior_double
NAME: Superior Double Room
OCCUPANCY: 1-2
FEATURES: balcony, air conditioning, Wi-Fi
DETAILS_URL: https://hotel.com/rooms/superior
BOOKING_URL: https://booking.hotel.com?room=superior

TOUR_ID: tour_vespa_roma
NAME: Vespa Tour Roma
DURATION: 3 hours
LANGUAGES: en, it, es
DETAILS_URL: https://experiences.com/tours/vespa-roma
BOOKING_URL: https://experiences.com/book/tour_vespa_roma
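Because PCE blocks are plain KEY: value lines, an agent could read them with very little code. The sketch below is one possible reader, not an official parser: it assumes a new block begins at PRODUCT_ID, ROOM_TYPE, or TOUR_ID (the three keys shown in the examples) and that field keys are upper case. Neither rule is stated as a grammar in the source.

```python
from typing import Dict, List

# Assumption: these keys open a new block, as in the three examples.
BLOCK_START_KEYS = {"PRODUCT_ID", "ROOM_TYPE", "TOUR_ID"}

def parse_pce_blocks(text: str) -> List[Dict[str, str]]:
    """Group KEY: value lines into one dict per block."""
    blocks: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in BLOCK_START_KEYS:
            blocks.append({})
        # Ignore lines before the first block and non-field lines such as "@lang: en".
        if blocks and key.isupper():
            blocks[-1][key] = value
    return blocks

sample = """\
PRODUCT_ID: sku_9876
NAME: Espresso Machine Pro
PURCHASE_URL: https://shop.com/checkout?product=sku_9876
TOUR_ID: tour_vespa_roma
LANGUAGES: en, it, es
"""
blocks = parse_pce_blocks(sample)
```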
All endpoints are available at your engine URL. Authenticate with the x-proto-token header.
/search
Full-text search across all indexed sites
q (string, required): Search query
domain (string): Filter by domain
limit (int): Max results (default 10)
curl "http://localhost:8000/search?q=payments&limit=5" \
  -H "x-proto-token: YOUR_TOKEN"
With AI provider:
curl "http://localhost:8000/search?q=payments" \
  -H "x-proto-token: YOUR_TOKEN" \
  -H "x-ai-key: YOUR_KEY" \
  -H "x-ai-model: gemini/gemini-3-flash-preview"
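The same /search call can be prepared from Python with the standard library. This is a minimal sketch: the helper name `build_search_request` is hypothetical, and the response is not parsed because its shape is not documented in this section.

```python
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request

def build_search_request(base_url: str, token: str, q: str,
                         domain: Optional[str] = None,
                         limit: Optional[int] = None) -> Request:
    """Build a GET /search request with the x-proto-token header."""
    params: Dict[str, str] = {"q": q}
    if domain is not None:
        params["domain"] = domain
    if limit is not None:
        params["limit"] = str(limit)
    # urlencode escapes spaces and special characters in the query.
    url = f"{base_url.rstrip('/')}/search?{urlencode(params)}"
    return Request(url, headers={"x-proto-token": token})

req = build_search_request("http://localhost:8000", "YOUR_TOKEN", "payments", limit=5)
# Sending it needs a running engine, for example: urllib.request.urlopen(req)
```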
/site
Get all context sections for a domain
domain (string, required): Domain to retrieve
curl "http://localhost:8000/site?domain=stripe.com" \
  -H "x-proto-token: YOUR_TOKEN"
/submit
Submit a new domain to the index
domain (string, required): Domain to register
ai_key (string): AI provider key
ai_model (string): Model in provider/name format
curl -X POST http://localhost:8000/submit \
  -H "Content-Type: application/json" \
  -H "x-proto-token: YOUR_TOKEN" \
  -d '{"domain": "example.com"}'
/delete
Remove a domain from the index
domain (string, required): Domain to delete
curl -X POST http://localhost:8000/delete \
  -H "Content-Type: application/json" \
  -H "x-proto-token: YOUR_TOKEN" \
  -d '{"domain": "example.com"}'
/batch
Multiple search queries in one request
queries (array, required): Array of {q, domain?, limit?}
curl -X POST http://localhost:8000/batch \
  -H "Content-Type: application/json" \
  -H "x-proto-token: YOUR_TOKEN" \
  -d '{"queries": [{"q": "payments"}, {"q": "docs", "domain": "stripe.com"}]}'
/stats
Index statistics
curl http://localhost:8000/stats \
  -H "x-proto-token: YOUR_TOKEN"
/health
Health check
curl http://localhost:8000/health
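For the /batch endpoint, the JSON body is the part most easily gotten wrong by hand. The sketch below assembles it with the standard library; `batch_query` and `build_batch_request` are hypothetical helper names, and only the documented fields (q, domain, limit) are used.

```python
import json
from typing import Dict, List, Optional
from urllib.request import Request

def batch_query(q: str, domain: Optional[str] = None,
                limit: Optional[int] = None) -> Dict[str, object]:
    """One entry of the queries array; optional fields are omitted when unset."""
    entry: Dict[str, object] = {"q": q}
    if domain is not None:
        entry["domain"] = domain
    if limit is not None:
        entry["limit"] = limit
    return entry

def build_batch_request(base_url: str, token: str,
                        queries: List[Dict[str, object]]) -> Request:
    """Build a POST /batch request with a JSON body."""
    body = json.dumps({"queries": queries}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/batch",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-proto-token": token},
    )

batch_req = build_batch_request(
    "http://localhost:8000", "YOUR_TOKEN",
    [batch_query("payments"), batch_query("docs", domain="stripe.com")],
)
```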
All protected endpoints require the x-proto-token header. Tokens are generated during setup or login.
/auth/status
Check authentication status and mode
curl http://localhost:8000/auth/status
/auth/setup
Create admin account (first run only)
name (string, required): Admin name
email (string, required): Admin email
password (string, required): Min 8 characters
curl -X POST http://localhost:8000/auth/setup \
  -H "Content-Type: application/json" \
  -d '{"name": "Admin", "email": "admin@example.com", "password": "securepass"}'
/auth/login
Sign in and receive session token
email (string, required): Account email
password (string, required): Account password
curl -X POST http://localhost:8000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "admin@example.com", "password": "securepass"}'
/auth/logout
Invalidate current session
curl -X POST http://localhost:8000/auth/logout \
  -H "x-proto-token: YOUR_TOKEN"
Gemini: gemini/gemini-3-flash-preview
OpenAI: openai/gpt-4o-mini
OpenRouter: openrouter/google/gemini-3-flash-preview
Create a context.txt file, publish it on your domain, and your site is instantly readable by AI agents worldwide. No SDK, no API key, no signup.
# Roma Coffee Shop
> Online store selling coffee machines and specialty beans
@lang: en
@version: 1.0
@updated: 2026-02-24
@topics: ecommerce, coffee, kitchen_appliances
@content_type: ecommerce
@location: Italy
## section: Product Catalog
PRODUCT_ID: sku_9876
NAME: Espresso Machine Pro
CATEGORY: kitchen_appliances
DETAILS_URL: https://shop.com/products/espresso-machine-pro
PURCHASE_URL: https://shop.com/checkout?product=sku_9876
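To show how little parsing a file like the one above needs, here is a rough reader for its four line types: the "# " title, the "> " description, "@key: value" metadata, and "## section: " headings. It is a sketch based only on this example, not a reference parser; the `ContextFile` class and `parse_context` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SECTION_PREFIX = "## section:"

@dataclass
class ContextFile:
    name: str = ""
    description: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)
    sections: Dict[str, str] = field(default_factory=dict)

def parse_context(text: str) -> ContextFile:
    """Split a context.txt document into header, metadata, and sections."""
    doc = ContextFile()
    current: Optional[str] = None      # title of the section being read
    bodies: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(SECTION_PREFIX):
            current = line[len(SECTION_PREFIX):].strip()
            bodies[current] = []
        elif current is not None:
            bodies[current].append(line)   # everything after a heading is body text
        elif line.startswith("# "):
            doc.name = line[2:].strip()
        elif line.startswith("> "):
            doc.description = line[2:].strip()
        elif line.startswith("@") and ":" in line:
            key, _, value = line[1:].partition(":")
            doc.metadata[key.strip()] = value.strip()
    doc.sections = {t: "\n".join(b).strip() for t, b in bodies.items()}
    return doc

sample = """\
# Roma Coffee Shop
> Online store selling coffee machines and specialty beans
@lang: en
@updated: 2026-02-24
## section: Product Catalog
PRODUCT_ID: sku_9876
NAME: Espresso Machine Pro
"""
doc = parse_context(sample)
```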
Copy this prompt into any AI (ChatGPT, Claude, Gemini) and paste your website content. It will output a properly formatted /context.txt plus optional language files like /context.es.txt and /context.en.txt, including simple PCE examples for PRODUCT_ID, ROOM_TYPE, and TOUR_ID.
You are a context file generator for the ProtoContext standard.
Your job is to convert website content (about pages, docs, catalog text, policies, FAQs) into clean AI-readable context files.
Output requirements:
- Always generate /context.txt
- If the source is multilingual, also generate /context.{lang}.txt files (example: /context.en.txt, /context.es.txt)
- Keep /context.txt backward compatible
- Supported language codes (initial): en, es, fr, it, de, pt, pl, zh, fi, sv, no, da, ja
Follow this exact format:
---
# Site Name
> One-line description of what this site/product/company does
@lang: [language code, e.g. en, es, fr]
@version: 1.0
@updated: [today's date in YYYY-MM-DD]
@topics: [comma-separated relevant topics]
@content_type: [optional, e.g. ecommerce, hospitality, tours]
@location: [optional city/country]
## section: About
[Plain text, concise, factual. Written for AI agents, not for marketing.]
## section: [Normalized Section Title]
[More plain factual content...]
---
If relevant, include repeatable structured blocks (ProtoContextExtension / PCE):
E-commerce example:
PRODUCT_ID: sku_9876
NAME: Espresso Machine Pro
CATEGORY: kitchen_appliances
DETAILS_URL: https://shop.com/products/espresso-machine-pro
PURCHASE_URL: https://shop.com/checkout?product=sku_9876
Hotel example:
ROOM_TYPE: superior_double
NAME: Superior Double Room
OCCUPANCY: 1-2
FEATURES: balcony, air conditioning, Wi-Fi
DETAILS_URL: https://hotel.com/rooms/superior
BOOKING_URL: https://booking.hotel.com?room=superior
Tour example:
TOUR_ID: tour_vespa_roma
NAME: Vespa Tour Roma
DURATION: 3 hours
LANGUAGES: en, it, es
DETAILS_URL: https://experiences.com/tours/vespa-roma
BOOKING_URL: https://experiences.com/book/tour_vespa_roma
Rules:
1. Each section starts with "## section: " followed by the title
2. Keep sections focused — one topic per section
3. Write in plain, factual language optimized for AI consumption
4. Remove HTML, markdown links, images, and formatting artifacts
5. Include metadata (@lang, @version, @updated, @topics) at the top
6. The first line is "# Site Name" followed by "> description"
7. Typical sections: About, Products/Services, Pricing, API, Documentation, Contact, FAQ
8. Keep each section under 500 words — be concise
9. Do NOT include navigation elements, footers, cookie notices, or UI text
10. Keep URLs explicit when actions are available (DETAILS_URL + PURCHASE_URL or BOOKING_URL)
11. Output plain text only, no markdown code fences
12. If multilingual output is requested, separate each file with a clear filename heading:
FILE: /context.en.txt
[file content]
Now convert the following website content into context files:
Copy the prompt, paste it into any AI, then paste your website content after it. Publish the output as yourdomain.com/context.txt.
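When the prompt produces several language files in one response, the "FILE: " headings it asks for make the output easy to split before publishing. The helper below is a sketch under assumptions: `split_context_files` is a hypothetical name, the heading is expected at the start of a line, and any text before the first heading is discarded.

```python
from typing import Dict, List, Optional

FILE_PREFIX = "FILE:"

def split_context_files(output: str) -> Dict[str, str]:
    """Map each 'FILE: <path>' heading to the content that follows it."""
    files: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        if line.startswith(FILE_PREFIX):
            current = line[len(FILE_PREFIX):].strip()
            files[current] = []
        elif current is not None:
            files[current].append(line)
    # Trim surrounding blank lines and end each file with a single newline.
    return {name: "\n".join(lines).strip() + "\n" for name, lines in files.items()}

ai_output = """\
FILE: /context.txt
# Roma Coffee Shop
@lang: en

FILE: /context.it.txt
# Roma Coffee Shop
@lang: it
"""
files = split_context_files(ai_output)
```

Each value can then be written to the matching path on your domain.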