ML Repo — Architecture and External RAG Server Design (for Ollama/Open WebUI)
My Open WebUI/SearxNG configs, plugins, and RAG server, plus a custom program that runs the AI's code in isolated Docker containers
Last updated: 2025-09-10
Summary :3
This repository wires together a local AI stack built around Open WebUI, Ollama, SearxNG, and two custom utilities: a code runner (executes model-generated code inside sandboxed containers) and a headless research browser UI. The current compose setup already gives you working RAG (retrieval-augmented generation) inside Open WebUI without needing a separate RAG service.
Repo map and how each piece fits
.
├─ docker-compose.yml
├─ searxng.yml # searxng settings; defaults, json+html enabled; not a public instance
├─ cloudflared-tunnel-config.yml # cloudflare tunnel routing to ollama, openwebui, and tools
├─ README.md
├─ LICENSE # apache-2.0
│
├─ rag-server/
│ ├─ Dockerfile # builds and runs the RAG service
│ └─ index.tsx # the RAG server: collections, upsert, query, and retrieve-then-generate chat
│
├─ browser/
│ └─ Dockerfile # builds browser-use/web-ui (playwright chromium) on :7788
│
└─ coderunner/
├─ Dockerfile # bun-based service that exposes an OpenAPI tool for sandboxed code exec
├─ index.ts # the server; integrates with Open WebUI as a tool via /openapi.json
└─ package.json # @types/node only (dev) to feed the OCD
Open WebUI (in docker-compose.yml)
- purpose: chat UI + orchestration layer; includes a built-in knowledge base + RAG with chunking, embedding, search, and prompt templating.
- notable: backed by Postgres in this compose. exposes `4000:8080`.
- storage: a docker volume `open-webui` holds app data; Postgres uses `pgdata`.
Postgres (in docker-compose.yml)
- purpose: persistence for Open WebUI features (users, knowledge, etc.); health-checked with `pg_isready`.
SearxNG (in docker-compose.yml + searxng.yml)
- purpose: metasearch engine used by Open WebUI tools/agents for live web lookups.
- config highlights:
  `use_default_settings: true`, `public_instance: false`, `limiter: false`; formats: `html` and `json`.
Coderunner service (coderunner/)
- what it is: a small HTTP server (Bun runtime) that executes pure source code in short-lived, sandboxed containers.
- why it exists: lets Open WebUI tools run code safely with tight resource limits (no network, read-only fs, cgroup limits, `--cap-drop=ALL`, `no-new-privileges`).
- integration contract: exposes an OpenAPI schema at `/openapi.json` and a single POST `/execute` endpoint. Open WebUI can import this as a tool server (see the sketch below).
- security posture: pulls allow-listed base images (gcc, python, node, bun, etc.), mounts only a tmpfs workdir, times out jobs at ≈25s, and runs with a non-root uid/gid. the container has access to the host's docker socket only to run the sandbox containers.
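To make the contract concrete, here is a minimal client sketch. The port and the request/response field names (`language`, `source`, `stdout`, ...) are assumptions for illustration; the authoritative shapes live in `coderunner/index.ts` and the served `/openapi.json`.

```ts
// Hypothetical call against the coderunner's POST /execute endpoint.
// Port and field names are assumptions; consult /openapi.json for the
// real contract.
const res = await fetch("http://coderunner:8787/execute", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    language: "python",      // must resolve to an allow-listed base image
    source: "print(2 + 2)",  // pure source; the sandbox has no network access
  }),
});
console.log(await res.json()); // e.g. { stdout: "4\n", stderr: "", exitCode: 0 }
```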
Browser-use web-ui (browser/)
- purpose: “autonomous” research browser UI (chromium via playwright), reachable on `:7788`.
- built from the upstream `browser-use/web-ui` repo, with python deps and browsers installed in the image.
Cloudflared tunnel (cloudflared-tunnel-config.yml)
- maps hostnames (like `mlep.domain.com` for Ollama, `owebui.domain.com` for Open WebUI, and a `tools` host) to the internal services. useful for private, authenticated access without public inbound ports.
Why you currently don’t need an external RAG server
Open WebUI ships with first-class knowledge / RAG support: add files or URLs and it chunks, embeds, indexes, retrieves, and automatically prefixes the retrieved context to the model prompt using a RAG template. for lightweight to mid-sized corpora and single-user/small-team usage, that’s often all you need.
Stay with built-in RAG if most of these are true:
- total corpus is ≤ ~100k chunks and grows slowly.
- single user or small team (no multi-tenant isolation needed).
- no special retrieval logic (hybrid lexical+semantic, rerankers, metadata filters) beyond what Open WebUI provides.
- tolerance for “UI-managed” knowledge; you don’t need programmatic ingestion pipelines or job queues.
When an external RAG server makes sense
Adopt a decoupled RAG service when you need one or more of:
- bigger data / throughput: millions of chunks, higher QPS, horizontal scaling.
- advanced retrieval: custom chunkers, hybrid search (bm25 + vector), reranking, time-decay, per-tenant filters, embeddings A/B, or multi-modal (image/audio) retrieval.
- programmatic ingestion: CI-driven pipelines from git/docs/confluence/S3; delta updates; background jobs.
- governance / isolation: strict multi-tenant separation, PII retention controls, audit trails.
- interoperability: a clean HTTP API and OpenAPI so other apps (beyond Open WebUI) can reuse your index.
External RAG Server — Design and Reference Implementation
This is a small, dependency-light service designed to run with Bun and integrate with both Ollama and Open WebUI.
Goals
- minimal moving parts; runs fine on a single host.
- uses Ollama for embeddings and chat.
- supports collections, upserts, queries, and an opinionated `/chat` that does retrieve-then-generate.
- ships an OpenAPI schema so Open WebUI can import it as a tool server.
- default in-memory store (persisted to JSON) for simplicity; optional adapters for vector DBs later.
API surface
- `GET /openapi.json` – schema for tool integration.
- `POST /collections` – create a logical collection `{ name }`.
- `GET /collections` – list collections.
- `POST /upsert` – `{ collection, items: [{ id?, text, metadata? }] }`; chunks + embeds text and stores vectors.
- `POST /query` – `{ collection, query, topK?=5, where? }` --> nearest chunks with scores.
- `POST /chat` – `{ collection, query, topK?=5, model?, embedModel? }` --> runs RAG and calls Ollama chat, returns the answer + citations.
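To make the retrieve-then-generate contract concrete, here is a sketch of the core of `POST /chat`, assuming Ollama's `/api/embeddings` and `/api/chat` endpoints. The helper names and the in-memory `Chunk` store are illustrative, not the actual `index.tsx` code.

```ts
// Sketch: embed the query, rank stored chunks by cosine similarity,
// then ask the chat model to answer from the top-K context.
const OLLAMA = process.env.OLLAMA_BASE ?? "http://localhost:11434";

async function embed(text: string, model = "nomic-embed-text"): Promise<number[]> {
  const res = await fetch(`${OLLAMA}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: text }),
  });
  return (await res.json()).embedding;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// what /upsert would have stored for a collection
type Chunk = { id: string; text: string; vector: number[] };

async function ragChat(chunks: Chunk[], query: string, topK = 5, model = "llama3.1") {
  const qv = await embed(query);
  const hits = chunks
    .map((c) => ({ ...c, score: cosine(qv, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
  const context = hits.map((h) => h.text).join("\n---\n");
  const res = await fetch(`${OLLAMA}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      stream: false,
      messages: [
        { role: "system", content: `Answer using only this context:\n${context}` },
        { role: "user", content: query },
      ],
    }),
  });
  const body = await res.json();
  return { answer: body.message.content, citations: hits.map((h) => h.id) };
}
```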
Storage Strategy
- default: in-memory + JSON file on disk (`./data/rag.json`). good for dev/small usage.
- plug-in adapters: swap in Qdrant, SQLite-Vec, pgvector, Weaviate, etc., without changing the HTTP API (a hypothetical adapter boundary is sketched below).
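One way to keep that swap painless is to route all storage through a narrow interface and implement it once per backend. A sketch, where the method names and shapes are assumptions rather than the shipped code:

```ts
// Hypothetical storage adapter boundary. The default JSON-file store and
// any future Qdrant/pgvector adapters would all implement this, so the
// HTTP handlers never touch backend-specific APIs.
interface VectorStore {
  createCollection(name: string): Promise<void>;
  listCollections(): Promise<string[]>;
  upsert(
    collection: string,
    items: { id: string; text: string; vector: number[]; metadata?: Record<string, unknown> }[],
  ): Promise<void>;
  query(
    collection: string,
    vector: number[],
    topK: number,
    where?: Record<string, unknown>,
  ): Promise<{ id: string; text: string; score: number }[]>;
}
```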
Add to docker-compose.yml
```yaml
rag:
  build:
    context: ./rag-server
    dockerfile: Dockerfile
  environment:
    OLLAMA_BASE: "http://mlep.domain.com:11434"
    OLLAMA_CHAT_MODEL: "llama3.1"
    OLLAMA_EMBED_MODEL: "nomic-embed-text"
  volumes:
    - rag_data:/app/data
  networks:
    - internal
  restart: unless-stopped

volumes:
  rag_data:
```
if you already expose services via cloudflared, add another hostname mapping to the `rag` container (`- hostname: rag.domain.com` -> `service: http://rag:8788`).
Wiring the RAG server into Open WebUI and Ollama
1. Pull models
   - `ollama pull nomic-embed-text` (embeddings)
   - `ollama pull llama3.1` (chat)
2. Expose the OpenAPI to Open WebUI as a tool server
- in Open WebUI --> settings --> tools --> add tool server
- paste the url for the cloudflared hostname
- you’ll now see tool functions like `listCollections`, `createCollection`, `upsert`, `query`, and `chat` available to the assistant
3. Usage pattern inside a chat
- to build a knowledge base, call the `createCollection` and `upsert` tools with your documents
- to answer, call `chat`, which performs retrieve-then-generate against your chosen collection (a plain-HTTP version of the same flow is sketched below)
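The same pattern works programmatically over plain HTTP, which is the point of the external server. A sketch, assuming the `http://rag:8788` address from the cloudflared note and the request shapes from the API surface above:

```ts
// Illustrative client for the RAG server's HTTP API. The base URL and
// response shapes are assumptions based on the compose/cloudflared notes.
const RAG = "http://rag:8788";
const post = (path: string, body: unknown) =>
  fetch(`${RAG}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  }).then((r) => r.json());

// 1. build a knowledge base
await post("/collections", { name: "docs" });
await post("/upsert", {
  collection: "docs",
  items: [{ text: "Open WebUI listens on port 8080 inside the container." }],
});

// 2. retrieve-then-generate
const answer = await post("/chat", {
  collection: "docs",
  query: "What port does Open WebUI listen on?",
  topK: 5,
});
console.log(answer); // answer text + citations
```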
FAQ — Built-in vs. External RAG
Q: will Open WebUI’s built-in RAG conflict with this server?
no — you can use either, or both. Open WebUI’s knowledge base is great for ad-hoc use. this service is for programmatic/control-plane needs or when you outgrow the UI’s storage/retrieval.
Q: how do I enforce tenant isolation?
use one collection per tenant and never mix them. for stronger guarantees, run separate RAG instances or choose Qdrant with per-collection access control.
Q: can I use my own chunker/reranker?
yes. place them ahead of `/upsert` and `/query` respectively, or add endpoints like `/rerank` and `/embed` to experiment.
Q: can this call OpenAI-compatible endpoints instead of native Ollama?
yes. Ollama exposes an experimental OpenAI-compatible API, so a thin client pointed at `/v1/chat/completions` is all you need (see the sketch below).
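For illustration, a minimal call against that endpoint (the model name and host follow the examples above; the response uses the standard OpenAI chat-completions shape):

```ts
// Calling Ollama through its OpenAI-compatible endpoint instead of the
// native /api/chat.
const res = await fetch("http://mlep.domain.com:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.1",
    messages: [{ role: "user", content: "Say hello in one word." }],
  }),
});
const body = await res.json();
console.log(body.choices[0].message.content);
```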
License
This write-up and reference code are provided under the same Apache-2.0 terms as the repository.