
ML Repo — Architecture and External RAG Server Design (for Ollama/Open WebUI)

My openWebUI/searxng configs, plugins, RAG server, as well as a custom program that runs the AI's code in isolated Docker containers

Last updated: 2025-09-10


Summary :3

This repository wires together a local AI stack built around Open WebUI, Ollama, SearxNG, and two custom utilities: a code runner (executes model-generated code inside sandboxed containers) and a headless research browser UI. The current compose setup already gives you working RAG (retrieval-augmented generation) inside Open WebUI without needing a separate RAG service.


Repo map and how each piece fits

.
├─ docker-compose.yml
├─ searxng.yml                  # searxng settings; defaults, json+html enabled; not a public instance
├─ cloudflared-tunnel-config.yml # cloudflare tunnel routing to ollama, openwebui, and tools
├─ README.md
├─ LICENSE                      # apache-2.0
│
├─ rag-server/
│  ├─ Dockerfile                # Runs the file that does the RAG stuff
│  └─ index.tsx                 # Does the RAG stuff
│
├─ browser/
│  └─ Dockerfile                # builds browser-use/web-ui (playwright chromium) on :7788
│
└─ coderunner/
   ├─ Dockerfile                # bun-based service that exposes an OpenAPI tool for sandboxed code exec
   ├─ index.ts                  # the server; integrates with Open WebUI as a tool via /openapi.json
   └─ package.json              # @types/node only (dev) to feed the OCD

Open WebUI (in docker-compose.yml)

  • purpose: chat UI + orchestration layer; includes a built-in knowledge base + RAG with chunking, embedding, search, and prompt templating.
  • notable: backed by Postgres in this compose file. published on host port 4000 (container port 8080).
  • storage: the docker volume open-webui holds app data; Postgres uses the pgdata volume.

Postgres (in docker-compose.yml)

  • purpose: persistence for Open WebUI features (users, knowledge, etc.). health-checked with pg_isready.

SearxNG (in docker-compose.yml + searxng.yml)

  • purpose: metasearch engine used by Open WebUI tools/agents for live web lookups.
  • config highlights: use_default_settings: true, public_instance: false, limiter: false; formats: html and json.
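
As a rough illustration of why the json format matters, the sketch below builds a SearxNG query URL and reads results from the JSON response. The base URL http://searxng:8080 and the helper names are assumptions for the example, not values taken from this repo's compose file.

```typescript
// Fields this sketch reads from a SearxNG JSON response.
interface SearxResult {
  title: string;
  url: string;
  content?: string;
}

// Hypothetical helper: builds a search URL that asks SearxNG for JSON output.
function buildSearxUrl(baseUrl: string, query: string): string {
  const url = new URL("/search", baseUrl);
  url.searchParams.set("q", query);
  url.searchParams.set("format", "json"); // needs json among the enabled formats
  return url.toString();
}

// Hypothetical helper: runs the query and returns at most `limit` results.
async function searchWeb(baseUrl: string, query: string, limit = 5): Promise<SearxResult[]> {
  const res = await fetch(buildSearxUrl(baseUrl, query));
  if (!res.ok) throw new Error(`SearxNG returned HTTP ${res.status}`);
  const body = (await res.json()) as { results?: SearxResult[] };
  return (body.results ?? []).slice(0, limit);
}
```

If json were not listed under formats, a request like this would typically be rejected, which is why the config enables both html and json.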

Coderunner service (coderunner/)

  • what it is: a small HTTP server (Bun runtime) that executes pure source code in short-lived, sandboxed containers.
  • why it exists: lets Open WebUI tools run code safely with tight resource limits (no network, read-only fs, cgroup limits, --cap-drop=ALL, no-new-privileges).
  • integration contract: exposes an OpenAPI schema at /openapi.json and a single POST /execute endpoint. Open WebUI can import this as a tool server.
  • security posture: pulls allow-listed base images (gcc, python, node, bun, etc.), mounts only a tmpfs workdir, times out jobs after ≈25s, and runs with a non-root uid/gid. the coderunner container has access to the host's docker socket, but only so it can launch the sandbox containers.
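
To make the sandbox restrictions above concrete, here is a minimal sketch of how the docker run arguments could be assembled. It is not the actual coderunner code: the option names, the image list, the /work path, and the limit values are all placeholders chosen for illustration.

```typescript
// Illustrative allow-list; the real service may use different image names or tags.
const ALLOWED_IMAGES = new Set(["gcc", "python", "node", "oven/bun"]);

// Roughly the job timeout the README mentions; the caller would kill the container after this.
const JOB_TIMEOUT_MS = 25_000;

interface SandboxOptions {
  image: string;
  command: string[];
  memory: string;    // e.g. "256m"
  cpus: string;      // e.g. "0.5"
  pidsLimit: number;
  uid: number;
  gid: number;
}

// Hypothetical helper: turns the options into arguments for `docker run`.
function buildDockerArgs(o: SandboxOptions): string[] {
  if (!ALLOWED_IMAGES.has(o.image)) {
    throw new Error(`image not allow-listed: ${o.image}`);
  }
  return [
    "run", "--rm",
    "--network=none",                        // no network
    "--read-only",                           // read-only root filesystem
    "--cap-drop=ALL",                        // drop all Linux capabilities
    "--security-opt", "no-new-privileges",
    "--tmpfs", "/work:rw,size=64m",          // the only writable mount
    "--workdir", "/work",
    "--memory", o.memory,                    // cgroup limits
    "--cpus", o.cpus,
    "--pids-limit", String(o.pidsLimit),
    "--user", `${o.uid}:${o.gid}`,           // non-root
    o.image,
    ...o.command,
  ];
}
```

The resulting array would be handed to whatever process-spawning API the server uses, with JOB_TIMEOUT_MS enforced by the caller.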

Browser-use web-ui (browser/)

  • purpose: “autonomous” research browser UI (chromium via playwright), reachable on :7788.
  • built from upstream browser-use/web-ui repo, with python deps and browsers installed in the image.

Cloudflared tunnel (cloudflared-tunnel-config.yml)

  • maps hostnames (like mlep.domain.com for Ollama, owebui.domain.com for Open WebUI, and a tools host) to the internal services. useful for private, authenticated access without public inbound ports.

Why you currently don't need an external RAG server

Open WebUI ships with first-class knowledge / RAG support: add files/URLs, it chunks + embeds, indexes, retrieves, and automatically prepends the retrieved context to the model prompt using a RAG template. for lightweight to mid-sized corpora and single-user/small-team usage, that's often all you need.

Stay with built-in RAG if most of these are true:

  • total corpus is ≤ ~100k chunks and grows slowly.
  • single user or small team (no multi-tenant isolation needed).
  • no special retrieval logic (hybrid lexical+semantic, rerankers, metadata filters) beyond what Open WebUI provides.
  • tolerance for “UI-managed” knowledge; you don't need programmatic ingestion pipelines or job queues.

When an external RAG server makes sense

Adopt a decoupled RAG service when you need one or more of:

  • bigger data / throughput: millions of chunks, higher QPS, horizontal scaling.
  • advanced retrieval: custom chunkers, hybrid search (bm25 + vector), reranking, time-decay, per-tenant filters, embeddings A/B, or multi-modal (image/audio) retrieval.
  • programmatic ingestion: CI-driven pipelines from git/docs/confluence/S3; delta updates; background jobs.
  • governance / isolation: strict multi-tenant separation, PII retention controls, audit trails.
  • interoperability: a clean HTTP API and OpenAPI so other apps (beyond Open WebUI) can reuse your index.
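
To give a feel for what "hybrid search (bm25 + vector)" involves, the sketch below merges two ranked result lists with reciprocal rank fusion (RRF). RRF is one common fusion method picked here for illustration; nothing in this repo is stated to use it, and the function name and the constant k = 60 are assumptions.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document,
// where rank starts at 1. Documents ranked well in several lists rise to the top.
function reciprocalRankFusion(rankings: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Usage: one ranking from a lexical (bm25-style) search, one from vector search.
const lexicalRanking = ["a", "b", "c"];
const vectorRanking = ["b", "c", "a"];
const fused = reciprocalRankFusion([lexicalRanking, vectorRanking]);
```

In this usage example "b" ends up first, because it sits near the top of both lists.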

External RAG Server — Design and Reference Implementation

This is a small, dependency-light service designed to run with Bun and integrate with both Ollama and Open WebUI.

Goals

  • minimal moving parts; runs fine on a single host.
  • uses Ollama for embeddings and chat.
  • supports collections, upserts, queries, and an opinionated /chat that does retrieve-then-generate.
  • ships an OpenAPI schema so Open WebUI can import it as a tool server.
  • default in-memory store (persisted to JSON) for simplicity; optional adapters for vector DBs later.

API surface

  • GET /openapi.json --> schema for tool integration.
  • POST /collections { name } --> creates a logical collection.
  • GET /collections --> lists collections.
  • POST /upsert { collection, items:[{ id?, text, metadata? }] } --> chunks and embeds the text, then stores the vectors.
  • POST /query { collection, query, topK?=5, where? } --> nearest chunks with scores.
  • POST /chat { collection, query, topK?=5, model?, embedModel? } --> runs RAG and calls Ollama chat, returns the answer + citations.
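
The core of /query can be sketched as a cosine-similarity scan followed by a top-K cut. This is a minimal sketch under assumptions: the type names are hypothetical, and treating where as an exact-match filter on metadata is a guess at its semantics, not something the API description confirms.

```typescript
interface StoredChunk {
  id: string;
  text: string;
  vector: number[];
  metadata?: Record<string, unknown>;
}

interface ScoredChunk {
  id: string;
  text: string;
  score: number;
}

// Cosine similarity of two equal-length vectors; returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Assumed semantics for `where`: every key must equal the chunk's metadata value.
function matchesWhere(chunk: StoredChunk, where?: Record<string, unknown>): boolean {
  if (!where) return true;
  return Object.entries(where).every(([key, value]) => chunk.metadata?.[key] === value);
}

// Brute-force nearest-neighbour search: filter, score, sort descending, keep topK.
function queryChunks(
  queryVector: number[],
  chunks: StoredChunk[],
  topK = 5,
  where?: Record<string, unknown>,
): ScoredChunk[] {
  return chunks
    .filter((chunk) => matchesWhere(chunk, where))
    .map((chunk) => ({
      id: chunk.id,
      text: chunk.text,
      score: cosineSimilarity(queryVector, chunk.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A linear scan like this is fine for a small in-memory store; larger corpora are where a vector DB adapter starts to pay off.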

Storage strategy

  • default: in-memory + JSON file on disk (./data/rag.json). good for dev/small usage.
  • plug-in adapters: swap in Qdrant, SQLite-Vec, pgvector, Weaviate, etc., without changing the HTTP API.
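
One way to keep the HTTP API independent of the storage backend is a small store interface with the in-memory variant as the default. The sketch below is an assumption about how that could look, not the repo's actual code; the interface and method names are hypothetical, and it is kept synchronous for brevity (a network-backed adapter such as Qdrant or pgvector would likely need Promise-returning methods).

```typescript
interface VectorRecord {
  id: string;
  text: string;
  vector: number[];
}

// Hypothetical adapter contract the HTTP handlers would program against.
interface VectorStore {
  upsert(collection: string, records: VectorRecord[]): void;
  all(collection: string): VectorRecord[];
  collections(): string[];
}

// Default store: nested maps in memory, serializable to a JSON string
// that could be written to a file such as ./data/rag.json.
class MemoryStore implements VectorStore {
  private data = new Map<string, Map<string, VectorRecord>>();

  upsert(collection: string, records: VectorRecord[]): void {
    const bucket = this.data.get(collection) ?? new Map<string, VectorRecord>();
    for (const record of records) bucket.set(record.id, record); // same id overwrites
    this.data.set(collection, bucket);
  }

  all(collection: string): VectorRecord[] {
    return Array.from(this.data.get(collection)?.values() ?? []);
  }

  collections(): string[] {
    return Array.from(this.data.keys());
  }

  serialize(): string {
    const plain: Record<string, VectorRecord[]> = {};
    for (const name of this.collections()) plain[name] = this.all(name);
    return JSON.stringify(plain);
  }

  static deserialize(json: string): MemoryStore {
    const store = new MemoryStore();
    const plain = JSON.parse(json) as Record<string, VectorRecord[]>;
    for (const [name, records] of Object.entries(plain)) store.upsert(name, records);
    return store;
  }
}
```

Swapping in another backend would then mean writing a second class against VectorStore while the route handlers stay unchanged.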

Add to docker-compose.yml

  rag:
    build:
      context: ./rag-server
      dockerfile: Dockerfile
    environment:
      OLLAMA_BASE: "http://mlep.domain.com:11434"
      OLLAMA_CHAT_MODEL: "llama3.1"
      OLLAMA_EMBED_MODEL: "nomic-embed-text"
    volumes:
      - rag_data:/app/data
    networks:
      - internal
    restart: unless-stopped

volumes:
  rag_data:

if you already expose services via cloudflared, add another hostname mapping to the rag container (- hostname: rag.domain.com -> service: http://rag:8788).


Wiring the RAG server into Open WebUI and Ollama

1. Pull models

  • ollama pull nomic-embed-text (embeddings)
  • ollama pull llama3.1 (chat)
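
Once the embedding model is pulled, a quick way to check it is to request embeddings from Ollama's /api/embed endpoint. The sketch below assumes the caller passes a reachable base URL; the helper names are hypothetical.

```typescript
// Builds the request for Ollama's /api/embed endpoint, which accepts
// { model, input } where input may be a list of strings.
function buildEmbedRequest(base: string, model: string, input: string[]) {
  return {
    url: `${base.replace(/\/+$/, "")}/api/embed`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, input }),
    },
  };
}

// Pulls the `embeddings` array (one vector per input string) out of the response.
function parseEmbedResponse(json: unknown): number[][] {
  const data = json as { embeddings?: unknown };
  if (!Array.isArray(data.embeddings)) {
    throw new Error("response has no embeddings array");
  }
  return data.embeddings as number[][];
}

// Hypothetical convenience wrapper; `base` is whatever OLLAMA_BASE points at.
async function embedTexts(
  base: string,
  texts: string[],
  model = "nomic-embed-text",
): Promise<number[][]> {
  const { url, init } = buildEmbedRequest(base, model, texts);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  return parseEmbedResponse(await res.json());
}
```

If the model has not been pulled yet, the request should fail with a non-2xx status, which the wrapper surfaces as an error.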

2. Expose the OpenAPI to Open WebUI as a tool server

  • in Open WebUI --> settings --> tools --> add tool server
  • paste the RAG server's URL (for example, its cloudflared hostname)
  • you'll now see tool functions like listCollections, createCollection, upsert, query, chat available to the assistant
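
The tool function names come from the operationId values in the schema served at /openapi.json. A stripped-down sketch of such a document is shown below; request and response schemas are omitted for brevity, and the title, version, and summaries are placeholders rather than the server's real metadata.

```typescript
interface Operation {
  operationId: string;
  summary: string;
}

type PathItem = Partial<Record<"get" | "post", Operation>>;

interface OpenApiDoc {
  openapi: string;
  info: { title: string; version: string };
  paths: Record<string, PathItem>;
}

// Skeleton document: one operationId per endpoint of the API surface.
const ragOpenApi: OpenApiDoc = {
  openapi: "3.1.0",
  info: { title: "RAG server (sketch)", version: "0.0.0" },
  paths: {
    "/collections": {
      get: { operationId: "listCollections", summary: "List collections" },
      post: { operationId: "createCollection", summary: "Create a collection" },
    },
    "/upsert": { post: { operationId: "upsert", summary: "Chunk, embed, and store text" } },
    "/query": { post: { operationId: "query", summary: "Return nearest chunks" } },
    "/chat": { post: { operationId: "chat", summary: "Retrieve, then generate" } },
  },
};

// Collects the names a tool client would expose for this schema.
function listOperationIds(doc: OpenApiDoc): string[] {
  const ids: string[] = [];
  for (const item of Object.values(doc.paths)) {
    for (const operation of [item.get, item.post]) {
      if (operation) ids.push(operation.operationId);
    }
  }
  return ids;
}
```

A real schema would also describe each request body and response so the assistant knows which arguments to pass.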

3. Usage pattern inside a chat

  • to build a knowledge base, call the createCollection and upsert tools with your documents
  • to answer, call chat which performs retrieve-then-generate against your chosen collection
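
The retrieve-then-generate step behind chat can be sketched as: number the retrieved chunks, put them in a system message, and send the question to Ollama's /api/chat with streaming disabled. The prompt wording and helper names below are assumptions for illustration, not the server's actual template.

```typescript
interface RagMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface RetrievedChunk {
  id: string;
  text: string;
}

// Numbers each chunk so the model can cite it as [1], [2], ...
function buildRagMessages(question: string, chunks: RetrievedChunk[]): RagMessage[] {
  const context = chunks
    .map((chunk, index) => `[${index + 1}] (${chunk.id}) ${chunk.text}`)
    .join("\n");
  return [
    {
      role: "system",
      content: "Answer using only the numbered context below and cite sources as [n].\n\n" + context,
    },
    { role: "user", content: question },
  ];
}

// Request body for Ollama's /api/chat; stream: false yields a single JSON reply.
function buildOllamaChatBody(model: string, messages: RagMessage[]) {
  return { model, messages, stream: false };
}

// Hypothetical wrapper: sends the prompt and returns the assistant's text.
async function ragChat(
  ollamaBase: string,
  model: string,
  question: string,
  chunks: RetrievedChunk[],
): Promise<string> {
  const res = await fetch(`${ollamaBase}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOllamaChatBody(model, buildRagMessages(question, chunks))),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { message?: { content?: string } };
  return data.message?.content ?? "";
}
```

Returning the chunk ids alongside the answer is what would let the server report citations.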

FAQ — Built-in vs. External RAG

Q: will Open WebUI's built-in RAG conflict with this server? no; you can use either, or both. Open WebUI's knowledge base is great for ad-hoc use. this service is for programmatic/control-plane needs or when you outgrow the UI's storage/retrieval.

Q: how do I enforce tenant isolation? use one collection per tenant and never mix. for stronger guarantees, run separate RAG instances or choose Qdrant with per-collection access control.

Q: can I use my own chunker/reranker? yes. place them ahead of /upsert and /query respectively, or add endpoints like /rerank and /embed to experiment.
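
As a starting point for a custom chunker, here is a minimal fixed-size splitter with overlap, measured in characters. It is a sketch only: the function name and the default sizes are assumptions, and a production chunker would more likely split on sentences or tokens.

```typescript
// Splits text into windows of `size` characters; consecutive windows
// share `overlap` characters so context is not cut off at boundaries.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("need size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Usage: the resulting chunks would become the `text` fields sent to /upsert.
const exampleChunks = chunkText("abcdefghij", 4, 1);
```

With these inputs the windows are "abcd", "defg", and "ghij", each sharing one character with its neighbour.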

Q: can this call OpenAI-compatible endpoints instead of native Ollama? Ollama exposes an experimental OpenAI-compatible API. you can add a thin client if you already point tools at /v1/chat/completions.
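
A thin client for the OpenAI-compatible route could look like the sketch below. It assumes the standard chat completions request and response shape (a messages array in, choices[0].message.content out); the helper names are hypothetical, and the base URL is whatever host serves /v1/chat/completions in your setup.

```typescript
interface CompletionMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds a non-streaming request for an OpenAI-compatible chat endpoint.
function buildCompletionRequest(base: string, model: string, messages: CompletionMessage[]) {
  return {
    url: `${base.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Extracts the assistant text from the first choice of the response.
function extractCompletionText(json: unknown): string {
  const data = json as { choices?: Array<{ message?: { content?: string } }> };
  const text = data.choices?.[0]?.message?.content;
  if (typeof text !== "string") {
    throw new Error("unexpected completion response shape");
  }
  return text;
}

// Hypothetical wrapper tying the two helpers together.
async function complete(base: string, model: string, messages: CompletionMessage[]): Promise<string> {
  const { url, init } = buildCompletionRequest(base, model, messages);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`endpoint returned HTTP ${res.status}`);
  return extractCompletionText(await res.json());
}
```

Because only the URL and the response parsing differ from the native path, the retrieval side of the server would not need to change.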


License

This write-up and reference code are provided under the same Apache-2.0 terms as the repository.
