# ML Repo — Architecture and External RAG Server Design (for Ollama/Open WebUI)

My Open WebUI/SearXNG configs, plugins, and RAG server, plus a custom program that runs the AI's code in isolated Docker containers.

*Last updated: 2025-09-13*

> [!TIP]
> Looking for the compose version of this? See the [compose branch](https://git.ion606.com/ION606/ollama-plus/src/branch/compose/).

---

## Summary :3

This repository wires together a local AI stack built around **Open WebUI**, **Ollama**, **SearXNG**, and two custom utilities: a **code runner** (executes model-generated code inside sandboxed containers) and a **headless research browser UI**. The current compose setup already gives you working RAG (retrieval-augmented generation) **inside Open WebUI** without needing a separate RAG service.

---

## Repo map and how each piece fits

```sh
.
├─ docker-compose.yml
├─ searxng.yml                    # searxng settings; defaults, json+html enabled; not a public instance
├─ cloudflared-tunnel-config.yml  # cloudflare tunnel routing to ollama, openwebui, and tools
├─ README.md
├─ LICENSE                        # apache-2.0
│
├─ rag-server/
│  ├─ Dockerfile                  # Runs the file that does the RAG stuff
│  └─ index.tsx                   # Does the RAG stuff
│
├─ browser/
│  └─ Dockerfile                  # builds browser-use/web-ui (playwright chromium) on :7788
│
└─ coderunner/
   ├─ Dockerfile                  # bun-based service that exposes an OpenAPI tool for sandboxed code exec
   ├─ index.ts                    # the server; integrates with Open WebUI as a tool via /openapi.json
   └─ package.json                # @types/node only (dev) to feed the OCD
```

### Open WebUI (in `docker-compose.yml`)

* purpose: chat UI + orchestration layer; **includes a built-in knowledge base + RAG** with chunking, embedding, search, and prompt templating.
* notable: backed by Postgres in this compose file; published as `4000:8080` (host port 4000 to container port 8080).
* storage: a docker volume `open-webui:` holds app data; Postgres uses `pgdata:`.

### Postgres (in `docker-compose.yml`)

* purpose: persistence for Open WebUI features (users, knowledge, etc.). health-checked with `pg_isready`.

### SearXNG (in `docker-compose.yml` + `searxng.yml`)

* purpose: metasearch engine used by Open WebUI tools/agents for live web lookups.
* config highlights: `use_default_settings: true`, `public_instance: false`, `limiter: false`; formats: `html` and `json`.

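Because the `json` format is enabled, a tool can query the instance over plain HTTP. A minimal sketch of such a lookup, assuming the service is reachable as `http://searxng:8080` inside the compose network (the hostname, port, and helper names are assumptions, not taken from this repo):

```typescript
// Hypothetical base URL: the compose service name and port are assumptions.
const SEARXNG_BASE = "http://searxng:8080";

interface SearxResult {
  title: string;
  url: string;
  content?: string;
}

// Build the search URL; `format=json` only works because json is enabled in searxng.yml.
function buildSearchUrl(query: string, base: string = SEARXNG_BASE): string {
  const url = new URL("/search", base);
  url.searchParams.set("q", query);
  url.searchParams.set("format", "json");
  return url.toString();
}

// Keep only the fields a model usually needs, capped at `limit` results.
function summarize(results: SearxResult[], limit = 5): string[] {
  return results
    .slice(0, limit)
    .map((r) => `${r.title} <${r.url}>${r.content ? `: ${r.content}` : ""}`);
}

// Fetch and summarize; meant to be called from inside the compose network.
async function search(query: string): Promise<string[]> {
  const res = await fetch(buildSearchUrl(query));
  if (!res.ok) throw new Error(`searxng returned ${res.status}`);
  const body = (await res.json()) as { results?: SearxResult[] };
  return summarize(body.results ?? []);
}
```

With `limiter: false` there is no rate limiting on these requests, which is fine for a private instance but is one reason `public_instance` stays `false`.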
### Coderunner service (`coderunner/`)

* **what it is:** a small HTTP server (Bun runtime) that executes plain source code in short-lived, sandboxed containers.
* **why it exists:** lets Open WebUI tools run code safely with tight resource limits (no network, read-only fs, cgroup limits, `--cap-drop=ALL`, `no-new-privileges`).
* **integration contract:** exposes an **OpenAPI schema at `/openapi.json`** and a single POST `/execute` endpoint. Open WebUI can import this as a **tool server**.
* **security posture:** pulls allow-listed base images (gcc, python, node, bun, etc.), mounts only a tmpfs workdir, times out jobs after ≈25s, and runs with non-root uid/gid. The coderunner container mounts the host’s docker socket *only* so it can launch the sandbox containers.

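The limits listed above map fairly directly onto `docker run` options. A minimal sketch of how the argument list could be assembled, assuming a hypothetical `buildSandboxArgs` helper, a made-up image allow-list, and made-up resource values (none of this is copied from `coderunner/index.ts`):

```typescript
interface SandboxOptions {
  image: string;     // must be on the allow-list
  command: string[]; // what to run inside the container
  memory?: string;   // cgroup memory limit, e.g. "256m"
  cpus?: string;     // cgroup cpu limit, e.g. "1"
}

// Hypothetical allow-list; the real service's list is not reproduced here.
const ALLOWED_IMAGES = new Set(["python:3.12-slim", "node:22-slim", "gcc:14"]);

function buildSandboxArgs(opts: SandboxOptions): string[] {
  if (!ALLOWED_IMAGES.has(opts.image)) {
    throw new Error(`image not allow-listed: ${opts.image}`);
  }
  return [
    "run", "--rm",
    "--network=none",                          // no network
    "--read-only",                             // read-only root filesystem
    "--tmpfs", "/work:rw,size=64m,mode=1777",  // the only writable path
    "--workdir", "/work",
    "--cap-drop=ALL",
    "--security-opt", "no-new-privileges",
    "--memory", opts.memory ?? "256m",
    "--cpus", opts.cpus ?? "1",
    "--pids-limit", "128",
    "--user", "1000:1000",                     // non-root uid:gid (assumed values)
    opts.image,
    ...opts.command,
  ];
}
```

The ≈25s timeout is not a docker flag; in a setup like this the caller would have to kill the container once the deadline passes.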
### Browser-use web-ui (`browser/`)

* purpose: “autonomous” research browser UI (chromium via playwright), reachable on `:7788`.
* built from upstream `browser-use/web-ui` repo, with python deps and browsers installed in the image.

### Cloudflared tunnel (`cloudflared-tunnel-config.yml`)

* maps hostnames (like `mlep.domain.com` for Ollama, `owebui.domain.com` for Open WebUI, and a `tools` host) to the internal services. Useful for private, authenticated access without public inbound ports.

---

## Why I currently **don’t** use an external RAG server

Open WebUI ships with pretty good **knowledge / RAG** support: add files/URLs, it chunks + embeds, indexes, retrieves, and automatically **prefixes retrieved context** to the model prompt using a RAG template. For lightweight to mid-sized corpora and single-user/small-team usage, that’s often all you need.

**Stay with built-in RAG if most of these are true:**

* total corpus is ≤ \~100k chunks and grows slowly.
* single user or small team (no multi-tenant isolation needed).
* no special retrieval logic (hybrid lexical+semantic, rerankers, metadata filters) beyond what Open WebUI provides.
* tolerance for “UI-managed” knowledge; you don’t need programmatic ingestion pipelines or job queues.

## When an external RAG server makes sense

Adopt a decoupled RAG service when you need one or more of:

* **bigger data / throughput**: millions of chunks, higher QPS, horizontal scaling.
* **advanced retrieval**: custom chunkers, hybrid search (bm25 + vector), **reranking**, time-decay, per-tenant filters, embeddings A/B, or multi-modal (image/audio) retrieval.
* **programmatic ingestion**: CI-driven pipelines from git/docs/confluence/S3; delta updates; background jobs.
* **governance / isolation**: strict multi-tenant separation, PII retention controls, audit trails.
* **interoperability**: a clean HTTP API and OpenAPI so other apps (beyond Open WebUI) can reuse your index.

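Hybrid search needs some way to merge the lexical and the vector ranking. The text above does not say which method is intended; one common option is reciprocal rank fusion (RRF), sketched here with a hypothetical `reciprocalRankFusion` helper and the conventional `k = 60`:

```typescript
// One ranked list of chunk ids per retriever, best match first.
type Ranking = string[];

// Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
// k = 60 is a commonly used default, not something this repo prescribes.
function reciprocalRankFusion(rankings: Ranking[], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: "b" is ranked well by both retrievers, so it ends up first.
const bm25Hits: Ranking = ["a", "b", "c"];
const vectorHits: Ranking = ["b", "d", "a"];
const fused = reciprocalRankFusion([bm25Hits, vectorHits]);
```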
---

## External RAG Server — Design and Reference Implementation

This is a small, dependency-light service designed to run with **Bun** and integrate with both **Ollama** and **Open WebUI**.

### Goals

* minimal moving parts; runs fine on a single host.
* uses Ollama for **embeddings** and **chat**.
* supports **collections**, **upserts**, **queries**, and an opinionated `/chat` that does retrieve-then-generate.
* ships an **OpenAPI** so Open WebUI can import it as a tool server.
* default in-memory store (persisted to JSON) for simplicity; optional adapters for vector DBs later.

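For the "uses Ollama for embeddings and chat" goal, the two calls involved are Ollama's native `/api/embed` and `/api/chat` endpoints. A minimal sketch, assuming a local base URL and hypothetical helper names (`embedRequest`, `chatRequest`, `postJson`); the actual `rag-server/index.tsx` may be structured differently:

```typescript
// Defaults are assumptions; in compose they would come from OLLAMA_BASE and friends.
interface OllamaConfig {
  base: string;
  chatModel: string;
  embedModel: string;
}

const defaultConfig: OllamaConfig = {
  base: "http://localhost:11434",
  chatModel: "llama3.1",
  embedModel: "nomic-embed-text",
};

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Pure request builders, kept separate so they can be inspected without a running Ollama.
function embedRequest(texts: string[], cfg: OllamaConfig = defaultConfig) {
  return { url: `${cfg.base}/api/embed`, body: { model: cfg.embedModel, input: texts } };
}

function chatRequest(messages: ChatMessage[], cfg: OllamaConfig = defaultConfig) {
  // stream: false asks Ollama for a single JSON response instead of a stream.
  return { url: `${cfg.base}/api/chat`, body: { model: cfg.chatModel, messages, stream: false } };
}

async function postJson<T>(url: string, body: unknown): Promise<T> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${url} returned ${res.status}`);
  return (await res.json()) as T;
}

// Returns one embedding vector per input text.
async function embed(texts: string[], cfg: OllamaConfig = defaultConfig): Promise<number[][]> {
  const { url, body } = embedRequest(texts, cfg);
  const out = await postJson<{ embeddings: number[][] }>(url, body);
  return out.embeddings;
}

// Returns the assistant's reply text.
async function chat(messages: ChatMessage[], cfg: OllamaConfig = defaultConfig): Promise<string> {
  const { url, body } = chatRequest(messages, cfg);
  const out = await postJson<{ message: ChatMessage }>(url, body);
  return out.message.content;
}
```

Both models have to be pulled on the Ollama host before these calls succeed.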
### API surface

* `GET /openapi.json` – schema for tool integration.
* `POST /collections` – create a logical collection `{ name }`.
* `GET /collections` – list collections.
* `POST /upsert` – `{ collection, items:[{ id?, text, metadata? }] }`; chunks+embeds text and stores vectors.
* `POST /query` – `{ collection, query, topK?=5, where? }` --> nearest chunks with scores.
* `POST /chat` – `{ collection, query, topK?=5, model?, embedModel? }` --> runs RAG and calls Ollama chat, returns the answer + citations.

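The payload shapes above translate into a few TypeScript types. The sketch below also shows the `topK` default and a naive fixed-size chunker for the "chunks+embeds" step of `/upsert`; the helper names, the chunk size, and the overlap are assumptions, and the real server may chunk differently:

```typescript
interface UpsertItem {
  id?: string;
  text: string;
  metadata?: Record<string, unknown>;
}

interface UpsertRequest {
  collection: string;
  items: UpsertItem[];
}

interface QueryRequest {
  collection: string;
  query: string;
  topK?: number;
  where?: Record<string, unknown>;
}

interface ChatRequest {
  collection: string;
  query: string;
  topK?: number;
  model?: string;
  embedModel?: string;
}

// Validate an incoming /query body and apply the documented default topK = 5.
function parseQueryRequest(input: unknown): QueryRequest & { topK: number } {
  const body = (input ?? {}) as Partial<QueryRequest>;
  if (typeof body.collection !== "string" || body.collection === "") {
    throw new Error("collection is required");
  }
  if (typeof body.query !== "string" || body.query === "") {
    throw new Error("query is required");
  }
  const topK = body.topK ?? 5;
  if (!Number.isInteger(topK) || topK < 1) throw new Error("topK must be a positive integer");
  return { collection: body.collection, query: body.query, topK, where: body.where };
}

// Fixed-size character chunking with overlap; a stand-in for whatever chunker the server uses.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) throw new Error("need 0 <= overlap < size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // the last chunk already reached the end
  }
  return chunks;
}
```

Character-based chunking is the simplest thing that works; splitting on sentences or headings usually retrieves better but needs more code.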
### Storage Strategy

* **default:** in-memory + JSON file on disk (`./data/rag.json`). good for dev/small usage.
* **plug-in adapters:** swap in Qdrant, SQLite-Vec, pgvector, Weaviate, etc., without changing the HTTP API.

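One way to keep the HTTP API independent of the backend is a small store interface with the in-memory version as its first implementation. A sketch under that assumption; `VectorStore`, `MemoryStore`, and the exact-match `where` filter are illustrative names and choices, not the repo's actual code:

```typescript
interface StoredChunk {
  id: string;
  text: string;
  vector: number[];
  metadata?: Record<string, unknown>;
}

interface Hit {
  id: string;
  text: string;
  score: number;
  metadata?: Record<string, unknown>;
}

// The contract an adapter (Qdrant, pgvector, ...) would implement.
interface VectorStore {
  upsert(collection: string, chunks: StoredChunk[]): Promise<void>;
  query(collection: string, vector: number[], topK: number, where?: Record<string, unknown>): Promise<Hit[]>;
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Exact-match metadata filter: every key in `where` must equal the chunk's value.
function matches(meta: Record<string, unknown> | undefined, where?: Record<string, unknown>): boolean {
  if (!where) return true;
  return Object.keys(where).every((key) => meta?.[key] === where[key]);
}

class MemoryStore implements VectorStore {
  private collections = new Map<string, Map<string, StoredChunk>>();

  async upsert(collection: string, chunks: StoredChunk[]): Promise<void> {
    const bucket = this.collections.get(collection) ?? new Map<string, StoredChunk>();
    for (const chunk of chunks) bucket.set(chunk.id, chunk); // same id overwrites
    this.collections.set(collection, bucket);
  }

  async query(collection: string, vector: number[], topK: number, where?: Record<string, unknown>): Promise<Hit[]> {
    const bucket = this.collections.get(collection);
    if (!bucket) return [];
    return Array.from(bucket.values())
      .filter((c) => matches(c.metadata, where))
      .map((c) => ({ id: c.id, text: c.text, score: cosine(vector, c.vector), metadata: c.metadata }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }

  // JSON snapshot that could be written to a file such as ./data/rag.json.
  serialize(): string {
    const plain: Record<string, StoredChunk[]> = {};
    this.collections.forEach((bucket, name) => {
      plain[name] = Array.from(bucket.values());
    });
    return JSON.stringify(plain);
  }
}
```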
---

### Add to `docker-compose.yml`

```yaml
# merge these keys into the existing docker-compose.yml
services:
  rag:
    build:
      context: ./rag-server
      dockerfile: Dockerfile
    environment:
      OLLAMA_BASE: "http://mlep.domain.com:11434"
      OLLAMA_CHAT_MODEL: "llama3.1"
      OLLAMA_EMBED_MODEL: "nomic-embed-text"
    volumes:
      - rag_data:/app/data
    networks:
      - internal
    restart: unless-stopped

volumes:
  rag_data:
```

> if you already expose services via cloudflared, add another ingress rule that maps a hostname to the `rag` container (for example `hostname: rag.domain.com` --> `service: http://rag:8788`).

---

## Wiring the RAG server into Open WebUI and Ollama
### 1. Pull models

* `ollama pull nomic-embed-text` (embeddings)
* `ollama pull llama3.1` (chat)

### 2. Expose the OpenAPI to Open WebUI as a **tool server**

* in Open WebUI --> **settings --> tools** --> **add tool server**
* paste the RAG server's base URL (for example the cloudflared hostname mapped to the `rag` container); Open WebUI reads the schema from `/openapi.json`
* you’ll now see tool functions like `listCollections`, `createCollection`, `upsert`, `query`, `chat` available to the assistant

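Those tool function names most likely come from the `operationId` of each route in the schema, so that is the field worth getting right. A stripped-down sketch of what `/openapi.json` could return, assuming a hypothetical `buildOpenApi` helper; request and response schemas are left out for brevity, although a real schema needs them so the model knows which arguments to pass:

```typescript
// One entry per route from the API surface; operationId becomes the tool function name.
const operations = [
  { method: "get", path: "/collections", operationId: "listCollections", summary: "List collections" },
  { method: "post", path: "/collections", operationId: "createCollection", summary: "Create a collection" },
  { method: "post", path: "/upsert", operationId: "upsert", summary: "Chunk, embed, and store items" },
  { method: "post", path: "/query", operationId: "query", summary: "Return the nearest chunks" },
  { method: "post", path: "/chat", operationId: "chat", summary: "Retrieve, then generate an answer" },
] as const;

interface Operation {
  operationId: string;
  summary: string;
  responses: Record<string, { description: string }>;
}

type PathItem = Record<string, Operation>;

// Build a minimal OpenAPI document; `serverUrl` is whatever URL Open WebUI can reach.
function buildOpenApi(serverUrl: string) {
  const paths: Record<string, PathItem> = {};
  for (const op of operations) {
    const item: PathItem = paths[op.path] ?? {};
    item[op.method] = {
      operationId: op.operationId,
      summary: op.summary,
      responses: { "200": { description: "OK" } },
    };
    paths[op.path] = item;
  }
  return {
    openapi: "3.1.0",
    info: { title: "RAG server", version: "0.1.0" },
    servers: [{ url: serverUrl }],
    paths,
  };
}

// Hypothetical internal URL, matching the port used in the cloudflared example.
const schema = buildOpenApi("http://rag:8788");
```

The summaries double as tool descriptions for the model, so short and literal wording tends to work best.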
### 3. Usage pattern inside a chat

* to build a knowledge base, call the `createCollection` and `upsert` tools with your documents
* to answer, call `chat` which performs retrieve-then-generate against your chosen collection

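The "generate" half of retrieve-then-generate is mostly prompt assembly: number the retrieved chunks, ask the model to cite them, and return the numbering as citations. A minimal sketch, assuming a hypothetical `buildRagPrompt` helper and prompt wording that is only an example:

```typescript
interface RetrievedChunk {
  id: string;
  text: string;
  score: number;
}

interface RagPrompt {
  system: string;
  user: string;
  citations: Array<{ ref: number; id: string; score: number }>;
}

// Turn retrieved chunks into a numbered context block plus a citation table.
function buildRagPrompt(question: string, hits: RetrievedChunk[]): RagPrompt {
  const context = hits.map((hit, i) => `[${i + 1}] ${hit.text}`).join("\n\n");
  return {
    system:
      "Answer using only the numbered context. Cite sources as [n]. " +
      "If the context does not contain the answer, say so.",
    user: `Context:\n${context}\n\nQuestion: ${question}`,
    // ref matches the [n] markers, so the caller can map citations back to chunk ids.
    citations: hits.map((hit, i) => ({ ref: i + 1, id: hit.id, score: hit.score })),
  };
}

const prompt = buildRagPrompt("what is alpha?", [
  { id: "doc1#0", text: "alpha is the first letter", score: 0.91 },
  { id: "doc2#3", text: "beta follows alpha", score: 0.55 },
]);
```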
---

## FAQ — Built-in vs. External RAG

**Q: will Open WebUI’s built-in RAG conflict with this server?**
no. you can use either, or both. Open WebUI’s knowledge base is great for ad-hoc use. this service is for programmatic/control-plane needs or when you outgrow the UI’s storage/retrieval.

**Q: how do I enforce tenant isolation?**
use one collection per tenant and never mix them. for stronger guarantees, run separate RAG instances or choose Qdrant with per-collection access control.

**Q: can I use my own chunker/reranker?**
yes. run your chunker before calling `/upsert` and your reranker on the results of `/query`, or add endpoints like `/rerank` and `/embed` to experiment.

**Q: can this call OpenAI-compatible endpoints instead of native Ollama?**
yes. Ollama exposes an experimental OpenAI-compatible API, so you can add a thin client if your tools already point at `/v1/chat/completions`.

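Such a thin client is mostly a different URL and response shape. A sketch assuming hypothetical helper names (`openAiChatRequest`, `extractAnswer`, `chatViaOpenAiApi`) and the standard OpenAI chat-completions payload:

```typescript
interface OpenAiMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the request for an OpenAI-compatible endpoint; trailing slashes on `base` are tolerated.
function openAiChatRequest(base: string, model: string, messages: OpenAiMessage[]) {
  return {
    url: `${base.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// OpenAI-style responses put the text at choices[0].message.content.
function extractAnswer(response: { choices?: Array<{ message?: { content?: string } }> }): string {
  return response.choices?.[0]?.message?.content ?? "";
}

async function chatViaOpenAiApi(base: string, model: string, messages: OpenAiMessage[]): Promise<string> {
  const { url, init } = openAiChatRequest(base, model, messages);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`${url} returned ${res.status}`);
  return extractAnswer((await res.json()) as Parameters<typeof extractAnswer>[0]);
}
```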
---

## License

This write-up and reference code are provided under the same **Apache-2.0** terms as the repository.