# ML Repo — Architecture and External RAG Server Design (for Ollama/Open WebUI)
My Open WebUI/SearxNG configs, plugins, RAG server, and a custom program that runs the AI's code in isolated Docker containers
*Last updated: 2025-09-13*
> [!TIP]
> Looking for the compose version of this? See the [compose branch](https://git.ion606.com/ION606/ollama-plus/src/branch/compose/)
---
## Summary :3
This repository wires together a local AI stack built around **Open WebUI**, **Ollama**, **SearxNG**, and two custom utilities: a **code runner** (executes model-generated code inside sandboxed containers) and a **headless research browser UI**. The current compose setup already gives you working RAG (retrieval-augmented generation) **inside Open WebUI** without needing a separate RAG service.
---
## Repo map and how each piece fits
```sh
.
├─ docker-compose.yml
├─ searxng.yml # searxng settings; defaults, json+html enabled; not a public instance
├─ cloudflared-tunnel-config.yml # cloudflare tunnel routing to ollama, openwebui, and tools
├─ README.md
├─ LICENSE # apache-2.0
├─ rag-server/
│ ├─ Dockerfile # builds and runs the RAG service
│ └─ index.tsx # the RAG server itself (collections, upsert, query, chat)
├─ browser/
│ └─ Dockerfile # builds browser-use/web-ui (playwright chromium) on :7788
│
└─ coderunner/
├─ Dockerfile # bun-based service that exposes an OpenAPI tool for sandboxed code exec
├─ index.ts # the server; integrates with Open WebUI as a tool via /openapi.json
└─ package.json # @types/node only (dev) to feed the OCD
```
### Open WebUI (in `docker-compose.yml`)
* purpose: chat UI + orchestration layer; **includes a built-in knowledge base + RAG** with chunking, embedding, search, and prompt templating.
* notable: backed by Postgres in this compose. exposes `4000:8080`.
* storage: a docker volume `open-webui:` holds app data; Postgres uses `pgdata:`.
### Postgres (in `docker-compose.yml`)
* purpose: persistence for Open WebUI features (users, knowledge, etc.). health-checked with `pg_isready`.
### SearxNG (in `docker-compose.yml` + `searxng.yml`)
* purpose: metasearch engine used by Open WebUI tools/agents for live web lookups.
* config highlights: `use_default_settings: true`, `public_instance: false`, `limiter: false`; formats: `html` and `json`.
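Because the `json` format is enabled, other services on the internal network can query it directly; a minimal sketch (the `searxng` hostname and port `8080` are assumptions about the compose service):
```sh
# fetch machine-readable results; format=json only works because searxng.yml enables it
curl 'http://searxng:8080/search?q=retrieval+augmented+generation&format=json'
```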
### Coderunner service (`coderunner/`)
* **what it is:** a small HTTP server (Bun runtime) that executes pure source code in short-lived, sandboxed containers.
* **why it exists:** lets Open WebUI tools run code safely with tight resource limits (no network, read-only fs, cgroup limits, `--cap-drop=ALL`, `no-new-privileges`).
* **integration contract:** exposes an **OpenAPI schema at `/openapi.json`** and a single POST `/execute` endpoint. Open WebUI can import this as a **tool server**.
* **security posture:** pulls allow-listed base images (gcc, python, node, bun, etc.), mounts only a tmpfs workdir, times out jobs after ≈25s, and runs with a non-root uid/gid. The container has access to the host's Docker socket *only* to run the sandbox containers.
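For a concrete picture, a per-job sandbox launch looks roughly like the sketch below; the image tag, resource values, and payload are illustrative, but the isolation flags mirror the posture described above:
```sh
# hypothetical sketch of a sandboxed job launch (values illustrative)
timeout 25 docker run --rm \
  --network none --read-only \
  --cap-drop=ALL --security-opt no-new-privileges \
  --tmpfs /work --workdir /work \
  --memory 256m --cpus 1 --pids-limit 64 \
  --user 1000:1000 \
  python:3.12-alpine python -c 'print("hello from the sandbox")'
```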
### Browser-use web-ui (`browser/`)
* purpose: “autonomous” research browser UI (chromium via playwright), reachable on `:7788`.
* built from upstream `browser-use/web-ui` repo, with python deps and browsers installed in the image.
### Cloudflared tunnel (`cloudflared-tunnel-config.yml`)
* maps hostnames (like `mlep.domain.com` for Ollama, `owebui.domain.com` for Open WebUI, and a `tools` host) to the internal services. Useful for private, authenticated access without public inbound ports.
---
## Why I currently **don't** use an external RAG server
Open WebUI ships with pretty good **knowledge / RAG** support: add files/URLs, it chunks + embeds, indexes, retrieves, and automatically **prefixes retrieved context** to the model prompt using a RAG template. For lightweight to mid-sized corpora and single-user/small-team usage, that's often all you need.
**Stay with built-in RAG if most of these are true:**
* total corpus is ≤ \~100k chunks and grows slowly.
* single user or small team (no multi-tenant isolation needed).
* no special retrieval logic (hybrid lexical+semantic, rerankers, metadata filters) beyond what Open WebUI provides.
* tolerance for “UI-managed” knowledge; you don't need programmatic ingestion pipelines or job queues.
## When an external RAG server makes sense
Adopt a decoupled RAG service when you need one or more of:
* **bigger data / throughput**: millions of chunks, higher QPS, horizontal scaling.
* **advanced retrieval**: custom chunkers, hybrid search (bm25 + vector), **reranking**, time-decay, per-tenant filters, embeddings A/B, or multi-modal (image/audio) retrieval.
* **programmatic ingestion**: CI-driven pipelines from git/docs/confluence/S3; delta updates; background jobs.
* **governance / isolation**: strict multi-tenant separation, PII retention controls, audit trails.
* **interoperability**: a clean HTTP API and OpenAPI so other apps (beyond Open WebUI) can reuse your index.
---
## External RAG Server — Design and Reference Implementation
This is a small, dependency-light service designed to run with **Bun** and integrate with both **Ollama** and **Open WebUI**.
### Goals
* minimal moving parts; runs fine on a single host.
* uses Ollama for **embeddings** and **chat**.
* supports **collections**, **upserts**, **queries**, and an opinionated `/chat` that does retrieve-then-generate.
* ships an **OpenAPI** so Open WebUI can import it as a tool server.
* default in-memory store (persisted to JSON) for simplicity; optional adapters for vector DBs later.
### API surface
* `GET /openapi.json` schema for tool integration.
* `POST /collections` create a logical collection `{ name }`.
* `GET /collections` list collections.
* `POST /upsert` `{ collection, items:[{ id?, text, metadata? }] }`; chunks+embeds text and stores vectors.
* `POST /query` `{ collection, query, topK?=5, where? }` --> nearest chunks with scores.
* `POST /chat` `{ collection, query, topK?=5, model?, embedModel? }` --> runs RAG and calls Ollama chat, returns the answer + citations.
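For example, assuming the service listens on `:8788` inside the compose network (the port used in the cloudflared tip below), an ingest-and-retrieve round trip looks like this sketch:
```sh
# create a collection, upsert one document, then query it
# (hostname/port are assumptions; request shapes follow the API surface above)
curl -X POST http://rag:8788/collections \
  -H 'Content-Type: application/json' \
  -d '{"name":"docs"}'

curl -X POST http://rag:8788/upsert \
  -H 'Content-Type: application/json' \
  -d '{"collection":"docs","items":[{"text":"Bun is a fast JavaScript runtime.","metadata":{"src":"notes"}}]}'

curl -X POST http://rag:8788/query \
  -H 'Content-Type: application/json' \
  -d '{"collection":"docs","query":"what is bun?","topK":3}'
```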
### Storage Strategy
* **default:** in-memory + JSON file on disk (`./data/rag.json`). good for dev/small usage.
* **plug-in adapters:** swap in Qdrant, SQLite-Vec, pgvector, Weaviate, etc., without changing the HTTP API.
---
### Add to `docker-compose.yml`
```yaml
rag:
  build:
    context: ./rag-server
    dockerfile: Dockerfile
  environment:
    OLLAMA_BASE: "http://mlep.domain.com:11434"
    OLLAMA_CHAT_MODEL: "llama3.1"
    OLLAMA_EMBED_MODEL: "nomic-embed-text"
  volumes:
    - rag_data:/app/data
  networks:
    - internal
  restart: unless-stopped

volumes:
  rag_data:
```
> if you already expose services via cloudflared, add another hostname mapping to the `rag` container (`- hostname: rag.domain.com -> service: http://rag:8788`).
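A sketch of what that extra entry might look like in `cloudflared-tunnel-config.yml` (hostname is a placeholder):
```yaml
ingress:
  # ...existing hostname mappings...
  - hostname: rag.domain.com
    service: http://rag:8788
  - service: http_status:404
```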
---
## Wiring the RAG server into Open WebUI and Ollama
### 1. Pull models
* `ollama pull nomic-embed-text` (embeddings)
* `ollama pull llama3.1` (chat)
### 2. Expose the OpenAPI to Open WebUI as a **tool server**
* in Open WebUI --> **settings --> tools** --> **add tool server**
* paste the URL of the service's cloudflared hostname
* you'll now see tool functions like `listCollections`, `createCollection`, `upsert`, `query`, and `chat` available to the assistant
### 3. Usage pattern inside a chat
* to build a knowledge base, call the `createCollection` and `upsert` tools with your documents
* to answer, call `chat` which performs retrieve-then-generate against your chosen collection
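outside of a chat session, the same retrieve-then-generate step can be exercised directly against the API (hostname/port assumed as in the earlier sketches):
```sh
# hedged example: /chat runs retrieval, calls Ollama, and returns answer + citations
curl -X POST http://rag:8788/chat \
  -H 'Content-Type: application/json' \
  -d '{"collection":"docs","query":"summarize the notes on Bun","topK":5,"model":"llama3.1"}'
```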
---
## FAQ — Built-in vs. External RAG
**Q: will Open WebUI's built-in RAG conflict with this server?**
no — you can use either, or both. Open WebUI's knowledge base is great for ad-hoc use. this service is for programmatic/control-plane needs or when you outgrow the UI's storage/retrieval.
**Q: how do I enforce tenant isolation?**
use one collection per tenant and never mix. for stronger guarantees, run separate RAG instances or choose Qdrant with per-collection access control.
**Q: can I use my own chunker/reranker?**
yes. run your chunker before `/upsert` and your reranker on `/query` results, or add endpoints like `/rerank` and `/embed` to experiment.
**Q: can this call OpenAI-compatible endpoints instead of native Ollama?**
yes — Ollama exposes an experimental OpenAI-compatible API, so a thin client is enough if your tools already target `/v1/chat/completions`.
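for reference, the compatibility layer accepts the standard chat-completions payload (base URL taken from the compose `OLLAMA_BASE` above):
```sh
# OpenAI-style request served by Ollama's experimental compatibility API
curl http://mlep.domain.com:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"llama3.1","messages":[{"role":"user","content":"hello"}]}'
```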
---
## License
This write-up and reference code are provided under the same **Apache-2.0** terms as the repository.