Back to Projects
in-progress

RAG Configurator: Low-Code Platform for Custom RAG Pipelines

Design, deploy, and test custom RAG pipelines through a visual 9-step wizard. Four microservices, two Vue 3 frontends, 9 agent architectures, full local operation with Ollama.

RAGLangGraphFastAPIVue 3GoMongoDBDocker
Specifications
  • 9-step visual wizard for full RAG pipeline configuration — no code required
  • 9 agent architectures from Naive RAG to Graph RAG, all built on LangGraph
  • 5 chunking strategies and 5 retrieval methods with fine-tuned parameters
  • Full local operation with Ollama — no cloud API keys needed
  • Go gateway with JWT auth, rate limiting, and HMAC inter-service signing
  • 11-container Docker Compose stack with MongoDB vector store and Redis caching

Overview

RAG Configurator is a full-stack platform that lets you design, deploy, and test custom RAG pipelines entirely through a browser UI. Pick your LLM provider, choose your retrieval strategy, configure your agent architecture, ingest documents, and start chatting — all from a 9-step visual wizard.

No boilerplate. No infrastructure wrestling. Configure, ingest, query.

Dashboard showing all RAG configurations

The Problem

Every team building a RAG system faces the same decisions: which LLM provider, how to chunk documents, what retrieval strategy, which agent architecture. These choices are interdependent, hard to compare, and require writing significant boilerplate just to experiment. This platform removes that friction — make all choices visually, ingest your documents, test immediately, iterate.

The Configuration Wizard

A 9-step wizard walks through every decision in a RAG pipeline:

Select your data source — browse local folders or connect to S3

Choose your LLM and embedding model across OpenAI, Anthropic, Ollama, vLLM, and more

Configure chunking strategy and retrieval method with fine-tuned parameters

Select from 9 agent architectures — each with description and configurable parameters

The Sandbox

Once configured and documents ingested, switch to the Sandbox UI to chat with your knowledge base. Real-time SSE streaming, retrieved source chunks with relevance scores, and a full debug panel for inspecting the retrieval pipeline.

Sandbox chat interface with source chunks and debug panel

Architecture

Four microservices behind a Go reverse proxy gateway, two Vue 3 frontends, MongoDB + Redis infrastructure:

Client → Gateway (Go/Gin :8000) → Config Service (FastAPI :8001)
                                 → Ingestion Service (FastAPI+Celery :8002)
                                 → RAG Service (FastAPI+LangGraph :8003)

Go Gateway — JWT validation, rate limiting, CORS, HMAC inter-service signing, SSE streaming passthrough. Low memory footprint, fast cold starts.

Config Service — User auth with JWT + token blacklist, pipeline configuration CRUD, folder browsing, YAML/JSON export-import.

Ingestion Service — Factory-pattern document processors (PDF via Docling, DOCX, TXT, MD, HTML, images with OCR), chunkers, and embedders. Celery workers with live progress tracking.

RAG Service — LangGraph for stateful agent orchestration across 9 agent types, 5 retrieval strategies, 4 LLM providers. Includes RAGAS evaluation, guardrails, and query caching.

Agent Architectures

Agent How It Works Best For
Naive RAG Retrieve → Generate Quick Q&A, simple documents
ReAct Reasoning + Acting loop with tool use Complex queries needing iteration
CRAG Evaluates retrieval quality, self-corrects When retrieval accuracy matters
Self-RAG Generates, critiques, refines iteratively High-quality answers
Multi-Query Expands query into multiple perspectives Ambiguous or broad questions
Plan-Solve Decomposes into sub-tasks, solves step-by-step Multi-hop reasoning
Adaptive RAG Selects strategy based on query complexity Mixed workloads
Agentic RAG Autonomous agent with tool use Open-ended exploration
Graph RAG Knowledge graph-based retrieval Connected, relational data

Tech Stack

Layer Technologies
Gateway Go 1.24, Gin, golang-jwt
Backend Python 3.11, FastAPI, LangGraph, Celery, Motor
Frontend Vue 3, TypeScript, Vite, Tailwind CSS, Pinia
Database MongoDB 7.0 (document + vector store)
Cache Redis 7 (task queue + embedding/query cache)
AI/ML LangChain, OpenAI, Anthropic, Ollama, sentence-transformers
Infra Docker Compose (11 containers), Nginx
Observability Langfuse, OpenTelemetry

Running Locally

Full local operation with Ollama — no cloud API keys required:

ollama pull qwen3:4b
ollama pull nomic-embed-text
docker compose up -d

Tested with qwen3:4b on an NVIDIA RTX 4050. Query latency averages 2-5 seconds depending on agent architecture.

Project: rag-configurator