rag-core is an opinionated but modular RAG (Retrieval-Augmented Generation) pipeline that handles the full journey from raw documents to grounded LLM responses. It ships with smart defaults while keeping every component swappable.
Ingest text, markdown, PDF, or CSV files into a standard Document format
Split documents into overlapping chunks using recursive, semantic, or fixed-size strategies
Convert chunks to vector embeddings via local models or the OpenAI API
Index embeddings for fast similarity search using in-memory numpy or ChromaDB
Find the top-k relevant chunks and re-rank them with metadata boosters
Assemble a grounded prompt with retrieved context and source citations
Recursive chunking tries the largest natural boundaries first (double newlines, then single newlines, then sentences) before falling back to character splits. This preserves semantic coherence better than fixed-size chunking while still guaranteeing a maximum chunk size. In contrast, fixed-size chunks can split mid-sentence, degrading retrieval precision. Semantic chunking works well for structured content like markdown, but can produce unpredictable chunk sizes. Recursive chunking balances precision, context preservation, and compute cost.
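The fallback behavior described above can be sketched in a few lines. This is a simplified illustration of the recursive strategy, not rag-core's actual implementation: try each separator in order, merge pieces up to the size limit, and hard-split on characters only as a last resort.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", ". ")):
    """Split text at the largest natural boundary that fits, recursively."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            # Greedily merge adjacent parts back together up to chunk_size.
            chunks, current = [], ""
            for part in parts:
                candidate = f"{current}{sep}{part}" if current else part
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = part
            if current:
                chunks.append(current)
            # Any piece still over the limit is re-split with finer separators.
            return [piece
                    for chunk in chunks
                    for piece in recursive_split(chunk, chunk_size, separators)]
    # No separator present at all: fall back to a hard character split.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Paragraph boundaries are tried first, so most chunks end at a double newline; the character-split fallback only fires on long unbroken runs of text.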
Cosine similarity measures the angle between two vectors, making it magnitude-invariant. This matters for text embeddings because document length varies dramatically: two paragraphs about the same topic point in similar directions in embedding space even if one is twice as long. Dot product is faster but sensitive to magnitude. Euclidean distance works well for normalized vectors but can be misleading for raw embeddings of different magnitudes.
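The magnitude-invariance point is easy to see directly with numpy. The vectors here are made-up toy values, not real embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: dot product of the unit vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0, 3.0])
w = 2.0 * v  # same direction, twice the magnitude

print(cosine_similarity(v, w))  # 1.0 -- identical direction, magnitude ignored
print(float(np.dot(v, w)))      # 28.0 -- dot product scales with magnitude
print(float(np.linalg.norm(v - w)))  # ~3.74 -- Euclidean distance is nonzero
```

Scaling a vector leaves its cosine similarity to itself at exactly 1.0, while the dot product and Euclidean distance both shift with magnitude.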
The library supports both sentence-transformers (local, free, private) and OpenAI embeddings (higher quality, API cost). Local models like all-MiniLM-L6-v2 produce 384-dimensional embeddings with reasonable quality for most use cases. OpenAI's text-embedding-3-small produces 1536 dimensions with stronger semantic understanding. The choice depends on your privacy requirements, budget, and quality needs. For development and testing, local embeddings eliminate API dependencies entirely.
The embedding cache uses a content hash (text + model name) as the key, stored as a numpy .npz file on disk. This means re-ingesting unchanged documents skips the expensive embedding step entirely. For large document collections, this can reduce pipeline run time from minutes to seconds. The cache invalidates automatically when document content changes because the hash changes.
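A minimal sketch of how a cache like this can work. The function names and cache location are hypothetical, not rag-core's actual internals:

```python
import hashlib
from pathlib import Path

import numpy as np

CACHE_DIR = Path("./.embedding_cache")  # hypothetical on-disk location

def cache_key(text: str, model_name: str) -> str:
    """Content hash over text + model name; changing either invalidates the entry."""
    return hashlib.sha256(f"{model_name}:{text}".encode("utf-8")).hexdigest()

def get_or_embed(text: str, model_name: str, embed_fn) -> np.ndarray:
    """Return a cached embedding if present, otherwise compute and store it."""
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / f"{cache_key(text, model_name)}.npz"
    if path.exists():  # unchanged content: skip the expensive embedding call
        return np.load(path)["embedding"]
    vec = embed_fn(text)
    np.savez(path, embedding=vec)
    return vec
```

Because the key is derived from the content itself, there is no explicit invalidation step: edited text hashes to a new key, and the stale entry is simply never read again.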
from rag_core import RAGPipeline
from rag_core.loaders import MarkdownLoader
from rag_core.embeddings import LocalEmbeddings
# Initialize pipeline with smart defaults
pipeline = RAGPipeline(
    embedding_provider=LocalEmbeddings(),
    chunk_strategy="recursive",
    chunk_size=512,
    chunk_overlap=50,
)
# Ingest documents
docs = MarkdownLoader.load_directory("./docs/")
pipeline.ingest(docs)
# Query
response = pipeline.query(
    "What are the key benefits of event-driven architecture?"
)
print(response.answer)
print(f"Sources: {response.sources}")
print(f"Confidence: {response.confidence_score}")

from rag_core.chunkers import FixedSizeChunker, SemanticChunker, RecursiveChunker

# Fixed-size: simple but can split mid-sentence
fixed = FixedSizeChunker(chunk_size=500, overlap=50)
# Semantic: splits at paragraph/header boundaries
semantic = SemanticChunker(max_chunk_size=800)
# Recursive (default): tries natural boundaries first
recursive = RecursiveChunker(
    chunk_size=512,
    overlap=50,
    separators=["\n\n", "\n", ". ", " "]
)
Clone the repo, run the tests, and try it with your own documents.