semachunk: Minimal Semantic Chunker for RAG

3 min read

When building the examples for Casai - my AI workflow framework - I wanted to include a proper RAG (Retrieval-Augmented Generation) example. Not a toy demo, but something that showed semantic chunking, vector search, and agentic filtering working together.

I hit a wall almost immediately: I couldn't find a TypeScript semantic chunker that fit my needs.

The Problem

Most semantic chunking libraries fall into one of four camps:

  1. Heavy solutions that bundle their own embedding models, require downloads, spin up services, or pull in entire frameworks with massive dependency trees

  2. Provider-locked libraries - tightly coupled to a specific embedding provider or API instead of letting you plug in your own embedding endpoint of choice

  3. Fixed-size chunkers that just split on character count or tokens - missing the whole point of semantic chunking

  4. No batch embedding - they merge chunks sequentially, firing off a new embedding request after every single merge. Fine for local models, but hostile to API rate limits.

I needed something different. For an examples repo, I wanted:

  • Zero infrastructure - no containers, no model downloads, no services
  • Model agnostic - let users plug in whatever embedding provider they're already using (OpenAI, Anthropic, Google, local models, whatever)
  • Batch-friendly - work with API rate limits, not against them
  • Simple API - a single function call, not a framework

The Solution: semachunk

I used semantic-chunking as a base - it has solid chunking logic - and built semachunk by extracting just the core algorithm, wrapping it in a callback-based API, and implementing a new merging algorithm that's more efficient and should produce better results.

import OpenAI from 'openai';
import { chunkText } from 'semachunk';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Chunk the document semantically
const chunks = await chunkText(document, async (texts) => {
    // Your embedding logic - any provider, any model
    const response = await openai.embeddings.create({
        input: texts,
        model: "text-embedding-3-small"
    });
    return response.data.map(d => d.embedding);
}, {
    maxChunkSize: 500,
    similarityThreshold: 0.5,
    returnEmbedding: true
});

// Store in vector index
for (const chunk of chunks) {
    await index.upsertItem({
        vector: chunk.embedding,
        metadata: { text: chunk.text }
    });
}

That's it. You provide the text and an embedding function, you get back semantically coherent chunks ready for your vector store. No configuration files, no model downloads, no Docker containers.
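
For reference, the contract boils down to something like this - a type sketch inferred from the usage above, not the library's literal exported types:

// Type sketch of the chunkText contract, inferred from the example above.
// The library's actual exported type names may differ.
type EmbedFn = (texts: string[]) => Promise<number[][]>;

interface ChunkOptions {
    maxChunkSize?: number;        // upper bound on chunk size
    similarityThreshold?: number; // similarity needed to merge neighbors
    returnEmbedding?: boolean;    // include each chunk's final embedding
}

interface Chunk {
    text: string;
    embedding?: number[]; // present when returnEmbedding is true
}

declare function chunkText(
    text: string,
    embed: EmbedFn,
    options?: ChunkOptions
): Promise<Chunk[]>;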

A New Algorithm

The original semantic-chunking library didn't support batch embedding, and its merging algorithm wasn't structured in a way that made adding it straightforward - it merges chunks linearly, one pair at a time. So I wrote a new algorithm designed from the ground up to work in batches:

  1. Score all adjacent chunk pairs by similarity
  2. Select the best merge candidates (configurable percentage)
  3. Merge them in one pass
  4. Batch re-embed all affected chunks
  5. Repeat until no good merges remain

This should produce better results (by considering all candidates globally rather than just the next pair) while drastically reducing API calls. Instead of potentially hundreds of single-embedding requests, you get a handful of batch requests.
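
To make the batching concrete, here's a compressed sketch of one merge pass - my own illustration of the steps above, not semachunk's actual source. The greedy non-overlap handling and the topFraction parameter are assumptions, and a real implementation also has to respect maxChunkSize:

// Illustrative sketch of the batch-merge loop - not semachunk's actual
// source. cosineSimilarity, topFraction, and Chunk are assumed names.
interface Chunk { text: string; embedding: number[] }
type EmbedFn = (texts: string[]) => Promise<number[][]>;

function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function mergePass(
    chunks: Chunk[], embed: EmbedFn,
    threshold: number, topFraction = 0.3
): Promise<{ chunks: Chunk[]; merged: boolean }> {
    // 1. Score all adjacent pairs by similarity
    const pairs = chunks.slice(0, -1).map((c, i) => ({
        i, score: cosineSimilarity(c.embedding, chunks[i + 1].embedding),
    }));

    // 2. Select the best candidates above the threshold
    const selected = pairs
        .filter(p => p.score >= threshold)
        .sort((a, b) => b.score - a.score)
        .slice(0, Math.max(1, Math.ceil(pairs.length * topFraction)));
    if (selected.length === 0) return { chunks, merged: false };

    // 3. Merge in one pass, greedily skipping pairs that share a chunk
    const mergeWithNext = new Set<number>();
    const used = new Set<number>();
    for (const { i } of selected) {
        if (used.has(i) || used.has(i + 1)) continue;
        mergeWithNext.add(i);
        used.add(i).add(i + 1);
    }

    const out: Chunk[] = [];
    const reembed: number[] = []; // indices into `out` needing new vectors
    for (let i = 0; i < chunks.length; i++) {
        if (mergeWithNext.has(i)) {
            out.push({ text: chunks[i].text + '\n' + chunks[i + 1].text, embedding: [] });
            reembed.push(out.length - 1);
            i++; // skip the absorbed neighbor
        } else {
            out.push(chunks[i]);
        }
    }

    // 4. Batch re-embed every merged chunk in a single request
    const vectors = await embed(reembed.map(idx => out[idx].text));
    reembed.forEach((idx, k) => { out[idx].embedding = vectors[k]; });
    return { chunks: out, merged: true };
}

// 5. Repeat until no good merges remain
async function mergeChunks(chunks: Chunk[], embed: EmbedFn, threshold: number) {
    let state = { chunks, merged: true };
    while (state.merged) state = await mergePass(state.chunks, embed, threshold);
    return state.chunks;
}

The point of step 4 is that every chunk created in a pass goes out in a single embedding request, so API calls scale with the number of passes rather than the number of merges.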

The chunks preserve semantic coherence - a paragraph about a specific topic stays together rather than getting split mid-thought. This matters when you're retrieving context for an LLM; fragmented chunks lead to fragmented understanding.

Trade-offs

To be transparent: I haven't battle-tested this extensively. I built it for the Casai examples, and it works well for that use case. If you're processing millions of documents in production, you'll want to do your own evaluation.

The library is intentionally minimal. Just the core semantic chunking algorithm with a clean API.

Try It

Install directly:

npm install semachunk

Or see it in action with the RAG example in casai-examples:

git clone https://github.com/geleto/casai-examples.git
cd casai-examples
npm install
npm run example 14

GitHub: github.com/geleto/semachunk

If you're building RAG pipelines and want semantic chunking without the infrastructure overhead, give it a try. Feedback and contributions welcome.

semachunk is derived from semantic-chunking by jparkerweb. Check out the original if you need local model support.