Skip to content

Omnichunk

Omnichunk splits source files and documents into deterministic, structure-aware chunks for RAG, embeddings, and LLM context windows.

Features

  • Code, prose, and markup routing with tree-sitter where available
  • Lossless byte and line ranges for reconstruction
  • Optional multiformat loaders (.ipynb, LaTeX, PDF/DOCX with extras)
  • Vector export helpers (Pinecone / Weaviate / Supabase shapes) without vendor SDKs
  • ChunkStore for SQLite-backed incremental indexing
  • omnichunk serve --mcp for JSON-RPC tool access (stdlib HTTP server)

Project