Migrating to v0.8.x

New capabilities (non-breaking)

v0.8 adds optional workflows; existing chunk() and chunk_file() calls remain valid.

Hierarchical chunking

Use when you need multiple granularities (leaves for embeddings, roots for LLM context):

from omnichunk import Chunker

# source: the text of api.py, loaded by the caller
chunker = Chunker(max_chunk_size=256, size_unit="chars")
tree = chunker.hierarchical_chunk("api.py", source, levels=[64, 256, 1024])
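
To make the leaves-versus-roots idea concrete, here is a minimal sketch of a multi-granularity tree built from fixed-size character spans. It does not use omnichunk and is not a description of how hierarchical_chunk() works internally; the Node type and the build_tree() and leaves() helpers are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One span of the source at a given granularity (hypothetical type)."""
    start: int
    end: int
    level_size: int
    children: list[Node] = field(default_factory=list)


def build_tree(text: str, levels: list[int]) -> list[Node]:
    """Split text into fixed-size spans per level, nesting small spans in large ones."""
    sizes = sorted(levels, reverse=True)  # largest (root) size first

    def split(start: int, end: int, depth: int) -> list[Node]:
        size = sizes[depth]
        nodes = []
        for pos in range(start, end, size):
            node = Node(pos, min(pos + size, end), size)
            if depth + 1 < len(sizes):
                # Subdivide this span at the next, finer granularity.
                node.children = split(node.start, node.end, depth + 1)
            nodes.append(node)
        return nodes

    return split(0, len(text), 0) if text and sizes else []


def leaves(nodes: list[Node]) -> list[Node]:
    """Collect the finest-grained spans, e.g. the ones to embed."""
    out: list[Node] = []
    for node in nodes:
        out.extend(leaves(node.children) if node.children else [node])
    return out


text = "x" * 2000
roots = build_tree(text, levels=[64, 256, 1024])
leaf_nodes = leaves(roots)
```

In this sketch the leaves would go to the embedding index, while a retrieval hit on a leaf can be expanded to its enclosing root span when building LLM context. A real chunker would presumably cut on syntactic boundaries rather than fixed offsets.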

Incremental diff

Use when syncing a vector database with file updates:

from omnichunk import Chunker

chunker = Chunker()
new_chunks = chunker.chunk("api.py", new_source)
# old_chunks: the chunks produced for the previous version of the file
diff = chunker.chunk_diff("api.py", new_source, previous_chunks=old_chunks)
# The result exposes diff.added, diff.removed_ids, and diff.unchanged

Stable IDs align with stable_chunk_id() and vector export row IDs from earlier releases.
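
The role of stable IDs in a diff can be illustrated with a small standalone sketch. It assumes an ID derived only from the file path and chunk text, which may differ from what stable_chunk_id() actually hashes; the Chunk and Diff classes and diff_chunks() below are hypothetical stand-ins, not omnichunk types.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record: a file path plus the chunk's text."""
    path: str
    text: str

    @property
    def stable_id(self) -> str:
        # Assumption: the ID depends only on path and content, so an
        # unchanged chunk keeps the same ID across runs.
        raw = f"{self.path}\0{self.text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()[:16]


@dataclass
class Diff:
    added: list
    removed_ids: list
    unchanged: list


def diff_chunks(old: list, new: list) -> Diff:
    """Compare two chunk lists by stable ID."""
    old_ids = {c.stable_id for c in old}
    new_ids = {c.stable_id for c in new}
    return Diff(
        added=[c for c in new if c.stable_id not in old_ids],
        removed_ids=[c.stable_id for c in old if c.stable_id not in new_ids],
        unchanged=[c for c in new if c.stable_id in old_ids],
    )


old_chunks = [Chunk("api.py", "def a(): ..."), Chunk("api.py", "def b(): ...")]
new_chunks = [Chunk("api.py", "def a(): ..."), Chunk("api.py", "def c(): ...")]
diff = diff_chunks(old_chunks, new_chunks)
```

With a diff of this shape, a sync job would typically upsert the added chunks, delete rows whose IDs appear in removed_ids, and skip re-embedding anything in unchanged.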

Token budget selection

Use when retrieved chunks must fit within a fixed token budget:

from omnichunk.budget import TokenBudgetOptimizer

opt = TokenBudgetOptimizer(budget=4096, strategy="greedy")
result = opt.select(retrieved_chunks, scores=scores)
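
For intuition about what a greedy strategy does, the sketch below picks chunks in descending score order and keeps each one that still fits in the remaining budget. It is an illustration of the general technique, not TokenBudgetOptimizer's actual implementation; greedy_select() is a hypothetical helper, and counting tokens by whitespace splitting is an assumption made to keep the example dependency-free.

```python
def greedy_select(chunks, scores, budget, count_tokens=lambda s: len(s.split())):
    """Pick high-scoring chunks until the token budget is exhausted.

    count_tokens is a stand-in (whitespace split); a real pipeline would
    use the tokenizer of the target model.
    """
    # Visit chunk indices from highest to lowest score.
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    picked, used = [], 0
    for i in order:
        cost = count_tokens(chunks[i])
        if used + cost <= budget:
            picked.append(i)
            used += cost
    # Return the selection in original order so context reads naturally.
    return [chunks[i] for i in sorted(picked)], used


retrieved = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota", "kappa"]
relevance = [0.9, 0.4, 0.8, 0.7]
selected, tokens_used = greedy_select(retrieved, relevance, budget=6)
```

Greedy selection is simple and fast but not always optimal: here the second-best chunk is skipped because it does not fit after the first, and two lower-scoring chunks fill the remaining space instead.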

Breaking changes

None for basic chunking callers; the new APIs are additive, so no code changes are required.