bytesense v0.1.2¶
Release date: 2025-03-26
Patch release focused on correctness, streaming behaviour, API typing, multi-encoding performance, and stricter test coverage targets.
Highlights¶
- UTF-8 BOM is reported as
utf_8_sig(Python codec for “UTF-8 with BOM”), matchingCandidateSelector.bom_encoding()and benchmark ground truth — no longer collapsed toutf_8in the BOM fast path. StreamDetector: In-band encoding hints are always considered until one is found; the 4KB scan cap lives inside the probe, so a large first chunk no longer skips meta/XML scanning. Regex patterns are shared withhints.py;codecsis loaded at module scope.detect_multi: Merging adjacent segments with the same encoding does not re-detect the full merged byte range;byte_countis updated on the existingDetectionResultviadataclasses.replace.repair_bytes: Keyword-onlymax_iterationsandchains, matchingrepair().DocumentSegment/MultiEncodingResult:slots=Trueon Python 3.10+ for consistency withDetectionResult.- Coverage CI:
fail_underraised from 50 to 75 (aligned with measured coverage onpytest tests/; 85% is a future goal).
Full change list¶
See CHANGELOG.md for this version.