Skip to content

bytesense v0.1.2

Release date: 2025-03-26

Patch release focused on correctness, streaming behaviour, API typing, multi-encoding performance, and stricter test coverage targets.

Highlights

  • UTF-8 BOM is reported as utf_8_sig (Python codec for “UTF-8 with BOM”), matching CandidateSelector.bom_encoding() and benchmark ground truth — no longer collapsed to utf_8 in the BOM fast path.
  • StreamDetector: In-band encoding hints are always considered until one is found; the 4KB scan cap lives inside the probe, so a large first chunk no longer skips meta/XML scanning. Regex patterns are shared with hints.py; codecs is loaded at module scope.
  • detect_multi: Merging adjacent segments with the same encoding does not re-detect the full merged byte range; byte_count is updated on the existing DetectionResult via dataclasses.replace.
  • repair_bytes: Keyword-only max_iterations and chains, matching repair().
  • DocumentSegment / MultiEncodingResult: slots=True on Python 3.10+ for consistency with DetectionResult.
  • Coverage CI: fail_under raised from 50 to 75 (aligned with measured coverage on pytest tests/; 85% is a future goal).

Full change list

See CHANGELOG.md for this version.