Tags: cloudflare/lol-html
Add bail-out handler API for flushing buffered state on graceful bail-out.

Graceful bail-out (`MemoryLimitExceeded` or `ContentHandlerError` with the matching flag on) flushes the unparsed input remainder raw to the sink and propagates the error. That is enough for handlers that only transform tokens they see, but handlers that buffer state across the document (e.g. ROFL's email-obfuscation module, which holds up to ~128 chars in a text buffer while deciding whether they belong to an email) lose that state on bail-out and produce a response with a gap.

This commit adds a hook that fires once on a graceful bail-out, immediately before the raw flush, and lets handlers append final bytes to the sink:

1. New rewritable unit `BailOut` with a single method `append(content, content_type)`, modelled after `DocumentEnd::append`. The wrapper carries the rewriter's current encoding (after any `<meta charset>`-driven change), so encoding-correctness is automatic.

2. New builder method `Settings::append_bail_out_handler` (and the `RewriteStrSettings` mirror) plus `bail_out!` macro for type-hint ergonomics, parallel to the existing `element!` / `end!` macros.

3. `HandlerTypes` grows a `BailOutHandler<'h>` associated type, with `LocalHandlerTypes` aliasing `BailOutHandler<'h>` and `SendHandlerTypes` aliasing `BailOutHandlerSend<'h>`. Matching `IntoHandler` impls cover both bare-closure cases.

4. `TransformController` grows a `handle_bail_out` method with an empty default impl so existing implementors (test fixtures, parser-trace tool) keep compiling. `HtmlRewriteController` overrides it to iterate the user-registered handlers in registration order.

5. `Dispatcher::run_bail_out_handlers` constructs the `BailOut` wrapper and delegates to the controller. It is invoked from every existing graceful bail-out site in `TransformStream::write()` (3 sites: `Arena::append`, `Parser::parse`, `Arena::init_with`) and `TransformStream::end()` (1 site), gated on `should_bail_out_for(&err)`. Hook output therefore lands in the sink as `[transformed prefix] + [hook output] + [raw remainder]`.

6. `RewritingError` is marked `#[non_exhaustive]` so we can add variants in future minor releases. `match`es still work; only exhaustive external matches need a catch-all arm.

The `end()` bail-out site is defensive: it is symmetric with the `write()` sites but is not reachable through normal input. EOF-in-tag / -attribute / -comment emits as text per HTML5, so content-handler errors don't fire from `parser.parse(_, true)`, and memory errors fire earlier in `write()`. Tested implicitly via the shared call path.
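
The sink ordering described above can be sketched with a small standalone model. This is not lol-html's code or API: `ToyRewriter`, `emit_transformed`, `bail_out`, and `demo` are hypothetical names invented for illustration, and the toy `BailOut::append` takes only the content (no `content_type`, no encoding handling). It only shows that hooks run once, in registration order, after the transformed prefix and before the raw remainder.

```rust
// Toy model only: every type and method here is hypothetical, not lol-html's API.

/// Stand-in for the `BailOut` unit: lets a handler append final bytes to the sink.
struct BailOut<'a> {
    sink: &'a mut Vec<u8>,
}

impl BailOut<'_> {
    /// Simplified `append`: the real method also takes a content type.
    fn append(&mut self, content: &str) {
        self.sink.extend_from_slice(content.as_bytes());
    }
}

type BailOutHandler<'h> = Box<dyn FnMut(&mut BailOut<'_>) + 'h>;

/// Hypothetical rewriter that tracks only the sink and the registered hooks.
struct ToyRewriter<'h> {
    sink: Vec<u8>,
    handlers: Vec<BailOutHandler<'h>>,
}

impl<'h> ToyRewriter<'h> {
    fn new() -> Self {
        ToyRewriter { sink: Vec::new(), handlers: Vec::new() }
    }

    /// Register a hook; hooks later run in registration order.
    fn append_bail_out_handler(&mut self, handler: impl FnMut(&mut BailOut<'_>) + 'h) {
        self.handlers.push(Box::new(handler));
    }

    /// Output produced by normal rewriting before the bail-out.
    fn emit_transformed(&mut self, bytes: &[u8]) {
        self.sink.extend_from_slice(bytes);
    }

    /// Graceful bail-out: run every hook once, then flush the unparsed remainder raw.
    fn bail_out(mut self, raw_remainder: &[u8]) -> Vec<u8> {
        let mut unit = BailOut { sink: &mut self.sink };
        for handler in self.handlers.iter_mut() {
            handler(&mut unit);
        }
        self.sink.extend_from_slice(raw_remainder);
        self.sink
    }
}

/// Simulates a handler that was still holding buffered text when the bail-out hit.
fn demo() -> String {
    let buffered = String::from("user@exam");
    let mut rewriter = ToyRewriter::new();
    rewriter.append_bail_out_handler(|bail_out| bail_out.append(&buffered));
    rewriter.emit_transformed(b"<p>Contact: ");
    let out = rewriter.bail_out(b"ple.com</p>");
    String::from_utf8(out).expect("toy input is valid UTF-8")
}
```

In this model, `demo()` returns `<p>Contact: user@example.com</p>`: the buffered text fills the gap between the transformed prefix and the raw remainder. Without the registered hook, the same run would yield `<p>Contact: ple.com</p>`.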