perf(css): struct-of-arrays AST for the streaming CSS parser #21285
alexander-akait wants to merge 17 commits into
Profile-driven changes to the CSS tokenizer/parser hot path:
- consumeAToken: replace the switch over sparse lead code points with a precomputed char-class dispatch table (one Uint8Array load + a dense jump table). Idents, the most frequent lead, dispatch directly instead of falling through the digit/whitespace cases. This was the single hottest function (~17% of parse time).
- Collapse the five non-leaf node types (function, simple block, declaration, at-rule, qualified rule) into one Container class with a fixed field layout, so the walker's .type / .value loads see four node shapes instead of eight, which is back inside V8's polymorphic inline-cache limit (the profile showed ~5% in LoadIC_Megamorphic).
- Drop the per-token _value cache slot: derive value from the byte range on read, re-derive hash id-ness from the source, and keep only url's content offsets. Leaf tokens are ~94% of nodes, so removing the slot is the bulk of the memory win.

Net on a streaming parse+walk benchmark: ~5% faster and ~5% less retained AST memory. AST output is byte-identical (same nodes, ranges, and lazy getter results).
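A minimal sketch of the char-class dispatch idea from the first bullet. The class constants, the `charClass` table, and the `classify` function are hypothetical names with a reduced set of classes, not the identifiers or the full table used in this PR; non-ASCII code units are simply treated as ident-like here.

```javascript
// Hypothetical character classes; a real table would have more of them.
const CC_DELIM = 0; // default class, so the table needs no explicit fill
const CC_IDENT = 1;
const CC_DIGIT = 2;
const CC_WHITESPACE = 3;

// One byte per ASCII code point, built once at module load.
const charClass = new Uint8Array(128);
for (let cp = 0; cp < 128; cp++) {
  const isLetter = (cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a);
  if (isLetter || cp === 0x5f) {
    charClass[cp] = CC_IDENT;
  } else if (cp >= 0x30 && cp <= 0x39) {
    charClass[cp] = CC_DIGIT;
  } else if (cp === 0x20 || cp === 0x09 || cp === 0x0a) {
    charClass[cp] = CC_WHITESPACE;
  }
}

// Dispatch on the dense class value instead of the sparse code point.
function classify(source, pos) {
  if (pos >= source.length) return "eof";
  const cp = source.charCodeAt(pos);
  // Assumption: anything outside ASCII is handled as ident-like.
  const cc = cp < 128 ? charClass[cp] : CC_IDENT;
  switch (cc) {
    case CC_IDENT:
      return "ident-like";
    case CC_DIGIT:
      return "number";
    case CC_WHITESPACE:
      return "whitespace";
    default:
      return "delim";
  }
}
```

The switch runs over a small contiguous range of class values, which is what lets an engine compile it to a dense jump table rather than a chain of comparisons.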
Add a spec-mapped reference (CSS Syntax Level 3 §4) above `_charClass` explaining why the per-character dispatch table is used and listing which code points map to each handler class, so future spec changes are easy to locate and apply.
`_consumeAnIdentSequence` runs per code point of every ident, at-keyword, hash, function name and unit — the hottest tokenizer function on a large real-world build (Tailwind v4). Inline the two per-char checks: the ASCII ident test becomes a direct table load (no `_isIdentCodePoint` call), and the terminating escape test reads the next code point only when the char is a backslash instead of eagerly via `_ifTwoCodePointsAreValidEscape`. ~1.7% faster on lex-only over the Tailwind build; token output unchanged.
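A rough sketch of the inlined loop described above, assuming a hypothetical `isIdentChar` table and `scanIdentSequence` helper. It is simplified: an escape is skipped as a backslash plus one code unit, and hex escapes are not decoded.

```javascript
// Hypothetical table: 1 for ASCII ident code points (letters, digits, "_", "-").
const isIdentChar = new Uint8Array(128);
for (let cp = 0; cp < 128; cp++) {
  const letter = (cp >= 0x41 && cp <= 0x5a) || (cp >= 0x61 && cp <= 0x7a);
  const digit = cp >= 0x30 && cp <= 0x39;
  if (letter || digit || cp === 0x5f || cp === 0x2d) isIdentChar[cp] = 1;
}

// Returns the offset just past the ident sequence that starts at `pos`.
function scanIdentSequence(source, pos) {
  const len = source.length;
  while (pos < len) {
    const cp = source.charCodeAt(pos);
    if (cp >= 0x80 || isIdentChar[cp] === 1) {
      // Common case: a single table load, no helper call.
      pos++;
    } else if (cp === 0x5c) {
      // Backslash: only now read the following code point.
      if (pos + 1 < len && source.charCodeAt(pos + 1) === 0x0a) break;
      pos = Math.min(pos + 2, len); // simplified escape handling
    } else {
      break;
    }
  }
  return pos;
}
```

For example, `scanIdentSequence("foo-bar: x", 0)` stops at the colon and returns `7`.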
Annotate the `consumeAToken` char-class switch so each case states which CSS Syntax §4 token(s) it yields (and why `-`/`+`/`.`/`#`/`@`/`<`/`\\` branch further), making the optimized dispatch self-documenting.
consumeAComponentValue is called for every component value (the bulk of nodes on a large stylesheet). Two redundancies removed:
- It re-called ts.next() even though every caller had already peeked that token; thread the peeked token in (default ts.next() for the few callers that don't) so the common loops skip the extra call.
- The leaf-token branch delegated to consumeATokenAsNode, which peeked the token a third time via ts.consume(); inline it (advance + tokenToNode on the token we already hold).

~2% faster on a Tailwind v4 build (CPU-isolated); token/AST output unchanged.
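The token-threading change can be illustrated with a toy token stream. Everything here is hypothetical (`TokenStream`, its `peeks` counter, `consumeComponentValue`, `consumeAll`); the only behavior taken from the description above is that `next()` peeks and that the callee accepts an already-peeked token with `ts.next()` as the default.

```javascript
// Hypothetical minimal token stream over an array of tokens.
class TokenStream {
  constructor(tokens) {
    this.tokens = tokens;
    this.index = 0;
    this.peeks = 0; // counts peeks so the saving is observable
  }

  // Peek at the current token without advancing.
  next() {
    this.peeks++;
    return this.tokens[this.index];
  }

  advance() {
    this.index++;
  }
}

// The caller usually holds the peeked token already; the default
// parameter only runs for callers that do not pass one.
function consumeComponentValue(ts, token = ts.next()) {
  ts.advance();
  return { type: token.type };
}

function consumeAll(ts) {
  const out = [];
  let token;
  while ((token = ts.next()) !== undefined) {
    out.push(consumeComponentValue(ts, token));
  }
  return out;
}
```

With three tokens, `consumeAll` peeks four times (once per token plus the terminating peek), where a callee that re-peeked would have peeked seven times.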
tokenToNode was a ~20-arm switch with a `new Token(...)` in each arm. V8 compiled those into generic construct stubs (visible in profiles), since the function had many allocation sites. Replace the switch with a lexer-type → node-type lookup table and one `new Token` call (URL, the only leaf with own state, handled separately). ~13% faster on a CPU-isolated parse, ~6% on a Tailwind v4 build; AST output unchanged.
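As a sketch of the table-driven shape, assuming three hypothetical token kinds and a `Token` class with made-up fields (the real mapping has about twenty entries and handles URL separately):

```javascript
// Hypothetical lexer token kinds and AST node kinds.
const TT_IDENT = 0;
const TT_NUMBER = 1;
const TT_COMMA = 2;
const NT_IDENT = 10;
const NT_NUMBER = 11;
const NT_COMMA = 12;

// Lexer type -> node type, indexed by the lexer type.
const NODE_TYPE_BY_TOKEN_TYPE = new Uint8Array(3);
NODE_TYPE_BY_TOKEN_TYPE[TT_IDENT] = NT_IDENT;
NODE_TYPE_BY_TOKEN_TYPE[TT_NUMBER] = NT_NUMBER;
NODE_TYPE_BY_TOKEN_TYPE[TT_COMMA] = NT_COMMA;

class Token {
  constructor(type, start, end) {
    this.type = type;
    this.start = start;
    this.end = end;
  }
}

// A single allocation site instead of one `new Token(...)` per switch arm.
function tokenToNode(tokenType, start, end) {
  return new Token(NODE_TYPE_BY_TOKEN_TYPE[tokenType], start, end);
}
```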
Introduce an accessor object A in lib/css/syntax.js routing every AST-node field read through a function, so the node representation can be swapped from class objects to a struct-of-arrays without touching consumers. Additive and object-backed for now; no behavior change.
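A small illustration of what an object-backed accessor seam could look like. The field names (`type`, `start`, `end`) and the `nodeText` consumer are assumptions for the example, not the accessor surface defined in lib/css/syntax.js.

```javascript
// Object-backed accessor: every field read goes through a function,
// so consumers never touch node properties directly.
const A = {
  type: (node) => node.type,
  start: (node) => node.start,
  end: (node) => node.end
};

// A consumer written against the seam rather than the node shape.
function nodeText(source, node) {
  return source.slice(A.start(node), A.end(node));
}
```

Because `nodeText` only knows `A`, the functions behind `A` can later read from a different representation without the consumer changing.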
Rewrite every AST-node field read in CssParser to go through the A accessor object instead of touching node properties directly, and move the ad-hoc `urlRecovery` flag off the node into a parser-local. Behavior-preserving (654 css config cases pass); prepares the consumer for a struct-of-arrays node representation.
Build every parser node through module-level construction primitives (_mkLeaf / _mkContainer / _setName / ...) instead of calling new Token / new Container directly. An object backend reproduces the existing class-instance tree the parseA* entry points return. This is a behavior-preserving refactor (654 css config cases + 105 parser unit tests pass) that lets the streaming grammar swap in a struct-of-arrays backend without forking the consume algorithms.
The streaming grammar (the only path CssParser uses) now builds nodes into reused typed arrays instead of class instances: node fields live in parallel Uint8/Int32 arrays indexed by a 1-based node id, child lists hang off three object arrays, and the write cursor is reset after each top-level rule's walk so the buffers are reused and the parse allocates almost nothing across rules. The A accessor reads the arrays, so CssParser is unchanged. parseA* keep the object backend (retainable class tree the unit tests inspect). 654 css config cases + 105 css parser unit tests pass.
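A minimal sketch of the struct-of-arrays layout with a resettable write cursor. The array names, the fixed `CAPACITY`, and `mkLeaf` / `resetNodes` are hypothetical; child lists and buffer growth are left out.

```javascript
// Hypothetical fixed capacity; a real store would grow on demand.
const CAPACITY = 1024;

// Parallel arrays: one slot per node, indexed by node id.
const nodeType = new Uint8Array(CAPACITY);
const nodeStart = new Int32Array(CAPACITY);
const nodeEnd = new Int32Array(CAPACITY);

// Ids are 1-based so that 0 can stand for "no node".
let cursor = 1;

function mkLeaf(type, start, end) {
  if (cursor >= CAPACITY) throw new RangeError("node buffer full");
  const id = cursor++;
  nodeType[id] = type;
  nodeStart[id] = start;
  nodeEnd[id] = end;
  return id;
}

// Accessor seam: a "node" is just an integer id.
const A = {
  type: (id) => nodeType[id],
  start: (id) => nodeStart[id],
  end: (id) => nodeEnd[id]
};

// Called after a top-level rule has been walked: the slots are reused,
// so building the next rule allocates no new node objects.
function resetNodes() {
  cursor = 1;
}
```

After `resetNodes()`, the next `mkLeaf` call returns id `1` again and overwrites the previous rule's slot, so ids must not be held across top-level rules.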
Remove the dead isIdentStartCodePoint (superseded by _isIdentStartCodePointCC) and the unused HC_DELIM constant (the delim class is the default 0), brace the char-class build loop, and order the parser unit-test imports — clearing the eslint debt on the touched files.
The typed-array buffers stay pooled across parses for reuse, but the per-container child-list arrays are dropped once the parse completes so they are not retained idle until the next parse overwrites them (matters for watch builds between rebuilds).
This reverts commit e3c112a.
The Ident and Declaration value-visitors sliced the node value / property name on every node even for plain (non-CSS-Modules) stylesheets, where the dashed-ident and ICSS paths that consume them are inert. Gate that work behind the dashed/ICSS context (Ident) and isModules (Declaration), so the common non-modules parse skips a slice per ident and a slice + property-name normalization per declaration. ~6.5% faster on a non-modules Tailwind parse; CSS Modules behavior and output unchanged (654 css config cases pass).
processDashedIdentInVarFunction fires for every var()/style() in CSS-Modules mode (Tailwind has thousands). It sliced the first ident's value only to test the `--` prefix; check the two leading bytes directly instead. CSS Modules output unchanged (654 css config cases pass).
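The prefix test can be done on the source range without allocating a substring. A sketch, assuming a hypothetical helper `startsWithDoubleDash` that receives the source string and the node's start/end offsets:

```javascript
const HYPHEN = 0x2d; // "-"

// True when the range [start, end) of `source` begins with "--".
function startsWithDoubleDash(source, start, end) {
  return (
    end - start >= 2 &&
    source.charCodeAt(start) === HYPHEN &&
    source.charCodeAt(start + 1) === HYPHEN
  );
}
```

For `var(--color)` with the ident spanning offsets 4 to 11, this returns `true` without calling `slice`.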
🦋 Changeset detected
Latest commit: 5991398
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
This PR is packaged and the instant preview is available (5991398). Install it locally:
npm i -D webpack@https://pkg.pr.new/webpack@5991398
yarn add -D webpack@https://pkg.pr.new/webpack@5991398
pnpm add -D webpack@https://pkg.pr.new/webpack@5991398
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #21285 +/- ##
==========================================
+ Coverage 92.82% 92.87% +0.04%
==========================================
Files 592 593 +1
Lines 64829 65163 +334
Branches 18067 18119 +52
==========================================
+ Hits 60175 60517 +342
+ Misses 4654 4646 -8
Merging this PR will improve performance by 52.68%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Memory | benchmark "css-modules", scenario '{"name":"mode-production","mode":"production"}' | 7.1 MB | 9.3 MB | -23.45% |
| ❌ | Memory | benchmark "many-chunks-commonjs", scenario '{"name":"mode-production","mode":"production"}' | 6.9 MB | 8.9 MB | -23.28% |
| ⚡ | Memory | benchmark "side-effects-reexport", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 858.9 KB | 127.7 KB | ×6.7 |
| ⚡ | Memory | benchmark "wasm-modules-async", scenario '{"name":"mode-development-rebuild","mode":"development","watch":true}' | 332.3 KB | 189.9 KB | +74.92% |
| ⚡ | Memory | benchmark "future-defaults", scenario '{"name":"mode-production","mode":"production"}' | 8.8 MB | 7.4 MB | +20.1% |
Comparing perf/css-struct-of-arrays-parser (5991398) with main (5e7db5d)
Types Coverage
Coverage after merging perf/css-struct-of-arrays-parser into main will be
Coverage Report
Summary
The CSS parser allocated a class instance per AST node on the hot parse path, so large stylesheets were GC-bound. This reworks the streaming parser to a struct-of-arrays representation (node fields live in reused typed arrays with no per-node objects) behind an accessor seam, so `CssParser` is unchanged; it also trims redundant per-node value/name slices outside CSS Modules. On compiled Tailwind (3.3 MiB, ~478k nodes) a non-Modules parse is ~26% faster using ~47% less memory, and a CSS Modules parse is ~11% faster using ~39% less memory.
What kind of change does this PR introduce?
perf
Did you add tests for your changes?
No new cases. The change is behavior-preserving and covered by the existing suites (654 `configCases/css`, plus the parser unit tests); I updated `test/walkCssTokensParser.unittest.js` to read nodes through the accessor seam.
Does this PR introduce a breaking change?
No.
If relevant, what needs to be documented once your changes are merged or what have you already documented?
n/a — internal parser change, no public API or config surface.
Use of AI
Yes. Developed with Claude Code: it designed the struct-of-arrays representation, ported the `CssParser` consumer onto the accessor seam, profiled and benchmarked each change, and gated every commit on the CSS test suites. All output was reviewed and validated before submission.
Generated by Claude Code