Claude had a field day torching Cursor agent

April 12, 2026


Note that this is FULL CLAUDE written. Don't read this if you don't want to laugh at one LLM critiquing the fuck out of another.

ass-renderer / ass-core Audit

Audit conducted 2026-04-09 during Phase 7 of the subtitle system implementation.

Actively Broken (HIGH)

GPU backends are non-functional stubs

WebGPU (backends/web/pipeline.rs): WebGpuPipeline returns empty results from all methods. create_pipeline() in web/mod.rs falls back to SoftwarePipeline. The composite_layers method uses pollster::block_on which won't work in WASM.
Metal (backends/hardware/metal.rs): render_layers() has stub comments ("In production, would create vertex buffer"). Falls back to SoftwarePipeline.
Vulkan (backends/hardware/vulkan.rs): Shader loading is stubbed. No actual Vulkan draw calls. Falls back to SoftwarePipeline.

All three silently fall back to software rendering while reporting BackendType::WebGPU/Metal/Vulkan.

Plan: WebGPU blit path (Phase 7, Task 4) will replace the WebGPU stubs with a software-render-then-GPU-blit approach. Metal and Vulkan will follow the same pattern when needed (see roadmap). The existing stub code that tries to do GPU-side text rendering should be removed.

Streaming parser is fake

parser/streaming.rs: StreamingResult::sections is Vec<String>, not parsed AST nodes. Comments say "simplified."
parser/streaming/processor.rs: process_section_content() returns empty DeltaBatch::new(). Comments say "In full parser this would..." and "In production..."

Plan: Not used by vodplace. Can be removed or properly implemented if incremental parsing becomes needed.

Fixed in This Session

Duplicate tag parsers (FIXED)

text_segmenter.rs had its own ~220-line match block duplicating tag_processor.rs, with a _ => {} catch-all silently dropping \blur, \be, \bord, \shad, \clip, \iclip, \q, \r, \org, \fax, \fay, and more.
Fix: Replaced with single tag_processor::apply_tag() call. 41 new tests added.

Duplicate colour parsers (FIXED)

SoftwarePipeline::parse_ass_color and ass_core::utils::parse_bgr_color both parsed ASS colours with different alpha handling.
Fix: parse_ass_color now delegates to parse_bgr_color.

Alpha inversion bug (FIXED)

ass_core::utils::parse_bgr_color treated ASS alpha 00 as transparent (RGBA 0) instead of opaque (RGBA 255). 6-char colours defaulted to alpha 0 (invisible).
Fix: Corrected alpha inversion. All tests updated.
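
A minimal sketch of the corrected alpha convention, assuming the usual &HAABBGGRR& and &HBBGGRR& forms. The function name and the [r, g, b, a] return shape are hypothetical; this is not the actual parse_bgr_color signature.

```rust
/// Hypothetical helper: parse an ASS colour such as "&H80FF0000&" (AABBGGRR)
/// or "&HFF0000&" (BBGGRR) into RGBA bytes.
/// ASS alpha is inverted relative to RGBA: 00 is opaque, FF is transparent.
fn parse_ass_colour_sketch(s: &str) -> Option<[u8; 4]> {
    let hex = s
        .trim()
        .trim_start_matches("&H")
        .trim_start_matches("&h")
        .trim_end_matches('&');
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let value = u32::from_str_radix(hex, 16).ok()?;
    let (ass_alpha, bgr) = match hex.len() {
        8 => ((value >> 24) as u8, value & 0x00FF_FFFF),
        6 => (0x00, value), // no alpha byte: ASS alpha 00, i.e. fully opaque
        _ => return None,
    };
    let b = (bgr >> 16) as u8;
    let g = (bgr >> 8) as u8;
    let r = bgr as u8;
    // Invert ASS alpha to get RGBA opacity.
    Some([r, g, b, 255 - ass_alpha])
}
```

With this convention a 6-character colour comes out fully opaque, and &HFF......& is the only fully transparent value.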

Triple fontdb (FIXED)

Three separate fontdb::Database instances in RenderContext, SoftwarePipeline, SoftwareBackend. set_font_database() cloned into all three. composite_layers() created a throwaway backend per frame.
Fix: Single Arc<fontdb::Database> shared everywhere. The backend is held as Box<dyn RenderBackend> instead of Arc, and persists across frames with its pixmap cleared between them.

Font fallback hard failure (FIXED)

Renderer errored out with "Font not found and no fallback available" when exact font name wasn't in fontdb, even with fonts loaded.
Fix: Register generic families; as an absolute last resort, return any loaded face.

Blur rendering order (FIXED)

\blur only blurred the text fill, not shadow/outline. Entire event should be blurred.
Fix: When blur present, all rendering (shadow + outline + fill + underline + strikethrough) goes to temp pixmap, gets blurred, then composited.

Fixed 2026-04-10

Rotation used radians instead of degrees (FIXED)

backends/software.rs: pre_rotate(angle_rad) — tiny_skia's pre_rotate takes degrees, not radians. All \frz rotations were ~57x too small (e.g., 13.73° rendered as 0.24°).
Fix: Pass degrees directly to pre_rotate.

Drawing paths not scaled (FIXED)

\p1 drawing commands were only translated to position, never scaled from PlayRes to canvas coordinates. No fscx/fscy or draw_level scaling applied.
Fix: Apply draw_scale * scale_x/y * font_scale_x/y transform to path geometry.

Drawing clip regions not applied (FIXED)

\clip tags on \p1 events were parsed but never passed through VectorData to the backend. All drawings rendered unclipped — combined heart+bone paths showed both shapes instead of only the clipped portion.
Fix: Added clip field to VectorData, create tiny_skia::Mask in backend.

Letter spacing not scaled or included in width (FIXED)

\fsp values were applied as raw pixels without PlayRes→canvas scaling, AND not included in total_line_width / seg_advance calculations. Text with \fsp was too wide (unscaled spacing) and misaligned (width not accounting for spacing).
Fix: Scale spacing by scale_x, include in width calculations.

Per-channel alpha not animatable (FIXED)

\1a, \2a, \3a, \4a were not recognised as \t() animation targets. \t(455,500,\1a&H00&) silently dropped the \1a target, leaving primary alpha permanently at its initial value.
Fix: Added PrimaryAlpha, SecondaryAlpha, OutlineAlpha, ShadowAlpha variants to AnimatableTag.

FontDb wrapper (FIXED)

Raw fontdb::Database didn't support fonts fontdb rejects (e.g., Symbol encoding fonts like SSF4 ABUKET). No fallback mechanism.
Fix: New FontDb wrapper with RwLock<Database> + RwLock<AHashMap<String, Arc<Vec<u8>>>> fallback font map. Shared via Arc<FontDb>. Fallback shaping path bypasses fontdb for rejected fonts.

MKV subtitle extraction missing timestamps (FIXED)

ass-tool extract-mkv prepended Dialogue: to raw ffmpeg packet data, which doesn't include Start/End timestamps (those are in PTS/duration metadata). All extracted ASS files had broken timing — no subtitles displayed.
Fix: Reconstruct H:MM:SS.CC timestamps from packet.pts() and packet.duration() using stream time_base.
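
A sketch of the timestamp reconstruction, assuming the PTS, duration and time_base are passed in as plain integers rather than ffmpeg types. The function names are hypothetical, and rounding to the nearest centisecond is an assumption (truncation would be equally plausible).

```rust
/// Convert a timestamp in stream time_base units (tb_num / tb_den seconds)
/// to an ASS "H:MM:SS.CC" string. Assumes tb_den > 0.
fn ass_timestamp(ts: i64, tb_num: i64, tb_den: i64) -> String {
    // Widen to i128 so ts * num * 100 cannot overflow.
    let (ts, num, den) = (ts.max(0) as i128, tb_num as i128, tb_den as i128);
    // Total centiseconds, rounded to nearest.
    let cs = ((ts * num * 100 + den / 2) / den) as i64;
    let (h, m, s, c) = (cs / 360_000, cs / 6_000 % 60, cs / 100 % 60, cs % 100);
    format!("{h}:{m:02}:{s:02}.{c:02}")
}

/// Start and End fields for a Dialogue line from a packet's PTS and duration.
fn dialogue_times(pts: i64, duration: i64, tb_num: i64, tb_den: i64) -> (String, String) {
    (
        ass_timestamp(pts, tb_num, tb_den),
        ass_timestamp(pts + duration, tb_num, tb_den),
    )
}
```

For a stream with a 1/1000 time_base, a packet at PTS 83450 lasting 2000 units maps to Start 0:01:23.45 and End 0:01:25.45.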

Font weight suffix lookup (FIXED)

"Dosis Light" was not recognised as "Dosis" + Light weight. External font resolver redundantly tried to fetch fonts already loaded under their base family name.
Fix: Shared strip_weight_suffix() in ass_renderer::utils::font. Worker's is_family_loaded() and renderer's try_split_family_weight() both use it.
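
A hypothetical sketch of what such a helper could look like. The suffix table, the numeric weights (OpenType usWeightClass values) and the return type are assumptions for illustration; the real ass_renderer::utils::font function may differ.

```rust
/// Hypothetical sketch: split "Dosis Light" into ("Dosis", Some(300)).
/// Returns the input unchanged with None when no weight suffix is found.
fn strip_weight_suffix(family: &str) -> (&str, Option<u16>) {
    const SUFFIXES: [(&str, u16); 9] = [
        ("Thin", 100),
        ("ExtraLight", 200),
        ("Light", 300),
        ("Regular", 400),
        ("Medium", 500),
        ("SemiBold", 600),
        ("Bold", 700),
        ("ExtraBold", 800),
        ("Black", 900),
    ];
    let trimmed = family.trim();
    for &(suffix, weight) in SUFFIXES.iter() {
        // Require a space before the suffix so that a family literally
        // called "Light" or "SpotLight" is left alone.
        let base = match trimmed.strip_suffix(suffix).and_then(|b| b.strip_suffix(' ')) {
            Some(b) => b.trim_end(),
            None => continue,
        };
        if !base.is_empty() {
            return (base, Some(weight));
        }
    }
    (trimmed, None)
}
```

A caller would presumably try the full name first and only fall back to the stripped form, since some families (such as "Grain Medium" mentioned elsewhere in this audit) legitimately end in a weight word.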

Performance (HIGH)

Software rendering is too slow for complex fansub typesetting

Measured: 651ms per frame on an "I ask of thee" scene with 12 complex \p1 glyph-outline drawings + blur + a full-screen black rectangle + film grain text overlay. Less than 2fps.
Root cause: Gaussian blur on large complex paths in software is O(width × height × kernel). Multiple blurred drawings per frame compounds the cost.
Comparison: WASM libass should handle this at interactive framerates — the performance gap suggests our rendering pipeline is doing significantly more work per frame than necessary, or doing it less efficiently.
Mitigation strategies (in order of expected impact):
1. Pre-render / lookahead cache: We have the full ASS file upfront plus a playback buffer of seconds to minutes. Pre-render upcoming frames on a background thread / during idle time. Cache rasterised drawing paths keyed by (path_hash, scale, blur_params).
2. Drawing path cache: Same \p1 path at same scale/blur appears across many frames. Cache the blurred rasterised result instead of re-rasterising every frame. Invalidate only when position/alpha/clip changes (those are cheap post-raster operations).
3. Blur optimisation: Current gaussian blur is naive scalar. Consider: box blur approximation (3-pass box blur ≈ gaussian), downscale-blur-upscale, or SIMD (the apply_gaussian_blur_simd function currently just calls scalar).
4. WebGPU blit path (Task #25): Move blur and compositing to GPU. Software rasterise glyph outlines, upload texture, GPU does blur + blend. Major architecture change but eliminates the CPU bottleneck entirely.
5. Render resolution scaling: Render subtitle overlay at reduced resolution (e.g., 50%) and upscale. Quality tradeoff but linear speedup.
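
One of the options in item 3, the 3-pass box blur, can be sketched in 1-D as follows. All names are hypothetical and edge clamping is an assumption. The box width comes from matching variances: three boxes of width w have total variance 3(w² − 1)/12, so w ≈ sqrt(4σ² + 1).

```rust
/// Odd box width whose three-pass variance approximates sigma^2.
fn box_width_for_sigma(sigma: f64) -> usize {
    let ideal = (4.0 * sigma * sigma + 1.0).sqrt();
    (ideal.round() as usize) | 1 // force odd, minimum 1
}

/// One box-blur pass using a sliding window sum: O(n) regardless of width.
/// Samples outside the buffer are clamped to the edge (an assumption).
fn box_blur_1d(src: &[f32], width: usize) -> Vec<f32> {
    if src.is_empty() {
        return Vec::new();
    }
    let n = src.len() as isize;
    let r = (width / 2) as isize;
    let at = |i: isize| src[i.clamp(0, n - 1) as usize];
    let mut sum: f32 = (-r..=r).map(|i| at(i)).sum();
    let mut out = Vec::with_capacity(src.len());
    for i in 0..n {
        out.push(sum / (2 * r + 1) as f32);
        // Slide the window: add the sample entering, drop the one leaving.
        sum += at(i + r + 1) - at(i - r);
    }
    out
}

/// Three box passes approximate a gaussian of the given sigma.
fn gaussian_approx_1d(src: &[f32], sigma: f64) -> Vec<f32> {
    let w = box_width_for_sigma(sigma);
    (0..3).fold(src.to_vec(), |buf, _| box_blur_1d(&buf, w))
}
```

A 2-D version would run the same pass over rows and then columns. The point of the sliding sum is that per-pixel cost no longer grows with the blur radius.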

Fixed 2026-04-11

Font scaling denominator wrong (FIXED)

Our renderer: Used font_size / units_per_em to convert font units to pixels.
libass: Overrides FreeType's face->ascender/face->descender with OS/2 usWinAscent/usWinDescent (GDI compatibility, see set_font_metrics() in ass_font.c:278-311), then FT_SIZE_REQUEST_TYPE_REAL_DIM divides by ascender - descender. For IwataMinchoProM (UPM=1000, usWin total=1310), this produced glyphs about 31% too large in our renderer (1310/1000 = 1.31).
Fix: FontMetrics::from_face now mirrors libass's metric selection: usWin first, then hhea (or sTypo if USE_TYPO_METRICS), then sTypo fallback, then bbox. Both shaping and outline rendering use font_size / (ascender - descender) as the scale denominator.
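
The difference between the two denominators, as a small worked sketch. The function names are hypothetical, and the 1100 / -210 ascender/descender split in the usage note is made up for illustration; only the UPM of 1000 and the usWin total of 1310 come from the measurements above.

```rust
/// Old behaviour: font units to pixels via units_per_em.
fn scale_upm(font_size: f32, units_per_em: u16) -> f32 {
    font_size / units_per_em as f32
}

/// libass-style behaviour: divide by ascender - descender
/// (descender is negative, so this is the total line extent in font units).
fn scale_real_dim(font_size: f32, ascender: i32, descender: i32) -> f32 {
    font_size / (ascender - descender) as f32
}

/// How much larger the old scale is than the REAL_DIM scale.
fn oversize_ratio(units_per_em: u16, ascender: i32, descender: i32) -> f32 {
    scale_upm(1.0, units_per_em) / scale_real_dim(1.0, ascender, descender)
}
```

With units_per_em = 1000 and ascender - descender = 1310 (for example 1100 and -210), oversize_ratio returns 1.31, independent of font size.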

`\move` timing double-conversion (FIXED)

parse_move_args converted milliseconds to centiseconds, then calculate_position_from_tags divided by 10 again. Movement completed 10× too fast — a 7.7-second animation finished in 0.77 seconds.
Fix: Removed the redundant /10 in the text path.

Text rotation center wrong (FIXED)

Rotated around (shaped.width/2, shaped.height/2) (text bounding box center). libass rotates around the \pos/\move anchor point.
Fix: Pass anchor point through TextData.anchor. Backend computes rotation center as (anchor_x - data.x, anchor_y - baseline_y) in glyph-local coords.

`\frz` on `\p1` drawings not applied (FIXED)

Drawing/vector path never added Rotation to the effects list, and draw_vector_layer didn't process rotation effects. \frz on drawings was silently ignored.
Fix: Added rotation to VectorData effects in the pipeline. Backend applies rotation around bounding box center before drawing.

Complex `\fade` broken (FIXED)

\fade(a1,a2,a3,t1,t2,t3,t4) (7-parameter form) was implemented as a linear interpolation from alpha_start to alpha_end using a single progress value, ignoring alpha_middle, time_fade_in, and time_fade_out. Text with complex fades was always invisible.
Fix: Proper 3-phase interpolation: fade-in (a1→a2 over t1→t2), hold (a2 over t2→t3), fade-out (a2→a3 over t3→t4).
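
The three phases can be sketched as a pure function of time. The function name is hypothetical; times are taken to be milliseconds relative to the event start, and alpha values are simply interpolated as given.

```rust
/// Alpha for \fade(a1,a2,a3,t1,t2,t3,t4) at time `now`.
/// Before t1: a1. t1..t2: a1 -> a2. t2..t3: a2. t3..t4: a2 -> a3. After t4: a3.
fn complex_fade_alpha(a1: u8, a2: u8, a3: u8, t1: i64, t2: i64, t3: i64, t4: i64, now: i64) -> u8 {
    fn lerp(from: u8, to: u8, start: i64, end: i64, now: i64) -> u8 {
        if end <= start {
            return to; // degenerate interval: jump straight to the target
        }
        let f = (now - start) as f64 / (end - start) as f64;
        (from as f64 + (to as f64 - from as f64) * f).round() as u8
    }
    if now < t1 {
        a1
    } else if now < t2 {
        lerp(a1, a2, t1, t2, now)
    } else if now < t3 {
        a2
    } else if now < t4 {
        lerp(a2, a3, t3, t4, now)
    } else {
        a3
    }
}
```

For \fade(255,0,255,0,100,900,1000) this gives 255 before the event, 0 during the hold, and 255 again after t4, whereas a single linear ramp from the first to the last alpha would stay at 255 throughout.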

Outline too thin (FIXED)

Used stroke.width = border_width * 0.6 — a fudge factor calibrated when text was the wrong size.
libass uses FT_Stroker which expands outward from the glyph outline by the full border width (one-sided). tiny_skia's stroke extends width/2 on each side.
Fix: Stroke at width * 2.0, then render to temp pixmap and punch out the glyph interior with DestinationOut blend mode, leaving only the outward ring. Clean one-sided border.

Shadow too small (FIXED)

Shadow was drawn as just the glyph fill paths offset by shadow distance. In libass, the shadow is the full bordered glyph shape (fill + outline) shifted.
Fix: Shadow now strokes the glyph paths with the border width before filling, producing the full bordered shadow shape.

Shadow offset halved (FIXED)

Shadow x/y offsets had an erroneous * 0.5 factor, making shadows appear half as far from the text as specified.
Fix: Removed the * 0.5.

`font-metrics` diagnostic command (NEW)

Added ass-tool font-metrics <dir> command that dumps UPM, hhea, OS/2 sTypo, usWin, USE_TYPO_METRICS flag, lineGap, and computed FreeType REAL_DIM denominator for all fonts in a directory.

Active Issues (as of 2026-04-11)

Rotation center Y offset for scaled text

For heavily scaled text (\fscx232\fscy520), the rotation center Y is slightly wrong. The anchor-based rotation is much closer than the old bbox-center approach, but there's a remaining offset related to how the unscaled baseline interacts with the scaled alignment offset in the coordinate chain.
Affected content: ioxho8.ass banner text ("The blood-smeared scene of the crime", "...of the culprit").
Status: Close but not pixel-perfect. Needs investigation of the coordinate chain: apply_alignment_offset uses scaled dimensions, base_transform starts at baseline_y (unscaled offset from data.y), and the Scale effect is applied after rotation.

`software_pipeline_new.rs` dead code

Unused file from the original vibecoded vendored version. Commented out of module tree. Should be deleted.

Film grain rendering

Fansub technique: render random text in a custom "Grain Medium" font at \alpha&HB6&\blur2 to create animated noise overlay. Events change every 4cs (25fps grain animation).
Status: Mostly working after font scaling fix. Grain fills the frame. Noise may be slightly too visible compared to reference.

Nisekoi subtitles not rendering

Reported after the font scaling changes. May be related to usWin metric selection for Nisekoi's fonts, or a regression in the fade/alpha handling.
Status: Not yet investigated.

Maintenance Debt (MEDIUM)

Hardcoded debug string "Чысценькая"

Files: backends/software.rs (2 locations), pipeline/software_pipeline.rs (5 locations), renderer/event_selector.rs
Debug conditionals like if data.text.contains("Чысценькая") scattered through production code.
Plan: Remove all instances. Use log crate (now added) for structured debug logging instead.

Dead code in SoftwarePipeline

OwnedStyle has unused fields: name, angle, border_style, encoding (all #[allow(dead_code)])
Unused methods at lines 370, 383, 1668 with #[allow(dead_code)]
glyph_renderer field marked dead_code with comment "used in future rendering features"
Plan: Remove dead fields and methods. If they're needed later they can be re-added from git history.

Glow effect plugin is empty

plugin/effects/mod.rs: apply_cpu() does nothing (let _ = ...). shader_code() returns None.
Plan: Remove or implement. Don't ship empty effect plugins.

Dirty region clipping doesn't clip

backends/software.rs:1099: let _ = region; // TODO: Apply clipping
Incremental rendering iterates dirty regions but renders the full scene for each.
Plan: Implement actual clip mask per dirty region, or remove the incremental path and always full-render (which is what happens anyway).

Pixel sampling debug output

backends/software.rs:1050-1072: Samples two pixels and prints alpha stats in debug builds.
Plan: Remove. Use log crate + proper frame analysis tools instead.

Low Priority / Cosmetic

Empty test modules

plugin/tags/formatting.rs:162: #[cfg(test)] mod tests {} — zero test coverage for formatting tag handlers.

SIMD blur placeholder

apply_gaussian_blur_simd just calls the scalar version. Comment says "real SIMD implementation would use intrinsics."

Simplified contrast calculation

debug/analyzer.rs:286: Uses mathematically incorrect formula. Comment says "simplified."

Memory measurement placeholder

debug/benchmarking.rs:347: measure_memory_usage() always returns None.

Multiple TODO fields in debug metrics

debug/mod.rs:242-248: Hardcoded zeros for active_events, cache_hits, etc.

Custom format line support missing

parser/ast/style.rs:145, parser/ast/event.rs:232: // TODO: Support custom format lines

Event selector backward compatibility function

renderer/event_selector.rs:259: #[allow(dead_code)] // Kept for backward compatibility

Outline (Border) Implementation

Stroking approach (UPDATED 2026-04-11)

libass: Uses FT_Stroker which expands outward from the glyph outline by the full border width (one-sided).
Our renderer: Uses tiny_skia::PathStroker::stroke() at width * 2.0 (so the outward half equals the border width), then punches out the glyph interior via a temp pixmap + DestinationOut blend. This produces a clean one-sided outward-only border.
Stroke order: The outline is stroked on the UNSCALED glyph path, then the Scale transform (fscx/fscy) is applied to the stroked result. This matches libass where the border is applied in pre-scale glyph space.
Shadow: Shadow is drawn as the full bordered glyph shape (stroke + fill) offset by shadow distance, matching libass.
Status: Visually matches libass for tested content. The DestinationOut approach is slightly more expensive than a true offset-curve stroker but produces clean results.

Blur Implementation Status

`\blur` (gaussian blur)

Current: Treats \blur value as sigma directly. Should convert radius to sigma per libass: sigma = 2 * radius / sqrt(ln(256)).
Algorithm: Separable gaussian with precomputed 1-D kernel at ±3σ. Correct.
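
A sketch of the missing conversion together with the ±3σ kernel described above. The function names are hypothetical, and the formula is the one quoted in this audit rather than code lifted from libass.

```rust
/// Convert a \blur radius to a gaussian sigma: 2 * radius / sqrt(ln(256)).
fn blur_sigma(radius: f64) -> f64 {
    2.0 * radius / 256f64.ln().sqrt()
}

/// Normalised 1-D gaussian kernel covering ±3 sigma.
fn gaussian_kernel(sigma: f64) -> Vec<f64> {
    if sigma <= 0.0 {
        return vec![1.0]; // no blur: identity kernel
    }
    let half = (3.0 * sigma).ceil() as i64;
    let mut kernel: Vec<f64> = (-half..=half)
        .map(|x| {
            let x = x as f64;
            (-(x * x) / (2.0 * sigma * sigma)).exp()
        })
        .collect();
    // Normalise so the weights sum to 1 and brightness is preserved.
    let sum: f64 = kernel.iter().sum();
    for w in &mut kernel {
        *w /= sum;
    }
    kernel
}
```

The scale factor works out to roughly 0.849, so treating the \blur value as sigma directly over-blurs by about 18%.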

`\be` (edge blur)

Current: 3×3 weighted kernel [[1,2,1],[2,4,2],[1,2,1]], n passes. Correct per ASS spec.
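
For reference, one pass of that kernel on a single-channel alpha bitmap might look like the sketch below. The names are hypothetical, and treating pixels outside the bitmap as 0 (transparent) is an assumption.

```rust
/// One \be pass: 3x3 kernel [[1,2,1],[2,4,2],[1,2,1]] / 16.
/// Pixels outside the bitmap count as 0.
fn be_pass(src: &[u8], width: usize, height: usize) -> Vec<u8> {
    const K: [[u32; 3]; 3] = [[1, 2, 1], [2, 4, 2], [1, 2, 1]];
    assert_eq!(src.len(), width * height);
    let mut dst = vec![0u8; src.len()];
    for y in 0..height {
        for x in 0..width {
            let mut acc = 8u32; // rounding term for the divide by 16
            for (ky, row) in K.iter().enumerate() {
                for (kx, &w) in row.iter().enumerate() {
                    let sy = y as isize + ky as isize - 1;
                    let sx = x as isize + kx as isize - 1;
                    if sy < 0 || sx < 0 || sy >= height as isize || sx >= width as isize {
                        continue;
                    }
                    acc += w * src[sy as usize * width + sx as usize] as u32;
                }
            }
            dst[y * width + x] = (acc / 16) as u8;
        }
    }
    dst
}

/// \be<n>: apply the pass n times.
fn be_blur(src: &[u8], width: usize, height: usize, passes: u32) -> Vec<u8> {
    (0..passes).fold(src.to_vec(), |buf, _| be_pass(&buf, width, height))
}
```

The kernel is separable ([1,2,1] horizontally, then vertically), so an optimised version could do two 3-tap passes instead of nine taps per pixel.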

Blur scope

Current: Blur covers entire event (shadow + outline + fill) via temp pixmap. Correct.

Performance Analysis: libass vs our renderer (2026-04-10)

Benchmark

Our renderer: 651ms per frame on "I ask of thee" scene (12 complex \p1 glyph-outline drawings with \blur3.5 + full-screen black rectangle + film grain text overlay + regular dialogue). Less than 2fps.
libass: Same content at interactive framerates in pure software (no GPU).

Why libass is fast: six-level cache hierarchy

libass has six caches (ass_cache.c, created at renderer init):

Font cache — keyed by (family, bold, italic, vertical). Avoids re-loading fonts.
Outline cache — keyed by glyph ID+size (for text) or raw drawing text string (for \p1). The parsed ASS_Outline is cached. For our 12 \p1 drawings, the path is parsed ONCE EVER.
Bitmap cache — keyed by (outline pointer, quantized transform matrix, quantized offset). The rasterised bitmap is cached. If scale/rotation haven't changed (or changed within quantization tolerance), the rasterised result is reused.
Composite cache — keyed by (filter descriptor + array of BitmapRef pointers with positions). Stores post-blur, post-outline-subtraction, post-shadow results. This is the BIG one: the entire blur+composite pipeline result is cached. On frame 2+, if nothing changed, it's a single hash lookup.
Glyph metrics cache — avoids repeated FreeType queries.
Face size metrics cache — same.

Key insight: On frame 2+ for static content, EVERY cache level hits. Cost is essentially hash lookups. Even for animated content, only the levels that changed need recomputation.

Why libass's blur is fast: cascade blur, not naive gaussian

libass uses a cascade blur algorithm (ass_blur.c), not a naive separable gaussian:

Downscale the bitmap by 2x repeatedly (using [1,5,10,10,5,1] kernel) to reduce problem size
Apply a small filter (9-17 taps, radius 4-8) at reduced resolution
Upscale back

This makes blur cost O(n) regardless of radius — a \blur10 is roughly the same cost as \blur1 because the large radius just means more downscale steps, not more kernel taps.
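
A heavily simplified 1-D sketch of the downscale step and of why a larger radius only adds levels. This is not libass's code: the names, the edge clamping, and the rule of halving until the radius fits a fixed maximum are assumptions; only the [1,5,10,10,5,1] kernel and the radius 4-8 range come from the description above.

```rust
/// Halve a 1-D buffer with the [1,5,10,10,5,1] / 32 kernel.
/// Output sample i is centred between input samples 2i and 2i+1.
/// Out-of-range taps are clamped to the edge (an assumption).
fn shrink_1d(src: &[u8]) -> Vec<u8> {
    const K: [u32; 6] = [1, 5, 10, 10, 5, 1]; // weights sum to 32
    let n = src.len();
    if n == 0 {
        return Vec::new();
    }
    (0..(n + 1) / 2)
        .map(|i| {
            let mut acc = 16u32; // rounding term for the divide by 32
            for (k, &w) in K.iter().enumerate() {
                // Tap index 2i - 2 + k, clamped into 0..n.
                let idx = (2 * i + k).saturating_sub(2).min(n - 1);
                acc += w * src[idx] as u32;
            }
            (acc / 32) as u8
        })
        .collect()
}

/// Number of 2x downscale steps needed before the effective radius fits
/// a small fixed-size filter. Each step halves the radius in pixel terms.
fn shrink_levels(mut radius: f64, max_radius: f64) -> u32 {
    let mut levels = 0;
    while radius > max_radius && levels < 16 {
        radius /= 2.0;
        levels += 1;
    }
    levels
}
```

Each level halves the data (a quarter in 2-D), so the total work over all levels is bounded by a geometric series and the small filter always runs with a fixed number of taps.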

All blur arithmetic uses int16_t (not f32), with 16.16 fixed-point coefficients. SIMD implementations exist for SSE2/SSSE3/AVX2 (x86/blur.asm) and NEON (aarch64/blur.S).

Why libass's rasteriser is fast: custom tiled rasteriser

libass does NOT use FreeType's rasteriser. It has its own tiled recursive rasteriser (ass_rasterizer.c) with three fast paths per tile:

No segments → solid fill/clear
One segment → half-plane fill
Multiple segments → generic trapezoid rasterisation

Coordinates use 26.6 fixed-point. Segment math uses int32_t/int64_t. All tile functions have SIMD implementations.

Why libass's memory is fast

Bitmaps are single-channel uint8_t (not RGBA) — 4x less memory, 4x less blur work
Stride aligned to SIMD width (16/32 bytes) for cache-line-friendly access
ass_aligned_alloc for all bitmap allocations

What libass does NOT do

libass does NOT composite onto video. It returns positioned alpha masks + colours as a linked list of ASS_Image. The video player composites. This means libass never allocates or fills a full-frame RGBA pixmap.
libass does NOT use the GPU for any rendering.

Our pipeline waste (specific issues)

| Issue | Our approach | libass approach | Impact |
| --- | --- | --- | --- |
| Text shaping | Shape 2-3× per segment per frame (pipeline width calc + pipeline layout + backend render) | Shape once, cache metrics | HIGH: rustybuzz is expensive |
| Style parsing | Re-parse styles from script every frame (`prepare_script`) | Parse once at load time | MEDIUM |
| Drawing path rasterisation | Re-rasterise every frame (path is cached but bitmap is not) | Cache rasterised bitmap keyed by quantized transform | HIGH: biggest single win |
| Blur | Naive separable gaussian, f32 per-channel, O(w×h×kernel_radius×2) | Cascade blur, int16, O(n), SIMD | CRITICAL: dominates frame time |
| Blur pixmap allocation | `Pixmap::new()` per blur per frame (alloc + zero-fill) | Reuse buffers, single-channel uint8 | HIGH: alloc overhead + 4× pixel data |
| Blur temp buffer | `vec![0u8; data.len()]` per blur (alloc + zero) | Pre-allocated stripe buffers | MEDIUM |
| Clip mask allocation | `Mask::new(width, height)` per drawing per frame | N/A (libass clips during rasterisation) | MEDIUM |
| Composite cache | None; re-render everything every frame | Composite cache keyed by bitmap refs + filter params | HIGH: avoids all blur/render work for static events |
| Pixel format | RGBA (4 bytes/pixel) for everything including temp pixmaps | Single-channel uint8 for rendering, colour applied at composite | HIGH: 4× memory, 4× blur work |
| Frame change detection | None; always re-render | `ass_detect_change` compares bitmap pointers, returns 0/1/2 | HIGH for static subtitles |
| Full-frame pixmap | Allocate and fill full-frame RGBA pixmap every frame | Returns alpha masks only, caller composites | MEDIUM: could use OffscreenCanvas 2D directly |

Recommended optimisation plan (priority order)

1. Bitmap cache for drawings + text: Cache rasterised bitmaps keyed by (path_hash, quantized_scale, quantized_rotation). For the "I ask of thee" scene, 12 drawings would be rasterised ONCE and cached. Position/alpha changes are cheap post-raster operations. This alone should reduce frame time from 651ms to ~50ms for frame 2+.
2. Composite cache: Cache post-blur results keyed by (bitmap_refs, blur_params). When the same set of glyphs appears with the same blur, skip the entire blur pipeline.
3. Cascade blur: Replace naive gaussian with downscale→blur→upscale. Makes blur O(n) regardless of radius. Even without SIMD, this is a massive win for \blur3.5+.
4. Single-channel rendering: Render to uint8 alpha bitmaps, apply colour at composite time. 4× less data to blur, 4× less memory.
5. Eliminate redundant shaping: Shape text once per segment per frame, share result between pipeline layout and backend render. Currently shaped 2-3×.
6. Buffer reuse: Pre-allocate blur temp buffers and clip masks, reuse across frames instead of allocating per-frame.
7. Frame change detection: Compare event list + parameters with previous frame. If unchanged, return previous frame data.
8. SIMD blur: The apply_gaussian_blur_simd stub currently just calls scalar. Implement actual SIMD for WASM (wasm_simd128) and native (SSE2/NEON).

Items 1-3 would likely bring frame time under 16ms for most content. Items 4-8 are incremental improvements.
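
A minimal sketch of the bitmap cache from the first item, assuming single-channel bitmaps stored as Vec<u8>. All type and method names are hypothetical, the 1/64 quantisation step is an arbitrary choice, and there is no eviction policy.

```rust
use std::collections::HashMap;

/// Cache key: shape identity plus quantised transform.
/// Position, alpha and colour are deliberately NOT part of the key.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct BitmapKey {
    path_hash: u64,
    scale_q: (i32, i32),
    rotation_q: i32,
}

impl BitmapKey {
    fn new(path_hash: u64, scale_x: f32, scale_y: f32, rotation_deg: f32) -> Self {
        // Quantise to 1/64 steps so tiny float differences still hit the cache.
        let q = |v: f32| (v * 64.0).round() as i32;
        Self { path_hash, scale_q: (q(scale_x), q(scale_y)), rotation_q: q(rotation_deg) }
    }
}

struct BitmapCache {
    entries: HashMap<BitmapKey, Vec<u8>>,
    hits: u64,
    misses: u64,
}

impl BitmapCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached bitmap, calling `rasterise` only on a miss.
    fn get_or_rasterise(&mut self, key: BitmapKey, rasterise: impl FnOnce() -> Vec<u8>) -> &[u8] {
        if self.entries.contains_key(&key) {
            self.hits += 1;
        } else {
            self.misses += 1;
            self.entries.insert(key, rasterise());
        }
        &self.entries[&key]
    }
}
```

Rendering the same drawing for 200 frames through get_or_rasterise runs the closure once and records 199 hits. A real implementation would also need a size bound or LRU eviction.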

Colour-independent rendering ("pseudo single-channel")

libass renders to single-channel uint8 alpha bitmaps because ASS subtitles use flat colours — a glyph's shape is independent of its colour. The alpha mask is the expensive part (rasterisation + blur), and colour is a cheap per-pixel multiply at composite time.

We can approximate this within tiny_skia's RGBA model by rendering everything as white with varying alpha (rgba(255, 255, 255, α)). This is the identity for premultiplied-alpha colour multiplication: at composite time, output = cached_pixel * (event_colour / 255) produces the correctly coloured result.
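
The composite-time multiply can be sketched per pixel as below, assuming premultiplied RGBA bytes and an opaque event colour. The function name is hypothetical; an event alpha would additionally scale all four channels.

```rust
/// Tint a cached white-with-alpha pixel (premultiplied RGBA, so r = g = b = a)
/// with an event colour. Alpha is left untouched.
fn tint_premultiplied(cached: [u8; 4], colour: [u8; 3]) -> [u8; 4] {
    // Rounded (c * k) / 255.
    let mul = |c: u8, k: u8| ((c as u16 * k as u16 + 127) / 255) as u8;
    [
        mul(cached[0], colour[0]),
        mul(cached[1], colour[1]),
        mul(cached[2], colour[2]),
        cached[3],
    ]
}
```

A half-covered cached pixel [128, 128, 128, 128] tinted with pure red becomes [128, 0, 0, 128], which is what rendering directly in red would have produced in premultiplied form.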

Why this matters for caching: without colour-independent rendering, the cache key must include the colour. With it, the cache key is just (shape_hash, scale, rotation, blur). This dramatically increases cache hit rate:

Multi-layer text effects (common fansub technique: same text rendered 2-3x with different colours for fill, outline glow, shadow) → one cached bitmap, 2-3 colour multiplications
Same \p1 drawing across 200 frames → one rasterisation + blur, 199 cache hits regardless of \t colour animations
Film grain characters in "Grain Medium" font → glyph-level cache. Each of ~95 printable ASCII shapes cached once, reused across all randomised grain text. The current approach re-rasterises every character every frame (15 lines × ~30 chars × 25fps = 11,250 glyph rasterisations/second → would become ~95 total, cached forever).

Even before moving to actual single-channel buffers, rendering white-with-alpha and applying colour at composite reduces cache cardinality by the number of distinct colours in the script — often 10-50x fewer cache entries needed.