Claude had a field day torching Cursor agent
Note that this is FULL CLAUDE written. Don't read this if you don't want to laugh at one LLM critiquing the fuck out of another.
ass-renderer / ass-core Audit
Audit conducted 2026-04-09 during Phase 7 of the subtitle system implementation.
Actively Broken (HIGH)
GPU backends are non-functional stubs
- WebGPU (backends/web/pipeline.rs): WebGpuPipeline returns empty results from all methods. create_pipeline() in web/mod.rs falls back to SoftwarePipeline. The composite_layers method uses pollster::block_on, which won't work in WASM.
- Metal (backends/hardware/metal.rs): render_layers() has stub comments ("In production, would create vertex buffer"). Falls back to SoftwarePipeline.
- Vulkan (backends/hardware/vulkan.rs): Shader loading is stubbed. No actual Vulkan draw calls. Falls back to SoftwarePipeline.
All three silently fall back to software rendering while reporting BackendType::WebGPU/Metal/Vulkan.
Plan: WebGPU blit path (Phase 7, Task 4) will replace the WebGPU stubs with a software-render-then-GPU-blit approach. Metal and Vulkan will follow the same pattern when needed (see roadmap). The existing stub code that tries to do GPU-side text rendering should be removed.
Streaming parser is fake
- parser/streaming.rs: StreamingResult::sections is Vec<String>, not parsed AST nodes. Comments say "simplified."
- parser/streaming/processor.rs: process_section_content() returns empty DeltaBatch::new(). Comments say "In full parser this would..." and "In production..."
Plan: Not used by vodplace. Can be removed or properly implemented if incremental parsing becomes needed.
Fixed in This Session
Duplicate tag parsers (FIXED)
- text_segmenter.rs had its own ~220-line match block duplicating tag_processor.rs, with a _ => {} catch-all silently dropping \blur, \be, \bord, \shad, \clip, \iclip, \q, \r, \org, \fax, \fay, and more.
- Fix: Replaced with a single tag_processor::apply_tag() call. 41 new tests added.
Duplicate colour parsers (FIXED)
- SoftwarePipeline::parse_ass_color and ass_core::utils::parse_bgr_color both parsed ASS colours with different alpha handling.
- Fix: parse_ass_color now delegates to parse_bgr_color.
Alpha inversion bug (FIXED)
- ass_core::utils::parse_bgr_color treated ASS alpha 00 as transparent (RGBA 0) instead of opaque (RGBA 255). 6-char colours defaulted to alpha 0 (invisible).
- Fix: Corrected alpha inversion. All tests updated.
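A minimal sketch of the corrected alpha handling, assuming the usual ASS colour layout (&HAABBGGRR& or &HBBGGRR&, where alpha 00 means opaque). The name ass_colour_to_rgba is hypothetical; this is not the actual parse_bgr_color implementation:

```rust
/// Hypothetical helper (not the real parse_bgr_color): parse an ASS colour
/// such as "&H80FF0000&" (AABBGGRR) or "&HFF0000&" (BBGGRR) into RGBA bytes.
pub fn ass_colour_to_rgba(s: &str) -> Option<[u8; 4]> {
    let hex = s
        .trim()
        .trim_start_matches("&H")
        .trim_start_matches("&h")
        .trim_end_matches('&');
    let valid_len = hex.len() == 6 || hex.len() == 8;
    if !valid_len || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let v = u32::from_str_radix(hex, 16).ok()?;
    // ASS alpha is inverted relative to RGBA: 00 = opaque, FF = transparent.
    // A 6-digit colour carries no alpha, so it is treated as fully opaque.
    let ass_alpha = if hex.len() == 8 { (v >> 24) as u8 } else { 0 };
    let (b, g, r) = ((v >> 16) as u8, (v >> 8) as u8, v as u8);
    Some([r, g, b, 255 - ass_alpha])
}
```

With this convention, &H0000FF& comes out as opaque red rather than an invisible pixel.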
Triple fontdb (FIXED)
- Three separate fontdb::Database instances in RenderContext, SoftwarePipeline, SoftwareBackend. set_font_database() cloned into all three. composite_layers() created a throwaway backend per frame.
- Fix: Single Arc<fontdb::Database> shared everywhere. Box<dyn RenderBackend> instead of Arc. Persistent backend with pixmap clear.
Font fallback hard failure (FIXED)
- Renderer errored out with "Font not found and no fallback available" when exact font name wasn't in fontdb, even with fonts loaded.
- Fix: Register generic families, absolute last-resort returns any loaded face.
Blur rendering order (FIXED)
- \blur only blurred the text fill, not shadow/outline. Entire event should be blurred.
- Fix: When blur present, all rendering (shadow + outline + fill + underline + strikethrough) goes to temp pixmap, gets blurred, then composited.
Fixed 2026-04-10
Rotation used radians instead of degrees (FIXED)
- backends/software.rs: pre_rotate(angle_rad) was called with radians, but tiny_skia's pre_rotate takes degrees. All \frz rotations were ~57x too small (e.g., 13.73° rendered as 0.24°).
- Fix: Pass degrees directly to pre_rotate.
Drawing paths not scaled (FIXED)
- \p1 drawing commands were only translated to position, never scaled from PlayRes to canvas coordinates. No fscx/fscy or draw_level scaling applied.
- Fix: Apply draw_scale * scale_x/y * font_scale_x/y transform to path geometry.
Drawing clip regions not applied (FIXED)
- \clip tags on \p1 events were parsed but never passed through VectorData to the backend. All drawings rendered unclipped: combined heart+bone paths showed both shapes instead of only the clipped portion.
- Fix: Added clip field to VectorData, create tiny_skia::Mask in backend.
Letter spacing not scaled or included in width (FIXED)
- \fsp values were applied as raw pixels without PlayRes→canvas scaling, AND not included in total_line_width/seg_advance calculations. Text with \fsp was too wide (unscaled spacing) and misaligned (width not accounting for spacing).
- Fix: Scale spacing by scale_x, include in width calculations.
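The fix above can be sketched roughly as follows. The function name and signature are hypothetical, and adding the spacing after every glyph (including the last) is an assumption about the intended behaviour, not something confirmed by the renderer source:

```rust
/// Hypothetical sketch: width of a run whose \fsp value is given in script
/// (PlayRes) units. Spacing is converted to canvas pixels with `scale_x`
/// and counted once per glyph so that alignment sees the real width.
pub fn line_width_with_spacing(glyph_advances_px: &[f32], fsp_script: f32, scale_x: f32) -> f32 {
    let spacing_px = fsp_script * scale_x; // PlayRes -> canvas
    let glyphs_px: f32 = glyph_advances_px.iter().sum();
    glyphs_px + spacing_px * glyph_advances_px.len() as f32
}
```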
Per-channel alpha not animatable (FIXED)
- \1a, \2a, \3a, \4a were not recognised as \t() animation targets. \t(455,500,\1a&H00&) silently dropped the \1a target, leaving primary alpha permanently at its initial value.
- Fix: Added PrimaryAlpha, SecondaryAlpha, OutlineAlpha, ShadowAlpha variants to AnimatableTag.
FontDb wrapper (FIXED)
- Raw fontdb::Database didn't support fonts fontdb rejects (e.g., Symbol encoding fonts like SSF4 ABUKET). No fallback mechanism.
- Fix: New FontDb wrapper with RwLock<Database> + RwLock<AHashMap<String, Arc<Vec<u8>>>> fallback font map. Shared via Arc<FontDb>. Fallback shaping path bypasses fontdb for rejected fonts.
MKV subtitle extraction missing timestamps (FIXED)
- ass-tool extract-mkv prepended Dialogue: to raw ffmpeg packet data, which doesn't include Start/End timestamps (those are in PTS/duration metadata). All extracted ASS files had broken timing, so no subtitles displayed.
- Fix: Reconstruct H:MM:SS.CC timestamps from packet.pts() and packet.duration() using stream time_base.
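A minimal sketch of the timestamp reconstruction, assuming the PTS and duration have already been read as integers and the stream time_base is available as a numerator/denominator pair. The function names are hypothetical, and rounding to the nearest centisecond (rather than truncating) is an assumption:

```rust
/// Hypothetical helper: convert a tick count in `tb_num / tb_den` seconds
/// into the ASS "H:MM:SS.CC" form (centisecond precision).
pub fn ass_timestamp(ticks: i64, tb_num: u32, tb_den: u32) -> String {
    let den = tb_den.max(1) as i128;
    // Work in centiseconds; i128 avoids overflow for large tick counts.
    let cs = (ticks.max(0) as i128 * tb_num as i128 * 100 + den / 2) / den;
    format!(
        "{}:{:02}:{:02}.{:02}",
        cs / 360_000,
        (cs / 6_000) % 60,
        (cs / 100) % 60,
        cs % 100
    )
}

/// Start/End fields for a Dialogue line, from a packet's PTS and duration.
pub fn dialogue_times(pts: i64, duration: i64, tb_num: u32, tb_den: u32) -> (String, String) {
    (
        ass_timestamp(pts, tb_num, tb_den),
        ass_timestamp(pts + duration, tb_num, tb_den),
    )
}
```

For a stream with a 1/1000 time_base, a packet at PTS 83450 with duration 2500 gives a Start of 0:01:23.45 and an End of 0:01:25.95.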
Font weight suffix lookup (FIXED)
- "Dosis Light" was not recognised as "Dosis" + Light weight. External font resolver redundantly tried to fetch fonts already loaded under their base family name.
- Fix: Shared strip_weight_suffix() in ass_renderer::utils::font. Worker's is_family_loaded() and renderer's try_split_family_weight() both use it.
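One way such a helper could look is sketched below. This is not the actual strip_weight_suffix(): the function name, the suffix table, the case-sensitive match, and the return type are all assumptions; only the weight numbers follow the standard OpenType usWeightClass values:

```rust
/// Hypothetical sketch: split a name like "Dosis Light" into a base family
/// and a numeric weight. Longer suffixes are listed before shorter ones so
/// that "Extra Light" is not mistaken for "Light".
pub fn split_weight_suffix(name: &str) -> (&str, Option<u16>) {
    const SUFFIXES: &[(&str, u16)] = &[
        (" Extra Light", 200),
        (" ExtraLight", 200),
        (" Semi Bold", 600),
        (" SemiBold", 600),
        (" Extra Bold", 800),
        (" ExtraBold", 800),
        (" Thin", 100),
        (" Light", 300),
        (" Medium", 500),
        (" Bold", 700),
        (" Black", 900),
    ];
    for (suffix, weight) in SUFFIXES {
        if let Some(base) = name.strip_suffix(suffix) {
            if !base.trim().is_empty() {
                return (base.trim_end(), Some(*weight));
            }
        }
    }
    (name, None)
}
```

A loader check can then compare the base family against the loaded families before asking an external resolver for the font.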
Performance (HIGH)
Software rendering is too slow for complex fansub typesetting
- Measured: 651ms per frame on an "I ask of thee" scene with 12 complex \p1 glyph-outline drawings + blur + a full-screen black rectangle + film grain text overlay. Less than 2fps.
- Root cause: Gaussian blur on large complex paths in software is O(width × height × kernel). Multiple blurred drawings per frame compound the cost.
- Comparison: WASM libass should handle this at interactive framerates — the performance gap suggests our rendering pipeline is doing significantly more work per frame than necessary, or doing it less efficiently.
- Mitigation strategies (in order of expected impact):
- Pre-render / lookahead cache: We have the full ASS file upfront plus a playback buffer of seconds to minutes. Pre-render upcoming frames on a background thread / during idle time. Cache rasterised drawing paths keyed by (path_hash, scale, blur_params).
- Drawing path cache: Same \p1 path at same scale/blur appears across many frames. Cache the blurred rasterised result instead of re-rasterising every frame. Position/alpha/clip changes should not invalidate the cache (those are cheap post-raster operations).
- Blur optimisation: Current gaussian blur is naive scalar. Consider: box blur approximation (3-pass box blur ≈ gaussian), downscale-blur-upscale, or SIMD (the apply_gaussian_blur_simd function currently just calls scalar).
- WebGPU blit path (Task #25): Move blur and compositing to GPU. Software rasterise glyph outlines, upload texture, GPU does blur + blend. Major architecture change but eliminates the CPU bottleneck entirely.
- Render resolution scaling: Render subtitle overlay at reduced resolution (e.g., 50%) and upscale. Quality tradeoff but linear speedup.
Fixed 2026-04-11
Font scaling denominator wrong (FIXED)
- Our renderer: Used font_size / units_per_em to convert font units to pixels.
- libass: Overrides FreeType's face->ascender/face->descender with OS/2 usWinAscent/usWinDescent (GDI compatibility, see set_font_metrics() in ass_font.c:278-311), then FT_SIZE_REQUEST_TYPE_REAL_DIM divides by ascender - descender. For IwataMinchoProM (UPM=1000, usWin total=1310), this produced glyphs 32% too large in our renderer.
- Fix: FontMetrics::from_face now mirrors libass's metric selection: usWin first, then hhea (or sTypo if USE_TYPO_METRICS), then sTypo fallback, then bbox. Both shaping and outline rendering use font_size / (ascender - descender) as the scale denominator.
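The difference between the two denominators can be sketched as follows. The struct and function names are hypothetical and are not the real FontMetrics API; the fallback to units_per_em for degenerate metrics is also an assumption:

```rust
/// Hypothetical stand-in for the metrics chosen by the selection logic
/// (usWin, hhea, sTypo, or bbox). `descender` is negative, as in FreeType.
#[derive(Clone, Copy)]
pub struct VerticalMetrics {
    pub units_per_em: f32,
    pub ascender: f32,
    pub descender: f32,
}

/// Old behaviour: font units to pixels via the em square.
pub fn scale_per_em(font_size: f32, m: VerticalMetrics) -> f32 {
    font_size / m.units_per_em
}

/// REAL_DIM-style behaviour: font units to pixels via ascender - descender.
pub fn scale_real_dim(font_size: f32, m: VerticalMetrics) -> f32 {
    let height = m.ascender - m.descender;
    if height > 0.0 {
        font_size / height
    } else {
        // Assumed fallback for broken metrics.
        font_size / m.units_per_em
    }
}
```

For a font with UPM=1000 and an ascender - descender total of 1310, the em-square scale is 1310/1000 = 1.31 times the REAL_DIM scale, which is the direction of the oversize reported above.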
\move timing double-conversion (FIXED)
- parse_move_args converted milliseconds to centiseconds, then calculate_position_from_tags divided by 10 again. Movement completed 10× too fast: a 7.7-second animation finished in 0.77 seconds.
- Fix: Removed the redundant /10 in the text path.
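A minimal sketch of \move interpolation that keeps every time in milliseconds, so there is only one unit and no place for a second conversion. The function name and signature are hypothetical; treating t1 = t2 = 0 as "use the whole event duration" is how the 4-argument form is assumed to behave here:

```rust
/// Hypothetical sketch of \move(x1,y1,x2,y2,t1,t2) with all times in
/// milliseconds relative to the event start.
pub fn move_position(
    from: (f64, f64),
    to: (f64, f64),
    t1_ms: i64,
    t2_ms: i64,
    event_duration_ms: i64,
    now_ms: i64,
) -> (f64, f64) {
    // Assumed: omitted or zero times mean "move over the whole event".
    let (start, end) = if t1_ms == 0 && t2_ms == 0 {
        (0, event_duration_ms)
    } else {
        (t1_ms, t2_ms)
    };
    let progress = if end <= start {
        if now_ms < start { 0.0 } else { 1.0 }
    } else {
        ((now_ms - start) as f64 / (end - start) as f64).clamp(0.0, 1.0)
    };
    (
        from.0 + (to.0 - from.0) * progress,
        from.1 + (to.1 - from.1) * progress,
    )
}
```

With a 7700 ms move, the position at 770 ms is 10% of the way along the path, not the end point.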
Text rotation center wrong (FIXED)
- Rotated around (shaped.width/2, shaped.height/2) (text bounding box center). libass rotates around the \pos/\move anchor point.
- Fix: Pass anchor point through TextData.anchor. Backend computes rotation center as (anchor_x - data.x, anchor_y - baseline_y) in glyph-local coords.
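The effect of the pivot choice can be illustrated with plain rotation math. This sketch does not use tiny_skia; the function name is hypothetical, and the sign convention (positive angles follow the standard mathematical rotation formula in whatever coordinate space the points are given) is an assumption rather than the renderer's confirmed convention:

```rust
/// Hypothetical sketch: rotate `point` around `pivot` by `angle_deg` degrees.
/// The angle is converted to radians explicitly, so the unit expected by the
/// caller is never ambiguous.
pub fn rotate_about(point: (f64, f64), pivot: (f64, f64), angle_deg: f64) -> (f64, f64) {
    let (sin, cos) = angle_deg.to_radians().sin_cos();
    let (dx, dy) = (point.0 - pivot.0, point.1 - pivot.1);
    (
        pivot.0 + dx * cos - dy * sin,
        pivot.1 + dx * sin + dy * cos,
    )
}
```

The pivot itself never moves, so using the \pos anchor as the pivot keeps the anchor fixed on screen, whereas using the bounding-box centre moves the anchor whenever the two differ.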
\frz on \p1 drawings not applied (FIXED)
- Drawing/vector path never added Rotation to the effects list, and draw_vector_layer didn't process rotation effects. \frz on drawings was silently ignored.
- Fix: Added rotation to VectorData effects in the pipeline. Backend applies rotation around bounding box center before drawing.
Complex \fade broken (FIXED)
- \fade(a1,a2,a3,t1,t2,t3,t4) (7-parameter form) was implemented as a linear interpolation from alpha_start to alpha_end using a single progress value, ignoring alpha_middle, time_fade_in, and time_fade_out. Text with complex fades was always invisible.
- Fix: Proper 3-phase interpolation: fade-in (a1→a2 over t1→t2), hold (a2 over t2→t3), fade-out (a2→a3 over t3→t4).
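The 3-phase interpolation can be sketched as below. The function name and signature are hypothetical; alphas use the ASS convention (0 = opaque, 255 = transparent) and times are assumed to be milliseconds relative to the event start:

```rust
/// Hypothetical sketch of the 7-parameter \fade(a1,a2,a3,t1,t2,t3,t4).
/// Returns the ASS alpha (0 = opaque, 255 = transparent) at `now` ms.
pub fn complex_fade_alpha(
    a1: u8, a2: u8, a3: u8,
    t1: i64, t2: i64, t3: i64, t4: i64,
    now: i64,
) -> u8 {
    // Linear interpolation between two alphas over [start, end).
    let lerp = |from: u8, to: u8, start: i64, end: i64| -> u8 {
        if end <= start {
            return to;
        }
        let f = (now - start) as f64 / (end - start) as f64;
        (from as f64 + (to as f64 - from as f64) * f).round() as u8
    };
    if now < t1 {
        a1 // before the fade-in starts
    } else if now < t2 {
        lerp(a1, a2, t1, t2) // fade-in
    } else if now < t3 {
        a2 // hold
    } else if now < t4 {
        lerp(a2, a3, t3, t4) // fade-out
    } else {
        a3 // after the fade-out ends
    }
}
```

For \fade(255,0,255,0,500,1500,2000) this gives a fully transparent start, a fully opaque hold between 500 ms and 1500 ms, and a fully transparent end.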
Outline too thin (FIXED)
- Used stroke.width = border_width * 0.6, a fudge factor calibrated when text was the wrong size.
- libass uses FT_Stroker which expands outward from the glyph outline by the full border width (one-sided). tiny_skia's stroke extends width/2 on each side.
- Fix: Stroke at width * 2.0, then render to temp pixmap and punch out the glyph interior with DestinationOut blend mode, leaving only the outward ring. Clean one-sided border.
Shadow too small (FIXED)
- Shadow was drawn as just the glyph fill paths offset by shadow distance. In libass, the shadow is the full bordered glyph shape (fill + outline) shifted.
- Fix: Shadow now strokes the glyph paths with the border width before filling, producing the full bordered shadow shape.
Shadow offset halved (FIXED)
- Shadow x/y offsets had an erroneous * 0.5 factor, making shadows appear half as far from the text as specified.
- Fix: Removed the * 0.5.
font-metrics diagnostic command (NEW)
- Added ass-tool font-metrics <dir> command that dumps UPM, hhea, OS/2 sTypo, usWin, USE_TYPO_METRICS flag, lineGap, and computed FreeType REAL_DIM denominator for all fonts in a directory.
Active Issues (as of 2026-04-11)
Rotation center Y offset for scaled text
- For heavily scaled text (\fscx232\fscy520), the rotation center Y is slightly wrong. The anchor-based rotation is much closer than the old bbox-center approach, but there's a remaining offset related to how the unscaled baseline interacts with the scaled alignment offset in the coordinate chain.
- Affected content: ioxho8.ass banner text ("The blood-smeared scene of the crime", "...of the culprit").
- Status: Close but not pixel-perfect. Needs investigation of the coordinate chain: apply_alignment_offset uses scaled dimensions, base_transform starts at baseline_y (unscaled offset from data.y), and the Scale effect is applied after rotation.
software_pipeline_new.rs dead code
- Unused file from the original vibecoded vendored version. Commented out of module tree. Should be deleted.
Film grain rendering
- Fansub technique: render random text in a custom "Grain Medium" font at \alpha&HB6&\blur2 to create animated noise overlay. Events change every 4cs (25fps grain animation).
- Status: Mostly working after font scaling fix. Grain fills the frame. Noise may be slightly too visible compared to reference.
Nisekoi subtitles not rendering
- Reported after the font scaling changes. May be related to usWin metric selection for Nisekoi's fonts, or a regression in the fade/alpha handling.
- Status: Not yet investigated.
Maintenance Debt (MEDIUM)
Hardcoded debug string "Чысценькая"
- Files: backends/software.rs (2 locations), pipeline/software_pipeline.rs (5 locations), renderer/event_selector.rs
- Debug conditionals like if data.text.contains("Чысценькая") scattered through production code.
- Plan: Remove all instances. Use the log crate (now added) for structured debug logging instead.
Dead code in SoftwarePipeline
- OwnedStyle has unused fields: name, angle, border_style, encoding (all #[allow(dead_code)])
- Unused methods at lines 370, 383, 1668 with #[allow(dead_code)]
- glyph_renderer field marked dead_code with comment "used in future rendering features"
- Plan: Remove dead fields and methods. If they're needed later they can be re-added from git history.
Glow effect plugin is empty
- plugin/effects/mod.rs: apply_cpu() does nothing (let _ = ...). shader_code() returns None.
- Plan: Remove or implement. Don't ship empty effect plugins.
Dirty region clipping doesn't clip
- backends/software.rs:1099: let _ = region; // TODO: Apply clipping
- Incremental rendering iterates dirty regions but renders the full scene for each.
- Plan: Implement actual clip mask per dirty region, or remove the incremental path and always full-render (which is what happens anyway).
Pixel sampling debug output
- backends/software.rs:1050-1072: Samples two pixels and prints alpha stats in debug builds.
- Plan: Remove. Use log crate + proper frame analysis tools instead.
Low Priority / Cosmetic
Empty test modules
- plugin/tags/formatting.rs:162: #[cfg(test)] mod tests {} is empty, so the formatting tag handlers have zero test coverage.
SIMD blur placeholder
- apply_gaussian_blur_simd just calls the scalar version. Comment says "real SIMD implementation would use intrinsics."
Simplified contrast calculation
- debug/analyzer.rs:286: Uses mathematically incorrect formula. Comment says "simplified."
Memory measurement placeholder
- debug/benchmarking.rs:347: measure_memory_usage() always returns None.
Multiple TODO fields in debug metrics
- debug/mod.rs:242-248: Hardcoded zeros for active_events, cache_hits, etc.
Custom format line support missing
- parser/ast/style.rs:145, parser/ast/event.rs:232: // TODO: Support custom format lines
Event selector backward compatibility function
- renderer/event_selector.rs:259: #[allow(dead_code)] // Kept for backward compatibility
Outline (Border) Implementation
Stroking approach (UPDATED 2026-04-11)
- libass: Uses FT_Stroker which expands outward from the glyph outline by the full border width (one-sided).
- Our renderer: Uses tiny_skia::PathStroker::stroke() at width * 2.0 (so the outward half equals the border width), then punches out the glyph interior via a temp pixmap + DestinationOut blend. This produces a clean one-sided outward-only border.
- Stroke order: The outline is stroked on the UNSCALED glyph path, then the Scale transform (fscx/fscy) is applied to the stroked result. This matches libass where the border is applied in pre-scale glyph space.
- Shadow: Shadow is drawn as the full bordered glyph shape (stroke + fill) offset by shadow distance, matching libass.
- Status: Visually matches libass for tested content. The DestinationOut approach is slightly more expensive than a true offset-curve stroker but produces clean results.
Blur Implementation Status
\blur (gaussian blur)
- Current: Treats \blur value as sigma directly. Should convert radius to sigma per libass: sigma = 2 * radius / sqrt(ln(256)).
- Algorithm: Separable gaussian with precomputed 1-D kernel at ±3σ. Correct.
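A minimal sketch of the conversion and kernel described above. The function names are hypothetical; the formula is the one quoted in the text, and the kernel is a plain normalised 1-D gaussian truncated at ±3σ, not the renderer's actual code:

```rust
/// Convert a \blur value to a gaussian sigma using the formula quoted above:
/// sigma = 2 * radius / sqrt(ln(256)), roughly 0.849 * radius.
pub fn blur_radius_to_sigma(radius: f64) -> f64 {
    2.0 * radius / 256.0_f64.ln().sqrt()
}

/// Hypothetical helper: normalised 1-D gaussian kernel covering ±3σ.
/// The same kernel is applied horizontally, then vertically (separable blur).
pub fn gaussian_kernel(sigma: f64) -> Vec<f64> {
    if sigma <= 0.0 {
        return vec![1.0]; // no blur: identity kernel
    }
    let half = (3.0 * sigma).ceil() as usize;
    let mut kernel: Vec<f64> = (0..=2 * half)
        .map(|i| {
            let x = i as f64 - half as f64;
            (-(x * x) / (2.0 * sigma * sigma)).exp()
        })
        .collect();
    let sum: f64 = kernel.iter().sum();
    for w in &mut kernel {
        *w /= sum;
    }
    kernel
}
```

Under this conversion \blur3.5 corresponds to a sigma of about 2.97 rather than 3.5, so treating the tag value as sigma directly blurs more than intended.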
\be (edge blur)
- Current: 3×3 weighted kernel [[1,2,1],[2,4,2],[1,2,1]], n passes. Correct per ASS spec.
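One pass of that kernel can be sketched on a single-channel buffer as follows. The function name is hypothetical, and treating pixels outside the image as zero is an assumption about edge handling:

```rust
/// Hypothetical sketch: one \be pass over a single-channel `w` x `h` image
/// using the 3x3 kernel [[1,2,1],[2,4,2],[1,2,1]] (weights sum to 16).
/// Pixels outside the image are treated as 0 (assumed edge handling).
pub fn be_pass(src: &[u8], w: usize, h: usize) -> Vec<u8> {
    const K: [[u32; 3]; 3] = [[1, 2, 1], [2, 4, 2], [1, 2, 1]];
    assert_eq!(src.len(), w * h);
    let mut out = vec![0u8; src.len()];
    for y in 0..h {
        for x in 0..w {
            let mut acc = 0u32;
            for ky in 0..3usize {
                for kx in 0..3usize {
                    let sy = y as isize + ky as isize - 1;
                    let sx = x as isize + kx as isize - 1;
                    if sy >= 0 && sx >= 0 && (sy as usize) < h && (sx as usize) < w {
                        acc += K[ky][kx] * src[sy as usize * w + sx as usize] as u32;
                    }
                }
            }
            out[y * w + x] = ((acc + 8) / 16) as u8; // divide by 16, rounded
        }
    }
    out
}
```

For \beN the pass is simply applied N times, feeding each output back in as the next input.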
Blur scope
- Current: Blur covers entire event (shadow + outline + fill) via temp pixmap. Correct.
Performance Analysis: libass vs our renderer (2026-04-10)
Benchmark
- Our renderer: 651ms per frame on "I ask of thee" scene (12 complex \p1 glyph-outline drawings with \blur3.5 + full-screen black rectangle + film grain text overlay + regular dialogue). Less than 2fps.
- libass: Same content at interactive framerates in pure software (no GPU).
Why libass is fast: six-level cache hierarchy
libass has six caches (ass_cache.c, created at renderer init):
- Font cache: keyed by (family, bold, italic, vertical). Avoids re-loading fonts.
- Outline cache: keyed by glyph ID+size (for text) or raw drawing text string (for \p1). The parsed ASS_Outline is cached. For our 12 \p1 drawings, the path is parsed ONCE EVER.
- Bitmap cache: keyed by (outline pointer, quantized transform matrix, quantized offset). The rasterised bitmap is cached. If scale/rotation haven't changed (or changed within quantization tolerance), the rasterised result is reused.
- Composite cache: keyed by (filter descriptor + array of BitmapRef pointers with positions). Stores post-blur, post-outline-subtraction, post-shadow results. This is the BIG one: the entire blur+composite pipeline result is cached. On frame 2+, if nothing changed, it's a single hash lookup.
- Glyph metrics cache: avoids repeated FreeType queries.
- Face size metrics cache: same.
Key insight: On frame 2+ for static content, EVERY cache level hits. Cost is essentially hash lookups. Even for animated content, only the levels that changed need recomputation.
Why libass's blur is fast: cascade blur, not naive gaussian
libass uses a cascade blur algorithm (ass_blur.c), not a naive separable gaussian:
- Downscale the bitmap by 2x repeatedly (using [1,5,10,10,5,1] kernel) to reduce problem size
- Apply a small filter (9-17 taps, radius 4-8) at reduced resolution
- Upscale back
This makes blur cost O(n) regardless of radius — a \blur10 is roughly the same cost as \blur1 because the large radius just means more downscale steps, not more kernel taps.
All blur arithmetic uses int16_t (not f32), with 16.16 fixed-point coefficients. SIMD implementations exist for SSE2/SSSE3/AVX2 (x86/blur.asm) and NEON (aarch64/blur.S).
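The downscale step can be illustrated in one dimension. This is only a sketch of the idea, not libass's code: the function name is hypothetical, the input is plain u8 rather than libass's internal int16 format, and clamping at the edges is an assumption:

```rust
/// Hypothetical sketch: halve a row using the [1,5,10,10,5,1] kernel
/// (weights sum to 32). Output sample i is centred between input samples
/// 2i and 2i+1 and reads inputs 2i-2 ..= 2i+3, clamped at the edges.
pub fn shrink_row(src: &[u8]) -> Vec<u8> {
    const K: [u32; 6] = [1, 5, 10, 10, 5, 1];
    let n = src.len();
    let out_len = (n + 1) / 2;
    (0..out_len)
        .map(|i| {
            let mut acc = 16u32; // rounding term for the division by 32
            for (t, &k) in K.iter().enumerate() {
                let idx = (2 * i + t) as isize - 2;
                let idx = idx.clamp(0, n as isize - 1) as usize;
                acc += k * src[idx] as u32;
            }
            (acc / 32) as u8
        })
        .collect()
}
```

Each application halves the amount of data, so a wide blur turns into a few cheap shrink steps followed by a small fixed-size filter on a much smaller buffer.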
Why libass's rasteriser is fast: custom tiled rasteriser
libass does NOT use FreeType's rasteriser. It has its own tiled recursive rasteriser (ass_rasterizer.c) with three fast paths per tile:
- No segments → solid fill/clear
- One segment → half-plane fill
- Multiple segments → generic trapezoid rasterisation
Coordinates use 26.6 fixed-point. Segment math uses int32_t/int64_t. All tile functions have SIMD implementations.
Why libass's memory is fast
- Bitmaps are single-channel uint8_t (not RGBA) — 4x less memory, 4x less blur work
- Stride aligned to SIMD width (16/32 bytes) for cache-line-friendly access
- ass_aligned_alloc for all bitmap allocations
What libass does NOT do
- libass does NOT composite onto video. It returns positioned alpha masks + colours as a linked list of ASS_Image. The video player composites. This means libass never allocates or fills a full-frame RGBA pixmap.
- libass does NOT use the GPU for any rendering.
Our pipeline waste (specific issues)
| Issue | Our approach | libass approach | Impact |
|---|---|---|---|
| Text shaping | Shape 2-3× per segment per frame (pipeline width calc + pipeline layout + backend render) | Shape once, cache metrics | HIGH — rustybuzz is expensive |
| Style parsing | Re-parse styles from script every frame (prepare_script) | Parse once at load time | MEDIUM |
| Drawing path rasterisation | Re-rasterise every frame (path is cached but bitmap is not) | Cache rasterised bitmap keyed by quantized transform | HIGH — biggest single win |
| Blur | Naive separable gaussian, f32 per-channel, O(w×h×kernel_radius×2) | Cascade blur, int16, O(n), SIMD | CRITICAL — dominates frame time |
| Blur pixmap allocation | Pixmap::new() per blur per frame (alloc + zero-fill) | Reuse buffers, single-channel uint8 | HIGH — alloc overhead + 4× pixel data |
| Blur temp buffer | vec![0u8; data.len()] per blur (alloc + zero) | Pre-allocated stripe buffers | MEDIUM |
| Clip mask allocation | Mask::new(width, height) per drawing per frame | N/A (libass clips during rasterisation) | MEDIUM |
| Composite cache | None — re-render everything every frame | Composite cache keyed by bitmap refs + filter params | HIGH — avoids all blur/render work for static events |
| Pixel format | RGBA (4 bytes/pixel) for everything including temp pixmaps | Single-channel uint8 for rendering, colour applied at composite | HIGH — 4× memory, 4× blur work |
| Frame change detection | None — always re-render | ass_detect_change compares bitmap pointers, returns 0/1/2 | HIGH for static subtitles |
| Full-frame pixmap | Allocate and fill full-frame RGBA pixmap every frame | Returns alpha masks only, caller composites | MEDIUM — could use OffscreenCanvas 2D directly |
Recommended optimisation plan (priority order)
1. Bitmap cache for drawings + text: Cache rasterised bitmaps keyed by (path_hash, quantized_scale, quantized_rotation). For the "I ask of thee" scene, 12 drawings would be rasterised ONCE and cached. Position/alpha changes are cheap post-raster operations. This alone should reduce frame time from 651ms to ~50ms for frame 2+.
2. Composite cache: Cache post-blur results keyed by (bitmap_refs, blur_params). When the same set of glyphs appears with the same blur, skip the entire blur pipeline.
3. Cascade blur: Replace naive gaussian with downscale→blur→upscale. Makes blur O(n) regardless of radius. Even without SIMD, this is a massive win for \blur3.5+.
4. Single-channel rendering: Render to uint8 alpha bitmaps, apply colour at composite time. 4× less data to blur, 4× less memory.
5. Eliminate redundant shaping: Shape text once per segment per frame, share result between pipeline layout and backend render. Currently shaped 2-3×.
6. Buffer reuse: Pre-allocate blur temp buffers and clip masks, reuse across frames instead of allocating per-frame.
7. Frame change detection: Compare event list + parameters with previous frame. If unchanged, return previous frame data.
8. SIMD blur: The apply_gaussian_blur_simd stub currently just calls scalar. Implement actual SIMD for WASM (wasm_simd128) and native (SSE2/NEON).
Items 1-3 would likely bring frame time under 16ms for most content. Items 4-8 are incremental improvements.
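The bitmap cache from the plan above could be keyed roughly as sketched here. All names are hypothetical, the 1/64 quantization step is an arbitrary assumption, and the cache stores plain byte vectors and has no eviction policy:

```rust
use std::collections::HashMap;

/// Hypothetical cache key: a hash of the path plus scale and rotation
/// quantized to 1/64 steps, so tiny float differences still hit the cache.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct BitmapKey {
    path_hash: u64,
    scale_x_q: i32,
    scale_y_q: i32,
    rotation_q: i32,
}

impl BitmapKey {
    pub fn new(path_hash: u64, scale_x: f32, scale_y: f32, rotation_deg: f32) -> Self {
        let q = |v: f32| (v * 64.0).round() as i32;
        Self { path_hash, scale_x_q: q(scale_x), scale_y_q: q(scale_y), rotation_q: q(rotation_deg) }
    }
}

/// Hypothetical cache of rasterised bitmaps.
#[derive(Default)]
pub struct BitmapCache {
    map: HashMap<BitmapKey, Vec<u8>>,
    pub misses: usize,
}

impl BitmapCache {
    /// Return the cached bitmap, running `rasterise` only on a miss.
    pub fn get_or_rasterise(&mut self, key: BitmapKey, rasterise: impl FnOnce() -> Vec<u8>) -> &[u8] {
        let misses = &mut self.misses;
        self.map
            .entry(key)
            .or_insert_with(|| {
                *misses += 1;
                rasterise()
            })
            .as_slice()
    }
}
```

Position and alpha are deliberately left out of the key, since the plan treats them as post-raster operations applied to the cached bitmap.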
Colour-independent rendering ("pseudo single-channel")
libass renders to single-channel uint8 alpha bitmaps because ASS subtitles use flat colours — a glyph's shape is independent of its colour. The alpha mask is the expensive part (rasterisation + blur), and colour is a cheap per-pixel multiply at composite time.
We can approximate this within tiny_skia's RGBA model by rendering everything as white with varying alpha (rgba(255, 255, 255, α)). This is the identity for premultiplied-alpha colour multiplication: at composite time, output = cached_pixel * (event_colour / 255) produces the correctly coloured result.
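A minimal sketch of the composite-time multiply, assuming premultiplied RGBA pixels as bytes. The function name is hypothetical and this does not use tiny_skia's types:

```rust
/// Hypothetical sketch: tint one premultiplied RGBA pixel of a cached
/// white-with-alpha bitmap with an event colour. For a white source,
/// r = g = b = a, so scaling each channel by colour/255 gives the correctly
/// premultiplied coloured pixel while alpha stays untouched.
pub fn tint_premultiplied(mask: [u8; 4], colour: [u8; 3]) -> [u8; 4] {
    // Rounded integer form of c * k / 255.
    let mul = |c: u8, k: u8| ((c as u16 * k as u16 + 127) / 255) as u8;
    [
        mul(mask[0], colour[0]),
        mul(mask[1], colour[1]),
        mul(mask[2], colour[2]),
        mask[3],
    ]
}
```

Because the cached pixel carries only coverage, the same bitmap can be tinted with any number of event colours.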
Why this matters for caching: without colour-independent rendering, the cache key must include the colour. With it, the cache key is just (shape_hash, scale, rotation, blur). This dramatically increases cache hit rate:
- Multi-layer text effects (common fansub technique: same text rendered 2-3x with different colours for fill, outline glow, shadow) → one cached bitmap, 2-3 colour multiplications
- Same \p1 drawing across 200 frames → one rasterisation + blur, 199 cache hits regardless of \t colour animations
- Film grain characters in "Grain Medium" font → glyph-level cache. Each of ~95 printable ASCII shapes cached once, reused across all randomised grain text. The current approach re-rasterises every character every frame (15 lines × ~30 chars × 25fps = 11,250 glyph rasterisations/second → would become ~95 total, cached forever).
Even before moving to actual single-channel buffers, rendering white-with-alpha and applying colour at composite reduces cache cardinality by the number of distinct colours in the script — often 10-50x fewer cache entries needed.