Note that this was written entirely by Claude. Don't read this if you don't want to laugh at one LLM critiquing the fuck out of another.

ass-renderer / ass-core Audit

Audit conducted 2026-04-09 during Phase 7 of the subtitle system implementation.

Actively Broken (HIGH)

GPU backends are non-functional stubs

  • WebGPU (backends/web/pipeline.rs): WebGpuPipeline returns empty results from all methods. create_pipeline() in web/mod.rs falls back to SoftwarePipeline. The composite_layers method uses pollster::block_on, which blocks the calling thread; in the browser, WebGPU futures only resolve via the JS event loop, so this cannot complete in WASM.
  • Metal (backends/hardware/metal.rs): render_layers() has stub comments ("In production, would create vertex buffer"). Falls back to SoftwarePipeline.
  • Vulkan (backends/hardware/vulkan.rs): Shader loading is stubbed. No actual Vulkan draw calls. Falls back to SoftwarePipeline.

All three silently fall back to software rendering while reporting BackendType::WebGPU/Metal/Vulkan.

Plan: WebGPU blit path (Phase 7, Task 4) will replace the WebGPU stubs with a software-render-then-GPU-blit approach. Metal and Vulkan will follow the same pattern when needed (see roadmap). The existing stub code that tries to do GPU-side text rendering should be removed.

Streaming parser is fake

  • parser/streaming.rs: StreamingResult::sections is Vec<String>, not parsed AST nodes. Comments say "simplified."
  • parser/streaming/processor.rs: process_section_content() returns empty DeltaBatch::new(). Comments say "In full parser this would..." and "In production..."

Plan: Not used by vodplace. Can be removed or properly implemented if incremental parsing becomes needed.

Fixed in This Session

Duplicate tag parsers (FIXED)

  • text_segmenter.rs had its own ~220-line match block duplicating tag_processor.rs, with a _ => {} catch-all silently dropping \blur, \be, \bord, \shad, \clip, \iclip, \q, \r, \org, \fax, \fay, and more.
  • Fix: Replaced with single tag_processor::apply_tag() call. 41 new tests added.

Duplicate colour parsers (FIXED)

  • SoftwarePipeline::parse_ass_color and ass_core::utils::parse_bgr_color both parsed ASS colours with different alpha handling.
  • Fix: parse_ass_color now delegates to parse_bgr_color.

Alpha inversion bug (FIXED)

  • ass_core::utils::parse_bgr_color treated ASS alpha 00 as transparent (RGBA 0) instead of opaque (RGBA 255). 6-char colours defaulted to alpha 0 (invisible).
  • Fix: Corrected alpha inversion. All tests updated.
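A minimal sketch of the corrected parsing, assuming the `&HAABBGGRR&` and `&HBBGGRR&` input forms and an RGBA result with normal alpha (255 = opaque). `parse_ass_colour` is a hypothetical stand-in; the real `parse_bgr_color` signature may differ.

```rust
/// Parse an ASS colour (`&HAABBGGRR&` or `&HBBGGRR&`) into RGBA.
/// ASS alpha is inverted (00 = opaque, FF = transparent), so it is
/// flipped here; a colour with no alpha digits is treated as opaque.
fn parse_ass_colour(s: &str) -> Option<[u8; 4]> {
    // Strip the optional "&H" prefix and trailing "&".
    let hex = s
        .trim()
        .trim_start_matches('&')
        .trim_start_matches(|c: char| c == 'H' || c == 'h')
        .trim_end_matches('&');
    let value = u32::from_str_radix(hex, 16).ok()?;
    let (ass_alpha, bgr) = match hex.len() {
        8 => ((value >> 24) as u8, value & 0x00FF_FFFF),
        1..=6 => (0x00, value), // 6-char form: opaque, not invisible
        _ => return None,
    };
    // Byte order is BGR: blue is the most significant of the three.
    let r = (bgr & 0xFF) as u8;
    let g = ((bgr >> 8) & 0xFF) as u8;
    let b = ((bgr >> 16) & 0xFF) as u8;
    Some([r, g, b, 255 - ass_alpha])
}
```

Note the byte order: `&H0000FF&` is pure red, not blue.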

Triple fontdb (FIXED)

  • Three separate fontdb::Database instances in RenderContext, SoftwarePipeline, SoftwareBackend. set_font_database() cloned into all three. composite_layers() created a throwaway backend per frame.
  • Fix: Single Arc<fontdb::Database> shared everywhere. Box<dyn RenderBackend> instead of Arc. Persistent backend with pixmap clear.

Font fallback hard failure (FIXED)

  • Renderer errored out with "Font not found and no fallback available" when exact font name wasn't in fontdb, even with fonts loaded.
  • Fix: Register generic families; as an absolute last resort, return any loaded face.

Blur rendering order (FIXED)

  • \blur only blurred the text fill, not shadow/outline. Entire event should be blurred.
  • Fix: When blur present, all rendering (shadow + outline + fill + underline + strikethrough) goes to temp pixmap, gets blurred, then composited.

Fixed 2026-04-10

Rotation used radians instead of degrees (FIXED)

  • backends/software.rs: pre_rotate(angle_rad) — tiny_skia's pre_rotate takes degrees, not radians. All \frz rotations were ~57x too small (e.g., 13.73° rendered as 0.24°).
  • Fix: Pass degrees directly to pre_rotate.

Drawing paths not scaled (FIXED)

  • \p1 drawing commands were only translated to position, never scaled from PlayRes to canvas coordinates. No fscx/fscy or draw_level scaling applied.
  • Fix: Apply draw_scale * scale_x/y * font_scale_x/y transform to path geometry.
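The combined factor can be sketched as below. This assumes the standard ASS rule that `\pN` coordinates are in units of 2^(N-1) (so `\p1` is 1:1 and `\p2` halves them) and treats `\fscx`/`\fscy` as percentages. `drawing_scale` and `scale_points` are hypothetical names, not the crate's API.

```rust
/// Per-axis scale for a \pN drawing: draw_scale * (canvas / PlayRes) * fscx/fscy.
fn drawing_scale(
    draw_level: u32,      // N in \pN
    play_res: (f32, f32), // PlayResX, PlayResY
    canvas: (f32, f32),   // output size in pixels
    fscx: f32,            // \fscx in percent (100 = unscaled)
    fscy: f32,            // \fscy in percent
) -> (f32, f32) {
    // \pN coordinates are divided by 2^(N-1).
    let draw_scale = 0.5f32.powi(draw_level.max(1) as i32 - 1);
    // Script units (PlayRes) to canvas pixels.
    let scale_x = canvas.0 / play_res.0;
    let scale_y = canvas.1 / play_res.1;
    (
        draw_scale * scale_x * fscx / 100.0,
        draw_scale * scale_y * fscy / 100.0,
    )
}

/// Scale path points about the drawing origin; translation to \pos comes after.
fn scale_points(points: &[(f32, f32)], s: (f32, f32)) -> Vec<(f32, f32)> {
    points.iter().map(|&(x, y)| (x * s.0, y * s.1)).collect()
}
```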

Drawing clip regions not applied (FIXED)

  • \clip tags on \p1 events were parsed but never passed through VectorData to the backend. All drawings rendered unclipped — combined heart+bone paths showed both shapes instead of only the clipped portion.
  • Fix: Added clip field to VectorData, create tiny_skia::Mask in backend.

Letter spacing not scaled or included in width (FIXED)

  • \fsp values were applied as raw pixels without PlayRes→canvas scaling, AND not included in total_line_width / seg_advance calculations. Text with \fsp was too wide (unscaled spacing) and misaligned (width not accounting for spacing).
  • Fix: Scale spacing by scale_x, include in width calculations.

Per-channel alpha not animatable (FIXED)

  • \1a, \2a, \3a, \4a were not recognised as \t() animation targets. \t(455,500,\1a&H00&) silently dropped the \1a target, leaving primary alpha permanently at its initial value.
  • Fix: Added PrimaryAlpha, SecondaryAlpha, OutlineAlpha, ShadowAlpha variants to AnimatableTag.
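For reference, a sketch of how such a target is evaluated once recognised. It assumes the usual `\t(t1,t2,accel,tags)` semantics (times in ms from event start, linear progress raised to `accel`); `t_progress` and `lerp_alpha` are hypothetical names.

```rust
/// Progress of a \t(t1,t2,accel,...) animation at time `t` (ms from event
/// start): p = ((t - t1) / (t2 - t1))^accel, clamped to [0, 1].
fn t_progress(t: f64, t1: f64, t2: f64, accel: f64) -> f64 {
    if t <= t1 {
        return 0.0;
    }
    if t >= t2 {
        return 1.0;
    }
    ((t - t1) / (t2 - t1)).powf(accel)
}

/// Interpolate an ASS alpha byte (00 = opaque, FF = transparent).
fn lerp_alpha(from: u8, to: u8, p: f64) -> u8 {
    (from as f64 + (to as f64 - from as f64) * p).round() as u8
}
```

With `\t(455,500,\1a&H00&)` and a starting `\1a` of `&HFF&`, the primary alpha is FF until 455 ms, about 80 (hex) halfway through, and 00 from 500 ms on.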

FontDb wrapper (FIXED)

  • fontdb::Database rejects some fonts outright (e.g., Symbol-encoded fonts such as SSF4 ABUKET), and there was no fallback mechanism for them.
  • Fix: New FontDb wrapper with RwLock<Database> + RwLock<AHashMap<String, Arc<Vec<u8>>>> fallback font map. Shared via Arc<FontDb>. Fallback shaping path bypasses fontdb for rejected fonts.
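A hypothetical sketch of the fallback half of that wrapper: raw font bytes for faces fontdb refused, behind an `RwLock` and shared through `Arc`. It uses std `HashMap` in place of `AHashMap`, omits the `RwLock<Database>` half, and the case-insensitive family lookup is an assumption.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Fallback map of font bytes keyed by lower-cased family name.
#[derive(Default)]
struct FallbackFonts {
    by_family: RwLock<HashMap<String, Arc<Vec<u8>>>>,
}

impl FallbackFonts {
    /// Register bytes for a font that fontdb refused to load.
    fn insert(&self, family: &str, data: Vec<u8>) {
        let mut map = self.by_family.write().expect("lock poisoned");
        map.insert(family.to_lowercase(), Arc::new(data));
    }

    /// Look up a family; cloning the Arc shares the bytes without copying.
    fn get(&self, family: &str) -> Option<Arc<Vec<u8>>> {
        let map = self.by_family.read().expect("lock poisoned");
        map.get(&family.to_lowercase()).cloned()
    }
}
```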

MKV subtitle extraction missing timestamps (FIXED)

  • ass-tool extract-mkv prepended Dialogue: to raw ffmpeg packet data, which doesn't include Start/End timestamps (those are in PTS/duration metadata). All extracted ASS files had broken timing — no subtitles displayed.
  • Fix: Reconstruct H:MM:SS.CC timestamps from packet.pts() and packet.duration() using stream time_base.
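A sketch of the timestamp reconstruction, assuming pts and duration have already been read from the packet (in the stream's time_base units) and the time_base is passed as a numerator/denominator pair. Function names are hypothetical, and truncating to whole centiseconds is this sketch's choice, not necessarily the tool's.

```rust
/// Format a timestamp given in stream time_base units as ASS H:MM:SS.CC.
/// time_base = tb_num / tb_den seconds per unit (Matroska streams commonly
/// use 1/1000). Truncates to whole centiseconds.
fn ts_to_ass(pts: i64, tb_num: i64, tb_den: i64) -> String {
    // Integer maths avoids float drift on long files.
    let total_cs = (pts.max(0) as i128 * tb_num as i128 * 100 / tb_den as i128) as i64;
    let h = total_cs / 360_000;
    let m = total_cs / 6_000 % 60;
    let s = total_cs / 100 % 60;
    let cs = total_cs % 100;
    format!("{}:{:02}:{:02}.{:02}", h, m, s, cs)
}

/// Start = pts, End = pts + duration, both in time_base units.
fn packet_times(pts: i64, duration: i64, tb_num: i64, tb_den: i64) -> (String, String) {
    (
        ts_to_ass(pts, tb_num, tb_den),
        ts_to_ass(pts + duration, tb_num, tb_den),
    )
}
```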

Font weight suffix lookup (FIXED)

  • "Dosis Light" was not recognised as "Dosis" + Light weight. External font resolver redundantly tried to fetch fonts already loaded under their base family name.
  • Fix: Shared strip_weight_suffix() in ass_renderer::utils::font. Worker's is_family_loaded() and renderer's try_split_family_weight() both use it.
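A sketch of the suffix split with a hypothetical signature. The suffix table is illustrative and uses OpenType usWeightClass values (Light = 300, Bold = 700); the crate's actual list and return type may differ.

```rust
/// Split "Dosis Light" into ("Dosis", 300). Returns None when the last
/// word is not a known weight suffix.
fn strip_weight_suffix(name: &str) -> Option<(&str, u16)> {
    const SUFFIXES: &[(&str, u16)] = &[
        ("Thin", 100),
        ("ExtraLight", 200),
        ("Light", 300),
        ("Medium", 500),
        ("SemiBold", 600),
        ("Bold", 700),
        ("ExtraBold", 800),
        ("Black", 900),
    ];
    let (base, last) = name.trim().rsplit_once(' ')?;
    SUFFIXES
        .iter()
        .find(|(s, _)| s.eq_ignore_ascii_case(last))
        .map(|&(_, w)| (base.trim_end(), w))
}
```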

Performance (HIGH)

Software rendering is too slow for complex fansub typesetting

  • Measured: 651ms per frame on an "I ask of thee" scene with 12 complex \p1 glyph-outline drawings + blur + a full-screen black rectangle + film grain text overlay. Less than 2fps.
  • Root cause: Gaussian blur on large complex paths in software is O(width × height × kernel). Multiple blurred drawings per frame compounds the cost.
  • Comparison: WASM libass should handle this at interactive framerates — the performance gap suggests our rendering pipeline is doing significantly more work per frame than necessary, or doing it less efficiently.
  • Mitigation strategies (in order of expected impact):
    1. Pre-render / lookahead cache: We have the full ASS file upfront plus a playback buffer of seconds to minutes. Pre-render upcoming frames on a background thread / during idle time. Cache rasterised drawing paths keyed by (path_hash, scale, blur_params).
    2. Drawing path cache: The same \p1 path at the same scale/blur appears across many frames. Cache the blurred rasterised result instead of re-rasterising every frame. Position, alpha, and clip changes don't require invalidation (they are cheap post-raster operations); invalidate only when the path, scale, rotation, or blur parameters change.
    3. Blur optimisation: Current gaussian blur is naive scalar. Consider: box blur approximation (3-pass box blur ≈ gaussian), downscale-blur-upscale, or SIMD (the apply_gaussian_blur_simd function currently just calls scalar).
    4. WebGPU blit path (Task #25): Move blur and compositing to GPU. Software rasterise glyph outlines, upload texture, GPU does blur + blend. Major architecture change but eliminates the CPU bottleneck entirely.
    5. Render resolution scaling: Render subtitle overlay at reduced resolution (e.g., 50%) and upscale. Quality tradeoff but linear speedup.
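Strategy 3's box-blur approximation can be sketched on one row of a single-channel mask; a 2-D blur applies it to every row and then every column. Each pass uses a running sum, so its cost does not grow with the radius. Choosing `r` to match a given sigma is left out; function names are hypothetical and this is not the crate's blur.

```rust
/// One box-blur pass over a single-channel row, radius `r` (window 2r+1).
/// A running sum makes the cost O(len) whatever the radius. Pixels outside
/// the row count as 0 (transparent), which suits padded alpha masks.
fn box_blur_row(src: &[u8], dst: &mut [u8], r: usize) {
    let n = src.len();
    let window = (2 * r + 1) as u32;
    // Sum of the window centred on pixel 0: src[0..=r], clipped to the row.
    let mut sum: u32 = src[..n.min(r + 1)].iter().map(|&v| v as u32).sum();
    for i in 0..n {
        dst[i] = ((sum + window / 2) / window) as u8; // rounded mean
        // Slide right: add the pixel entering, drop the one leaving.
        if i + r + 1 < n {
            sum += src[i + r + 1] as u32;
        }
        if i >= r {
            sum -= src[i - r] as u32;
        }
    }
}

/// Three box passes approximate a gaussian; apply per row, then per column.
fn box_blur_3x(mut row: Vec<u8>, r: usize) -> Vec<u8> {
    let mut tmp = vec![0u8; row.len()];
    for _ in 0..3 {
        box_blur_row(&row, &mut tmp, r);
        std::mem::swap(&mut row, &mut tmp);
    }
    row
}
```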

Fixed 2026-04-11

Font scaling denominator wrong (FIXED)

  • Our renderer: Used font_size / units_per_em to convert font units to pixels.
  • libass: Overrides FreeType's face->ascender/face->descender with OS/2 usWinAscent/usWinDescent (GDI compatibility, see set_font_metrics() in ass_font.c:278-311), then FT_SIZE_REQUEST_TYPE_REAL_DIM divides by ascender - descender. For IwataMinchoProM (UPM=1000, usWin total=1310), our UPM-based scale made glyphs about 31% too large (1310/1000 = 1.31).
  • Fix: FontMetrics::from_face now mirrors libass's metric selection: usWin first, then hhea (or sTypo if USE_TYPO_METRICS), then sTypo fallback, then bbox. Both shaping and outline rendering use font_size / (ascender - descender) as the scale denominator.
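The selection order above, sketched with a hypothetical `RawMetrics` struct (field names are assumptions, values in font units). This sketch uses the usWin pair whenever its sum is non-zero. The test values are made-up numbers that merely add up to the IwataMinchoProM usWin total of 1310.

```rust
/// Hypothetical subset of a font's vertical metrics, in font units.
struct RawMetrics {
    win: (u16, u16),          // OS/2 usWinAscent, usWinDescent
    hhea: (i16, i16),         // hhea ascender, descender (descender < 0)
    typo: Option<(i16, i16)>, // OS/2 sTypoAscender, sTypoDescender
    use_typo_metrics: bool,   // OS/2 fsSelection USE_TYPO_METRICS
    bbox: (i16, i16),         // head yMax, yMin
}

/// Returns (ascender, descender) as positive distances from the baseline:
/// usWin, then sTypo if USE_TYPO_METRICS, then hhea, then sTypo, then bbox.
fn select_metrics(m: &RawMetrics) -> (f32, f32) {
    if m.win.0 as u32 + m.win.1 as u32 != 0 {
        return (m.win.0 as f32, m.win.1 as f32);
    }
    if let (true, Some((a, d))) = (m.use_typo_metrics, m.typo) {
        return (a as f32, -(d as f32));
    }
    if m.hhea.0 != 0 || m.hhea.1 != 0 {
        return (m.hhea.0 as f32, -(m.hhea.1 as f32));
    }
    if let Some((a, d)) = m.typo {
        return (a as f32, -(d as f32));
    }
    (m.bbox.0 as f32, -(m.bbox.1 as f32))
}

/// Pixels per font unit: font_size / (ascender - descender).
fn units_to_px(font_size: f32, m: &RawMetrics) -> f32 {
    let (asc, desc) = select_metrics(m);
    font_size / (asc + desc)
}
```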

\move timing double-conversion (FIXED)

  • parse_move_args converted milliseconds to centiseconds, then calculate_position_from_tags divided by 10 again. Movement completed 10× too fast — a 7.7-second animation finished in 0.77 seconds.
  • Fix: Removed the redundant /10 in the text path.
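A sketch of `\move` evaluation that keeps every time in milliseconds end to end, so no unit conversion (and no stray `/10`) is needed. It assumes that when t1/t2 are omitted the move spans the whole event; `move_position` is a hypothetical name.

```rust
/// Position for \move(x1,y1,x2,y2[,t1,t2]) at `t_ms` after event start.
/// Without t1/t2 the move spans the whole event (0..duration_ms).
fn move_position(
    from: (f32, f32),
    to: (f32, f32),
    t1_t2: Option<(u32, u32)>,
    duration_ms: u32,
    t_ms: u32,
) -> (f32, f32) {
    let (t1, t2) = t1_t2.unwrap_or((0, duration_ms));
    // Hold at the start before t1 and at the end after t2.
    let k = if t_ms <= t1 {
        0.0
    } else if t_ms >= t2 {
        1.0
    } else {
        (t_ms - t1) as f32 / (t2 - t1) as f32
    };
    (from.0 + (to.0 - from.0) * k, from.1 + (to.1 - from.1) * k)
}
```

For the 7.7-second example, the position 770 ms in is 10% of the way along the path, not already at the end.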

Text rotation center wrong (FIXED)

  • Rotated around (shaped.width/2, shaped.height/2) (text bounding box center). libass rotates around the \pos/\move anchor point.
  • Fix: Pass anchor point through TextData.anchor. Backend computes rotation center as (anchor_x - data.x, anchor_y - baseline_y) in glyph-local coords.
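A sketch of rotating about an arbitrary centre (translate to the centre, rotate, translate back) plus the glyph-local centre formula from the fix. Angles are taken in degrees, matching tiny_skia's convention noted earlier. Both functions are hypothetical helpers; the sign convention used for `\frz` is left to the caller.

```rust
/// Rotation centre in glyph-local coordinates, as described in the fix:
/// (anchor_x - data.x, anchor_y - baseline_y).
fn local_center(anchor: (f32, f32), data_x: f32, baseline_y: f32) -> (f32, f32) {
    (anchor.0 - data_x, anchor.1 - baseline_y)
}

/// Rotate `p` by `deg` degrees about `center`:
/// translate(center) * rotate(deg) * translate(-center).
/// With y pointing down, a positive angle turns clockwise on screen.
fn rotate_about(p: (f32, f32), center: (f32, f32), deg: f32) -> (f32, f32) {
    let (s, c) = deg.to_radians().sin_cos();
    let (dx, dy) = (p.0 - center.0, p.1 - center.1);
    (center.0 + dx * c - dy * s, center.1 + dx * s + dy * c)
}
```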

\frz on \p1 drawings not applied (FIXED)

  • Drawing/vector path never added Rotation to the effects list, and draw_vector_layer didn't process rotation effects. \frz on drawings was silently ignored.
  • Fix: Added rotation to VectorData effects in the pipeline. Backend applies rotation around bounding box center before drawing.

Complex \fade broken (FIXED)

  • \fade(a1,a2,a3,t1,t2,t3,t4) (7-parameter form) was implemented as a linear interpolation from alpha_start to alpha_end using a single progress value, ignoring alpha_middle, time_fade_in, and time_fade_out. Text with complex fades was always invisible.
  • Fix: Proper 3-phase interpolation: fade-in (a1→a2 over t1→t2), hold (a2 over t2→t3), fade-out (a2→a3 over t3→t4).
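The three phases can be sketched as a single function. Alphas use the ASS convention (0 = opaque, 255 = transparent) and times are ms from event start; `complex_fade` is a hypothetical name.

```rust
/// Alpha for \fade(a1,a2,a3,t1,t2,t3,t4) at `now` ms after event start.
fn complex_fade(a: (u8, u8, u8), t: (i32, i32, i32, i32), now: i32) -> u8 {
    let (a1, a2, a3) = (a.0 as f32, a.1 as f32, a.2 as f32);
    let (t1, t2, t3, t4) = t;
    // Linear interpolation from x to y over [start, end].
    let lerp = |x: f32, y: f32, start: i32, end: i32| -> f32 {
        if end <= start {
            return y;
        }
        let k = (now - start) as f32 / (end - start) as f32;
        x + (y - x) * k
    };
    let v = if now < t1 {
        a1 // before the fade-in starts
    } else if now < t2 {
        lerp(a1, a2, t1, t2) // fade-in
    } else if now < t3 {
        a2 // hold
    } else if now < t4 {
        lerp(a2, a3, t3, t4) // fade-out
    } else {
        a3
    };
    v.round() as u8
}
```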

Outline too thin (FIXED)

  • Used stroke.width = border_width * 0.6 — a fudge factor calibrated when text was the wrong size.
  • libass uses FT_Stroker which expands outward from the glyph outline by the full border width (one-sided). tiny_skia's stroke extends width/2 on each side.
  • Fix: Stroke at width * 2.0, then render to temp pixmap and punch out the glyph interior with DestinationOut blend mode, leaving only the outward ring. Clean one-sided border.

Shadow too small (FIXED)

  • Shadow was drawn as just the glyph fill paths offset by shadow distance. In libass, the shadow is the full bordered glyph shape (fill + outline) shifted.
  • Fix: Shadow now strokes the glyph paths with the border width before filling, producing the full bordered shadow shape.

Shadow offset halved (FIXED)

  • Shadow x/y offsets had an erroneous * 0.5 factor, making shadows appear half as far from the text as specified.
  • Fix: Removed the * 0.5.

font-metrics diagnostic command (NEW)

  • Added ass-tool font-metrics <dir> command that dumps UPM, hhea, OS/2 sTypo, usWin, USE_TYPO_METRICS flag, lineGap, and computed FreeType REAL_DIM denominator for all fonts in a directory.

Active Issues (as of 2026-04-11)

Rotation center Y offset for scaled text

  • For heavily scaled text (\fscx232\fscy520), the rotation center Y is slightly wrong. The anchor-based rotation is much closer than the old bbox-center approach, but there's a remaining offset related to how the unscaled baseline interacts with the scaled alignment offset in the coordinate chain.
  • Affected content: ioxho8.ass banner text ("The blood-smeared scene of the crime", "...of the culprit").
  • Status: Close but not pixel-perfect. Needs investigation of the coordinate chain: apply_alignment_offset uses scaled dimensions, base_transform starts at baseline_y (unscaled offset from data.y), and the Scale effect is applied after rotation.

software_pipeline_new.rs dead code

  • Unused file from the original vibecoded vendored version. Commented out of module tree. Should be deleted.

Film grain rendering

  • Fansub technique: render random text in a custom "Grain Medium" font at \alpha&HB6&\blur2 to create animated noise overlay. Events change every 4cs (25fps grain animation).
  • Status: Mostly working after font scaling fix. Grain fills the frame. Noise may be slightly too visible compared to reference.

Nisekoi subtitles not rendering

  • Reported after the font scaling changes. May be related to usWin metric selection for Nisekoi's fonts, or a regression in the fade/alpha handling.
  • Status: Not yet investigated.

Maintenance Debt (MEDIUM)

Hardcoded debug string "Чысценькая"

  • Files: backends/software.rs (2 locations), pipeline/software_pipeline.rs (5 locations), renderer/event_selector.rs
  • Debug conditionals like if data.text.contains("Чысценькая") scattered through production code.
  • Plan: Remove all instances. Use log crate (now added) for structured debug logging instead.

Dead code in SoftwarePipeline

  • OwnedStyle has unused fields: name, angle, border_style, encoding (all #[allow(dead_code)])
  • Unused methods at lines 370, 383, 1668 with #[allow(dead_code)]
  • glyph_renderer field marked dead_code with comment "used in future rendering features"
  • Plan: Remove dead fields and methods. If they're needed later they can be re-added from git history.

Glow effect plugin is empty

  • plugin/effects/mod.rs: apply_cpu() does nothing (let _ = ...). shader_code() returns None.
  • Plan: Remove or implement. Don't ship empty effect plugins.

Dirty region clipping doesn't clip

  • backends/software.rs:1099: let _ = region; // TODO: Apply clipping
  • Incremental rendering iterates dirty regions but renders the full scene for each.
  • Plan: Implement actual clip mask per dirty region, or remove the incremental path and always full-render (which is what happens anyway).

Pixel sampling debug output

  • backends/software.rs:1050-1072: Samples two pixels and prints alpha stats in debug builds.
  • Plan: Remove. Use log crate + proper frame analysis tools instead.

Low Priority / Cosmetic

Empty test modules

  • plugin/tags/formatting.rs:162: #[cfg(test)] mod tests {} — zero test coverage for formatting tag handlers.

SIMD blur placeholder

  • apply_gaussian_blur_simd just calls the scalar version. Comment says "real SIMD implementation would use intrinsics."

Simplified contrast calculation

  • debug/analyzer.rs:286: Uses mathematically incorrect formula. Comment says "simplified."

Memory measurement placeholder

  • debug/benchmarking.rs:347: measure_memory_usage() always returns None.

Multiple TODO fields in debug metrics

  • debug/mod.rs:242-248: Hardcoded zeros for active_events, cache_hits, etc.

Custom format line support missing

  • parser/ast/style.rs:145, parser/ast/event.rs:232: // TODO: Support custom format lines

Event selector backward compatibility function

  • renderer/event_selector.rs:259: #[allow(dead_code)] // Kept for backward compatibility

Outline (Border) Implementation

Stroking approach (UPDATED 2026-04-11)

  • libass: Uses FT_Stroker which expands outward from the glyph outline by the full border width (one-sided).
  • Our renderer: Uses tiny_skia::PathStroker::stroke() at width * 2.0 (so the outward half equals the border width), then punches out the glyph interior via a temp pixmap + DestinationOut blend. This produces a clean one-sided outward-only border.
  • Stroke order: The outline is stroked on the UNSCALED glyph path, then the Scale transform (fscx/fscy) is applied to the stroked result. This matches libass where the border is applied in pre-scale glyph space.
  • Shadow: Shadow is drawn as the full bordered glyph shape (stroke + fill) offset by shadow distance, matching libass.
  • Status: Visually matches libass for tested content. The DestinationOut approach is slightly more expensive than a true offset-curve stroker but produces clean results.
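At the pixel level the punch-out is Porter-Duff destination-out: result = dst × (1 − src_alpha). A sketch on single-channel coverage masks (an assumption; the backend works on RGBA pixmaps, where the same factor applies to every premultiplied channel). `punch_out` is a hypothetical name.

```rust
/// `stroke` holds the glyph stroked at 2 × border (border on each side);
/// `fill` holds the glyph interior. Removing the interior coverage leaves
/// only the outward ring: dst * (255 - src) / 255, rounded.
fn punch_out(stroke: &mut [u8], fill: &[u8]) {
    for (d, &s) in stroke.iter_mut().zip(fill) {
        *d = ((*d as u32 * (255 - s as u32) + 127) / 255) as u8;
    }
}
```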

Blur Implementation Status

\blur (gaussian blur)

  • Current: Treats \blur value as sigma directly. Should convert radius to sigma per libass: sigma = 2 * radius / sqrt(ln(256)).
  • Algorithm: Separable gaussian with precomputed 1-D kernel at ±3σ. Correct.
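A sketch of the radius-to-sigma conversion and the ±3σ kernel described above. Function names are hypothetical. A zero sigma returns the identity kernel so callers can skip the pass; since sqrt(ln 256) ≈ 2.3548, sigma comes out at roughly 0.849 × radius.

```rust
/// \blur radius -> gaussian sigma: sigma = 2 * radius / sqrt(ln 256).
fn blur_sigma(radius: f64) -> f64 {
    2.0 * radius / 256f64.ln().sqrt()
}

/// Normalised 1-D gaussian kernel truncated at ±3σ, for one separable pass.
fn gaussian_kernel(sigma: f64) -> Vec<f64> {
    if sigma <= 0.0 {
        return vec![1.0];
    }
    let r = (3.0 * sigma).ceil() as i64;
    let mut k: Vec<f64> = (-r..=r)
        .map(|x| (-((x * x) as f64) / (2.0 * sigma * sigma)).exp())
        .collect();
    let total: f64 = k.iter().sum();
    for v in &mut k {
        *v /= total; // weights sum to 1
    }
    k
}
```

For `\blur3.5` this gives a 19-tap kernel, against 23 taps when 3.5 is used directly as sigma.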

\be (edge blur)

  • Current: 3×3 weighted kernel [[1,2,1],[2,4,2],[1,2,1]], n passes. Correct per ASS spec.
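One `\be` pass as described, sketched on a single-channel mask: the 3×3 kernel divided by 16, repeated n times for `\beN`. Treating pixels outside the mask as 0 is an assumption about edge handling. Because the kernel is the outer product of [1,2,1], it could also run as two 1-D passes; it is kept 2-D here for clarity.

```rust
/// One \be pass on a w×h mask; pixels outside the mask count as 0.
fn be_pass(src: &[u8], w: usize, h: usize) -> Vec<u8> {
    const K: [[u32; 3]; 3] = [[1, 2, 1], [2, 4, 2], [1, 2, 1]];
    let mut out = vec![0u8; w * h];
    for y in 0..h {
        for x in 0..w {
            let mut sum = 0u32;
            for (ky, row) in K.iter().enumerate() {
                for (kx, &k) in row.iter().enumerate() {
                    // Neighbour (x + kx - 1, y + ky - 1); wraps to a huge
                    // value at the left/top edge, which the bounds check skips.
                    let nx = (x + kx).wrapping_sub(1);
                    let ny = (y + ky).wrapping_sub(1);
                    if nx < w && ny < h {
                        sum += k * src[ny * w + nx] as u32;
                    }
                }
            }
            out[y * w + x] = ((sum + 8) / 16) as u8; // rounded, kernel sums to 16
        }
    }
    out
}

/// \beN applies N passes.
fn apply_be(mut mask: Vec<u8>, w: usize, h: usize, n: u32) -> Vec<u8> {
    for _ in 0..n {
        mask = be_pass(&mask, w, h);
    }
    mask
}
```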

Blur scope

  • Current: Blur covers entire event (shadow + outline + fill) via temp pixmap. Correct.

Performance Analysis: libass vs our renderer (2026-04-10)

Benchmark

  • Our renderer: 651ms per frame on "I ask of thee" scene (12 complex \p1 glyph-outline drawings with \blur3.5 + full-screen black rectangle + film grain text overlay + regular dialogue). Less than 2fps.
  • libass: Same content at interactive framerates in pure software (no GPU).

Why libass is fast: six-level cache hierarchy

libass has six caches (ass_cache.c, created at renderer init):

  1. Font cache — keyed by (family, bold, italic, vertical). Avoids re-loading fonts.
  2. Outline cache — keyed by glyph ID+size (for text) or raw drawing text string (for \p1). The parsed ASS_Outline is cached. For our 12 \p1 drawings, the path is parsed ONCE EVER.
  3. Bitmap cache — keyed by (outline pointer, quantized transform matrix, quantized offset). The rasterised bitmap is cached. If scale/rotation haven't changed (or changed within quantization tolerance), the rasterised result is reused.
  4. Composite cache — keyed by (filter descriptor + array of BitmapRef pointers with positions). Stores post-blur, post-outline-subtraction, post-shadow results. This is the BIG one: the entire blur+composite pipeline result is cached. On frame 2+, if nothing changed, it's a single hash lookup.
  5. Glyph metrics cache — avoids repeated FreeType queries.
  6. Face size metrics cache — same.

Key insight: On frame 2+ for static content, EVERY cache level hits. Cost is essentially hash lookups. Even for animated content, only the levels that changed need recomputation.

Why libass's blur is fast: cascade blur, not naive gaussian

libass uses a cascade blur algorithm (ass_blur.c), not a naive separable gaussian:

  1. Downscale the bitmap by 2x repeatedly (using [1,5,10,10,5,1] kernel) to reduce problem size
  2. Apply a small filter (9-17 taps, radius 4-8) at reduced resolution
  3. Upscale back

This makes blur cost O(n) regardless of radius — a \blur10 is roughly the same cost as \blur1 because the large radius just means more downscale steps, not more kernel taps.
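One shrink step from that cascade, sketched on a single row of 16-bit values with the [1,5,10,10,5,1]/32 kernel. Each step halves the pixel count, so after k steps a fixed 9-17 tap filter covers 2^k times as many original pixels. This only illustrates the idea: libass's actual shrink functions work on padded, SIMD-sized int16 stripes, and the reduced-resolution filter and expand steps are omitted. `shrink_row` is a hypothetical name.

```rust
/// 2× downscale of one row. Output pixel i is centred between input
/// pixels 2i and 2i+1; pixels outside the row count as 0.
fn shrink_row(src: &[u16]) -> Vec<u16> {
    const K: [u32; 6] = [1, 5, 10, 10, 5, 1]; // sums to 32
    // Read with zero padding outside the row.
    let at = |i: isize| -> u32 {
        if i < 0 || i as usize >= src.len() {
            0
        } else {
            src[i as usize] as u32
        }
    };
    (0..(src.len() + 1) / 2)
        .map(|i| {
            // Taps 2i-2 ..= 2i+3.
            let base = 2 * i as isize - 2;
            let sum: u32 = K
                .iter()
                .enumerate()
                .map(|(j, &k)| k * at(base + j as isize))
                .sum();
            ((sum + 16) / 32) as u16 // rounded, normalised by 32
        })
        .collect()
}
```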

All blur arithmetic uses int16_t (not f32), with 16.16 fixed-point coefficients. SIMD implementations exist for SSE2/SSSE3/AVX2 (x86/blur.asm) and NEON (aarch64/blur.S).

Why libass's rasteriser is fast: custom tiled rasteriser

libass does NOT use FreeType's rasteriser. It has its own tiled recursive rasteriser (ass_rasterizer.c) with three fast paths per tile:

  • No segments → solid fill/clear
  • One segment → half-plane fill
  • Multiple segments → generic trapezoid rasterisation

Coordinates use 26.6 fixed-point. Segment math uses int32_t/int64_t. All tile functions have SIMD implementations.

Why libass's memory is fast

  • Bitmaps are single-channel uint8_t (not RGBA) — 4x less memory, 4x less blur work
  • Stride aligned to SIMD width (16/32 bytes) for cache-line-friendly access
  • ass_aligned_alloc for all bitmap allocations

What libass does NOT do

  • libass does NOT composite onto video. It returns positioned alpha masks + colours as a linked list of ASS_Image. The video player composites. This means libass never allocates or fills a full-frame RGBA pixmap.
  • libass does NOT use the GPU for any rendering.

Our pipeline waste (specific issues)

| Issue | Our approach | libass approach | Impact |
| --- | --- | --- | --- |
| Text shaping | Shape 2-3× per segment per frame (pipeline width calc + pipeline layout + backend render) | Shape once, cache metrics | HIGH: rustybuzz is expensive |
| Style parsing | Re-parse styles from script every frame (prepare_script) | Parse once at load time | MEDIUM |
| Drawing path rasterisation | Re-rasterise every frame (path is cached but bitmap is not) | Cache rasterised bitmap keyed by quantized transform | HIGH: biggest single win |
| Blur | Naive separable gaussian, f32 per-channel, O(w×h×kernel_radius×2) | Cascade blur, int16, O(n), SIMD | CRITICAL: dominates frame time |
| Blur pixmap allocation | Pixmap::new() per blur per frame (alloc + zero-fill) | Reuse buffers, single-channel uint8 | HIGH: alloc overhead + 4× pixel data |
| Blur temp buffer | vec![0u8; data.len()] per blur (alloc + zero) | Pre-allocated stripe buffers | MEDIUM |
| Clip mask allocation | Mask::new(width, height) per drawing per frame | N/A (libass clips during rasterisation) | MEDIUM |
| Composite cache | None (re-render everything every frame) | Composite cache keyed by bitmap refs + filter params | HIGH: avoids all blur/render work for static events |
| Pixel format | RGBA (4 bytes/pixel) for everything, including temp pixmaps | Single-channel uint8 for rendering, colour applied at composite | HIGH: 4× memory, 4× blur work |
| Frame change detection | None (always re-render) | ass_detect_change compares bitmap pointers, returns 0/1/2 | HIGH for static subtitles |
| Full-frame pixmap | Allocate and fill full-frame RGBA pixmap every frame | Returns alpha masks only, caller composites | MEDIUM: could use OffscreenCanvas 2D directly |

Recommended optimisation plan (priority order)

  1. Bitmap cache for drawings + text — Cache rasterised bitmaps keyed by (path_hash, quantized_scale, quantized_rotation). For the "I ask of thee" scene, 12 drawings would be rasterised ONCE and cached. Position/alpha changes are cheap post-raster operations. This alone should reduce frame time from 651ms to ~50ms for frame 2+.

  2. Composite cache — Cache post-blur results keyed by (bitmap_refs, blur_params). When the same set of glyphs appears with the same blur, skip the entire blur pipeline.

  3. Cascade blur — Replace naive gaussian with downscale→blur→upscale. Makes blur O(n) regardless of radius. Even without SIMD, this is a massive win for \blur3.5+.

  4. Single-channel rendering — Render to uint8 alpha bitmaps, apply colour at composite time. 4× less data to blur, 4× less memory.

  5. Eliminate redundant shaping — Shape text once per segment per frame, share result between pipeline layout and backend render. Currently shaped 2-3×.

  6. Buffer reuse — Pre-allocate blur temp buffers and clip masks, reuse across frames instead of allocating per-frame.

  7. Frame change detection — Compare event list + parameters with previous frame. If unchanged, return previous frame data.

  8. SIMD blur — The apply_gaussian_blur_simd stub currently just calls scalar. Implement actual SIMD for WASM (wasm_simd128) and native (SSE2/NEON).

Items 1-3 would likely bring frame time under 16ms for most content. Items 4-8 are incremental improvements.
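A sketch of the bitmap cache from item 1. Floats are quantised so that tiny per-frame jitter still hits, and position, alpha, and colour stay out of the key because they are applied at composite time. The quantisation steps, type names, and the use of std's `DefaultHasher` on the raw path text are illustrative assumptions, not libass's scheme; a real cache would also guard against hash collisions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Path identity plus quantised transform and blur.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct BitmapKey {
    path_hash: u64,
    scale_x_q: i32,  // scale_x in 1/1024 steps
    scale_y_q: i32,  // scale_y in 1/1024 steps
    rotation_q: i32, // degrees in 1/16 steps
    blur_q: i32,     // \blur in 1/16 steps
}

fn quantize(v: f32, steps_per_unit: f32) -> i32 {
    (v * steps_per_unit).round() as i32
}

impl BitmapKey {
    fn new(path_text: &str, scale: (f32, f32), rotation_deg: f32, blur: f32) -> Self {
        let mut h = DefaultHasher::new();
        path_text.hash(&mut h);
        BitmapKey {
            path_hash: h.finish(),
            scale_x_q: quantize(scale.0, 1024.0),
            scale_y_q: quantize(scale.1, 1024.0),
            rotation_q: quantize(rotation_deg, 16.0),
            blur_q: quantize(blur, 16.0),
        }
    }
}

#[derive(Default)]
struct BitmapCache {
    map: HashMap<BitmapKey, Vec<u8>>,
    misses: usize,
}

impl BitmapCache {
    /// Rasterise (via `render`) only on a miss; otherwise reuse the bitmap.
    fn get_or_render(&mut self, key: BitmapKey, render: impl FnOnce() -> Vec<u8>) -> &[u8] {
        let misses = &mut self.misses;
        self.map.entry(key).or_insert_with(|| {
            *misses += 1;
            render()
        })
    }
}
```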

Colour-independent rendering ("pseudo single-channel")

libass renders to single-channel uint8 alpha bitmaps because ASS subtitles use flat colours — a glyph's shape is independent of its colour. The alpha mask is the expensive part (rasterisation + blur), and colour is a cheap per-pixel multiply at composite time.

We can approximate this within tiny_skia's RGBA model by rendering everything as white with varying alpha (rgba(255, 255, 255, α)). White is the identity for premultiplied-alpha colour multiplication: at composite time, output = cached_pixel * (event_colour / 255) produces the correctly coloured result, provided event_colour is itself premultiplied (its RGB scaled by the event's alpha).
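A per-pixel sketch of that composite-time multiply. It assumes the cached pixel is premultiplied white-with-alpha (α, α, α, α) and the event colour arrives as straight RGBA with normal alpha (255 = opaque). The colour is premultiplied first so that a semi-transparent event scales RGB as well as A; `tint` is a hypothetical name.

```rust
/// Tint a premultiplied white-with-alpha pixel with an event colour.
fn tint(mask_px: [u8; 4], colour: [u8; 4]) -> [u8; 4] {
    let a = colour[3] as u32;
    // Premultiply the event colour (rounded).
    let pm = [
        (colour[0] as u32 * a + 127) / 255,
        (colour[1] as u32 * a + 127) / 255,
        (colour[2] as u32 * a + 127) / 255,
        a,
    ];
    // output = cached_pixel * (premultiplied_colour / 255), rounded.
    std::array::from_fn(|i| ((mask_px[i] as u32 * pm[i] + 127) / 255) as u8)
}
```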

Why this matters for caching: without colour-independent rendering, the cache key must include the colour. With it, the cache key is just (shape_hash, scale, rotation, blur). This dramatically increases cache hit rate:

  • Multi-layer text effects (common fansub technique: same text rendered 2-3x with different colours for fill, outline glow, shadow) → one cached bitmap, 2-3 colour multiplications
  • Same \p1 drawing across 200 frames → one rasterisation + blur, 199 cache hits regardless of \t colour animations
  • Film grain characters in "Grain Medium" font → glyph-level cache. Each of ~95 printable ASCII shapes cached once, reused across all randomised grain text. The current approach re-rasterises every character every frame (15 lines × ~30 chars × 25fps = 11,250 glyph rasterisations/second → would become ~95 total, cached forever).

Even before moving to actual single-channel buffers, rendering white-with-alpha and applying colour at composite time divides cache cardinality by up to the number of distinct colours in the script, often meaning 10-50× fewer cache entries.