Rustledger Performance Optimization Roadmap

Current Performance (10K transactions)

Benchmark	rustledger	beancount	Speedup
Validation (parse + check)	35ms	754ms	22x faster
Balance report (parse + compute)	118ms	1280ms	11x faster

Target

~~Push the speedup from 5x to 10-20x through systematic optimization.~~ Achieved!

Measured Results

Change	Before	After	Improvement
Phase 0.1: `Arc<str>`	160ms	134ms	16% faster
Phase 1.1: Rc for closures	113ms	141ms	❌ 25% slower (reverted)
Phase 1.1: Zero-copy primitives	108ms	101ms	~7% faster
Phase 2: SmallVec	113ms	143ms	❌ 27% slower (reverted)
Phase 3: Full string interning	30ms	28ms	~6% faster
Phase 4: Rayon parallelization	113ms	108ms	~5% faster
Phase 0.2: PGO	108ms	94ms	13% faster
Phase 5: rkyv cache	30ms	13ms	2.3x faster (cache hit)

Combined improvement: 160ms → 94ms = 41% faster (1.7x speedup on top of existing gains)

With full interning: ~28ms on 7176-line file (cold parse)

With cache hit: 13ms = instant for repeated runs (7176-line file benchmark)

Note: Local benchmarks run on 10K transaction ledger. Rc and SmallVec add overhead that outweighs benefits. Phase 3 extends InternedStr to payee/narration/tags/links for memory deduplication. Cache provides 2.3x speedup on subsequent runs.

Phase 0: Quick Wins (Day 1)

Goal: Low-effort, high-impact changes Expected Impact: 15-25% faster

0.1 Eliminate Source Code Double Allocation

File: crates/rustledger-loader/src/lib.rs
Line: 308
Problem: fs::read_to_string() then source.clone() = 2x memory
Fix: Use Arc\<str\> instead of cloning

rust

// Before
let source = fs::read_to_string(path)?;
source_map.add_file(path, source.clone());  // CLONE!

// After
let source: Arc<str> = fs::read_to_string(path)?.into();
source_map.add_file(path, Arc::clone(&source));  // Cheap refcount

Impact: 50% reduction in source memory, faster loading

0.2 Enable Profile-Guided Optimization (PGO)

File: .cargo/config.toml (new), .github/workflows/release.yml
Change: Build release binaries with PGO data from benchmarks
Impact: 5-15% overall speedup (free optimization)

Phase 1: Parser Allocation Fixes (Week 1)

Goal: Eliminate unnecessary allocations in the parser Expected Impact: 20-30% faster

1.1 Zero-Copy String Parsing

File: crates/rustledger-parser/src/parser.rs
Lines: 622, 886, 922, 934, 942
Problem: Parser calls .to_string() on slices that could stay borrowed
Fix: Return &'a str instead of String, intern at directive construction

rust

// Before
.map(|s: &str| s.to_string())  // Allocates!

// After
.map(|s: &str| s)  // Zero-copy, intern later

Impact: ~15% parsing improvement

1.2 Fix Vector Cloning

File: crates/rustledger-parser/src/parser.rs
Lines: 1055, 1080
Change: Use .into_iter() instead of .clone().into_iter()
Impact: ~5% improvement

1.3 Use Rc for Metadata in Closures

File: crates/rustledger-parser/src/parser.rs
Lines: 1271, 1305, 1329, etc.
Change: Wrap metadata in Rc<Metadata> to avoid cloning
Impact: ~10% improvement

Phase 2: Collection Optimizations (Week 2)

Goal: Reduce heap allocations for small collections Expected Impact: 15-25% faster

2.1 Add SmallVec Dependency

toml

# crates/rustledger-core/Cargo.toml
smallvec = "1.11"

2.2 Convert Small Vectors

rust

// crates/rustledger-core/src/directive.rs
pub tags: SmallVec<[InternedStr; 4]>,    // was Vec<String>
pub links: SmallVec<[InternedStr; 2]>,   // was Vec<String>
pub postings: SmallVec<[Posting; 4]>,    // was Vec<Posting>

2.3 Pre-allocate HashMaps

Add .with_capacity() calls in validation and query execution
Files: rustledger-validate/src/lib.rs, rustledger-query/src/executor.rs

Phase 3: String Interning (Week 3-4) ✅ DONE

Goal: Deduplicate strings across entire ledger Result: ~6% faster, memory deduplication via Arc<str>

3.1 Extend InternedStr Usage ✅

rust

// crates/rustledger-core/src/directive.rs
pub struct Transaction {
    pub payee: Option<InternedStr>,    // was Option<String>
    pub narration: InternedStr,        // was String
    pub tags: Vec<InternedStr>,        // was Vec<String>
    pub links: Vec<InternedStr>,       // was Vec<String>
}

pub struct Document {
    pub tags: Vec<InternedStr>,        // was Vec<String>
    pub links: Vec<InternedStr>,       // was Vec<String>
}

3.2 Cache Re-interning ✅

reintern_directives() deduplicates strings after cache load
Typical deduplication: 150+ strings per ledger
Memory savings from Arc<str> sharing

Phase 4: Parallelization (Week 5-6)

Goal: Use multiple CPU cores Expected Impact: 2-4x faster on multi-core Breaking Changes: None (internal)

4.1 Add Rayon Dependency

toml

# crates/rustledger-validate/Cargo.toml
rayon = "1.8"

4.2 Parallel Transaction Processing

Interpolate transactions in parallel
Validate independent checks in parallel
Keep sorting single-threaded (required for correctness)

Phase 5: Binary Cache Format (Week 5-6) ✅ DONE

Goal: Cache parsed ledgers for instant reload Result: 2.3x faster on cache hit (30ms → 13ms)

5.1 Implement Cache Format ✅

File: crates/rustledger-loader/src/cache.rs
Format: rkyv for zero-copy deserialization
Cache key: SHA256 hash of file mtime + size
Location: ledger.beancount → ledger.beancount.cache

Custom rkyv wrappers for non-rkyv types:

AsDecimal - Decimal as 16-byte binary
AsNaiveDate - Date as i32 days since epoch
AsInternedStr - InternedStr as ArchivedString

5.2 Cache Invalidation ✅

Hash computed from all included files' mtime + size
Graceful fallback on cache errors
invalidate_cache() API for manual invalidation

5.3 CLI Integration ✅

bash

rledger check --no-cache ledger.beancount  # Skip cache
rledger check -C ledger.beancount          # Short form
rledger check ledger.beancount             # Use cache (default)

Phase 6: Lexer + Arena Allocator ✅ PARTIAL

Goal: Replace parser combinators with fast lexer, use arena for AST Expected Impact: 30-50% faster parsing

6.1 Logos Lexer + Winnow Parser ✅ DONE

Using Logos for SIMD-accelerated tokenization
Using Winnow for manual recursive descent parsing
Replaced Chumsky parser combinators (legacy parser removed)
Zero-copy token stream - no allocations during lexing
Implemented in logos_lexer.rs and winnow_parser.rs

6.2 Bumpalo Arena for AST Nodes 🔮 FUTURE

Use bumpalo for AST allocation
Only 11 instructions per allocation (vs ~100 for malloc)
Mass deallocation: just reset the bump pointer
Perfect for phase-oriented allocation (parse → use → discard)
Projected: +20% parsing improvement

Phase 7: Memory-Mapped Files (Future)

Goal: Zero-copy file loading for very large ledgers Expected Impact: 10-20% for files >100MB

7.1 Optional mmap for Large Files

Only enable for files > threshold (e.g., 50MB)
Fallback to standard read for smaller files
Cross-platform support (memmap2 crate)

Roadmap Summary

Phase	Work	Status	Result
0	Quick wins (Arc, PGO)	✅ Done	+29% (16% + 13%)
1	Zero-copy parsing	✅ Done	+7%
2	SmallVec	❌ Reverted	-27% (slower)
3	Full interning	✅ Done	+6%
4	Parallelization (rayon)	✅ Done	+5%
5	Binary cache (rkyv)	✅ Done	2.3x on cache hit
6.1	Logos + Winnow parser	✅ Done	Replaced Chumsky
6.2	Bumpalo arena	🔮 Future	+20% projected
7	Memory-mapped files	🔮 Future	Large files only

Actual Performance

Measured on 10K transaction ledgers (January 2026):

Benchmark	rustledger	beancount	ledger (C++)	hledger
Validation	35ms	754ms	97ms	467ms
Balance report	118ms	1280ms	84ms	571ms

Key results:

22x faster than beancount for validation
11x faster than beancount for balance reports
Competitive with ledger (C++): 2.8x slower validation, 1.4x slower balance
Cache hit: ~13ms for repeated runs

Benchmark Evaluation (January 2026)

Methodology Verification

The benchmark claims have been independently verified. Key findings:

1. What each command measures:

Tool	Command	Operation
rustledger	`rledger check file.beancount`	Parse + validate
beancount	`bean-check file.beancount`	Parse + validate (no plugins on simple files)
ledger	`ledger -f file.ledger accounts`	Parse + list accounts
hledger	`hledger check -f file.ledger`	Parse + validate

All commands perform equivalent work: parse the file and validate correctness.

2. Output equivalence verified: Both rledger check and bean-check produce the same result on test files (no errors, same directive counts).

Scaling Analysis

Transactions	File Size	rustledger	beancount	Speedup
1K	100 KB	4.5ms	149ms	33x
5K	507 KB	16.2ms	-	-
10K	1 MB	30.4ms	744ms	24x
50K	5 MB	147ms	-	-
100K	10 MB	304ms	3,099ms	10x

Key insight: Speedup varies from 10x to 33x depending on file size.

Startup Overhead Analysis

The varying speedup is explained by startup overhead:

Tool	Startup	Processing 10K
rustledger	~2ms	~28ms
beancount	~100ms	~644ms

Small files (1K): Startup dominates → 33x speedup
Large files (100K): Pure processing dominates → 10x speedup
Typical files (10K): Mixed → 20-24x speedup

Scaling Behavior

Both tools exhibit O(n) scaling:

rustledger:

5K → 10K: 16.2ms → 30.4ms (1.9x for 2x input) ✓
10K → 50K: 30.4ms → 147ms (4.8x for 5x input) ✓
50K → 100K: 147ms → 304ms (2.1x for 2x input) ✓

Throughput: ~330K transactions/second (after warmup)

beancount:

1K → 10K: 149ms → 744ms (5.0x for 10x input) ✓
10K → 100K: 744ms → 3,099ms (4.2x for 10x input) ✓

Throughput: ~32K transactions/second (at scale)

Conclusion

The benchmark claims are accurate and fair:

✅ Both tools perform equivalent validation work
✅ Both exhibit linear O(n) scaling
✅ rustledger is genuinely 10-33x faster
✅ Speedup variation explained by startup overhead (2ms vs 100ms)

The "10x faster" claim is conservative (applies to 100K+ transactions). For typical ledgers (1K-10K transactions), rustledger is 20-30x faster.

Measurement Plan

Each phase should be benchmarked:

bash

# Before/after each phase
cargo bench --bench pipeline_bench

# Nightly CI comparison (already set up)
# Results in benchmarks branch

Decision Points

After Phase 0: Measure baseline improvement before deeper work
After Phase 3: Evaluate if 12x is sufficient or continue to parallelization
Phase 5 (Cache): High value for development workflows, optional for CI
Phase 6-7: Only pursue if profiling shows remaining bottlenecks

Research & References

Parser Performance

Logos - current lexer, SIMD-accelerated DFA
Winnow - current parser, manual recursive descent
Chumsky - former parser, replaced by Winnow (removed)

Serialization

rkyv - zero-copy deserialization, faster than bincode
rust_serialization_benchmark - comprehensive comparison

Memory Management

bumpalo - fast arena allocator (11 instructions/alloc)
Guide to arenas in Rust

String Processing

memchr - SIMD-accelerated string search
aho-corasick - SIMD multi-pattern matching

Compiler Optimizations

PGO in Rust - 10-30% improvement
Rust compiler performance 2025 - 6x faster builds

Rustledger Performance Optimization Roadmap ​

Current Performance (10K transactions) ​

Target ​

Measured Results ​

Phase 0: Quick Wins (Day 1) ​

0.1 Eliminate Source Code Double Allocation ​

0.2 Enable Profile-Guided Optimization (PGO) ​

Phase 1: Parser Allocation Fixes (Week 1) ​

1.1 Zero-Copy String Parsing ​

1.2 Fix Vector Cloning ​

1.3 Use Rc for Metadata in Closures ​

Phase 2: Collection Optimizations (Week 2) ​

2.1 Add SmallVec Dependency ​

2.2 Convert Small Vectors ​

2.3 Pre-allocate HashMaps ​

Phase 3: String Interning (Week 3-4) ✅ DONE ​

3.1 Extend InternedStr Usage ✅ ​

3.2 Cache Re-interning ✅ ​

Phase 4: Parallelization (Week 5-6) ​

4.1 Add Rayon Dependency ​

4.2 Parallel Transaction Processing ​

Phase 5: Binary Cache Format (Week 5-6) ✅ DONE ​

5.1 Implement Cache Format ✅ ​

5.2 Cache Invalidation ✅ ​

5.3 CLI Integration ✅ ​

Phase 6: Lexer + Arena Allocator ✅ PARTIAL ​

6.1 Logos Lexer + Winnow Parser ✅ DONE ​

6.2 Bumpalo Arena for AST Nodes 🔮 FUTURE ​

Phase 7: Memory-Mapped Files (Future) ​

7.1 Optional mmap for Large Files ​

Roadmap Summary ​

Actual Performance ​

Benchmark Evaluation (January 2026) ​

Methodology Verification ​

Scaling Analysis ​

Startup Overhead Analysis ​

Scaling Behavior ​

Conclusion ​

Measurement Plan ​

Decision Points ​

Research & References ​

Parser Performance ​

Serialization ​

Memory Management ​

String Processing ​

Compiler Optimizations ​

Rustledger Performance Optimization Roadmap

Current Performance (10K transactions)

Target

Measured Results

Phase 0: Quick Wins (Day 1)

0.1 Eliminate Source Code Double Allocation

0.2 Enable Profile-Guided Optimization (PGO)

Phase 1: Parser Allocation Fixes (Week 1)

1.1 Zero-Copy String Parsing

1.2 Fix Vector Cloning

1.3 Use Rc for Metadata in Closures

Phase 2: Collection Optimizations (Week 2)

2.1 Add SmallVec Dependency

2.2 Convert Small Vectors

2.3 Pre-allocate HashMaps

Phase 3: String Interning (Week 3-4) ✅ DONE

3.1 Extend InternedStr Usage ✅

3.2 Cache Re-interning ✅

Phase 4: Parallelization (Week 5-6)

4.1 Add Rayon Dependency

4.2 Parallel Transaction Processing

Phase 5: Binary Cache Format (Week 5-6) ✅ DONE

5.1 Implement Cache Format ✅

5.2 Cache Invalidation ✅

5.3 CLI Integration ✅

Phase 6: Lexer + Arena Allocator ✅ PARTIAL

6.1 Logos Lexer + Winnow Parser ✅ DONE

6.2 Bumpalo Arena for AST Nodes 🔮 FUTURE

Phase 7: Memory-Mapped Files (Future)

7.1 Optional mmap for Large Files

Roadmap Summary

Actual Performance

Benchmark Evaluation (January 2026)

Methodology Verification

Scaling Analysis

Startup Overhead Analysis

Scaling Behavior

Conclusion

Measurement Plan

Decision Points

Research & References

Parser Performance

Serialization

Memory Management

String Processing

Compiler Optimizations