Rustledger Performance Optimization Roadmap

Current Performance (10K transactions)

| Benchmark | rustledger | beancount | Speedup |
|---|---|---|---|
| Validation (parse + check) | 35ms | 754ms | 22x faster |
| Balance report (parse + compute) | 118ms | 1280ms | 11x faster |

Target

Push the speedup from 5x to 10-20x through systematic optimization. Achieved!


Measured Results

| Change | Before | After | Improvement |
|---|---|---|---|
| Phase 0.1: Arc<str> | 160ms | 134ms | 16% faster |
| Phase 1.3: Rc for closures | 113ms | 141ms | ❌ 25% slower (reverted) |
| Phase 1.1: Zero-copy primitives | 108ms | 101ms | ~7% faster |
| Phase 2: SmallVec | 113ms | 143ms | ❌ 27% slower (reverted) |
| Phase 3: Full string interning | 30ms | 28ms | ~6% faster |
| Phase 4: Rayon parallelization | 113ms | 108ms | ~5% faster |
| Phase 0.2: PGO | 108ms | 94ms | 13% faster |
| Phase 5: rkyv cache | 30ms | 13ms | 2.3x faster (cache hit) |

Combined improvement: 160ms → 94ms = 41% faster (1.7x speedup on top of existing gains)

With full interning: ~28ms on 7176-line file (cold parse)

With cache hit: 13ms = instant for repeated runs (7176-line file benchmark)

Note: Local benchmarks were run on a 10K-transaction ledger. Rc and SmallVec added overhead that outweighed their benefits, so both changes were reverted. Phase 3 extends InternedStr to payee, narration, tags, and links for memory deduplication. The cache provides a 2.3x speedup on subsequent runs.


Phase 0: Quick Wins (Day 1)

Goal: Low-effort, high-impact changes
Expected Impact: 15-25% faster

0.1 Eliminate Source Code Double Allocation

  • File: crates/rustledger-loader/src/lib.rs
  • Line: 308
  • Problem: fs::read_to_string() then source.clone() = 2x memory
  • Fix: Use Arc<str> instead of cloning
```rust
use std::fs;
use std::sync::Arc;

// Before
let source = fs::read_to_string(path)?;
source_map.add_file(path, source.clone());  // CLONE!

// After
let source: Arc<str> = fs::read_to_string(path)?.into();
source_map.add_file(path, Arc::clone(&source));  // Cheap refcount
```
  • Impact: 50% reduction in source memory, faster loading
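
To make the refcount sharing concrete, here is a minimal, self-contained sketch. `SourceMap` and `load_shared` are hypothetical stand-ins invented for illustration, not the actual types in rustledger-loader. It shows that after the change the caller and the source map hold handles to the same buffer.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Hypothetical stand-in for the loader's source map.
#[derive(Default)]
struct SourceMap {
    files: HashMap<PathBuf, Arc<str>>,
}

impl SourceMap {
    fn add_file(&mut self, path: &Path, source: Arc<str>) {
        self.files.insert(path.to_path_buf(), source);
    }
}

/// Registers `text` in the map and returns a second handle to the same buffer.
fn load_shared(map: &mut SourceMap, path: &Path, text: String) -> Arc<str> {
    // One copy of the bytes into the reference-counted allocation.
    let source: Arc<str> = text.into();
    // Refcount increment only; no text is duplicated here.
    map.add_file(path, Arc::clone(&source));
    source
}
```

Converting the `String` into `Arc<str>` still copies the bytes once; the saving is that every later `Arc::clone` only increments a counter, which `Arc::ptr_eq` on the two handles would confirm.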

0.2 Enable Profile-Guided Optimization (PGO)

  • File: .cargo/config.toml (new), .github/workflows/release.yml
  • Change: Build release binaries with PGO data from benchmarks
  • Impact: 5-15% overall speedup (free optimization)

Phase 1: Parser Allocation Fixes (Week 1)

Goal: Eliminate unnecessary allocations in the parser
Expected Impact: 20-30% faster

1.1 Zero-Copy String Parsing

  • File: crates/rustledger-parser/src/parser.rs
  • Lines: 622, 886, 922, 934, 942
  • Problem: Parser calls .to_string() on slices that could stay borrowed
  • Fix: Return &'a str instead of String, intern at directive construction
```rust
// Before
.map(|s: &str| s.to_string())  // Allocates!

// After
.map(|s: &str| s)  // Zero-copy, intern later
```
  • Impact: ~15% parsing improvement
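
As a rough illustration of the borrow-then-own pattern, the sketch below assumes a made-up `Note` directive and a toy line format; neither comes from the rustledger parser. The parsing step returns slices into the input, and the only allocations happen when the directive is built. It uses `to_owned()` at the point where the real code would intern.

```rust
/// Hypothetical directive that owns its text once parsing is finished.
#[derive(Debug, PartialEq)]
struct Note {
    account: String,
    comment: String,
}

/// Splits a line into borrowed fields without allocating.
/// Both returned slices point into `line`.
fn split_note<'a>(line: &'a str) -> Option<(&'a str, &'a str)> {
    let (account, rest) = line.trim().split_once(' ')?;
    Some((account, rest.trim_matches('"')))
}

/// Allocation happens once per field, at directive construction.
fn build_note(line: &str) -> Option<Note> {
    let (account, comment) = split_note(line)?;
    Some(Note {
        account: account.to_owned(),
        comment: comment.to_owned(),
    })
}
```

Lines that fail to parse never allocate at all, because `split_note` returns `None` before any owned value is created.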

1.2 Fix Vector Cloning

  • File: crates/rustledger-parser/src/parser.rs
  • Lines: 1055, 1080
  • Change: Use .into_iter() instead of .clone().into_iter()
  • Impact: ~5% improvement
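
The difference can be sketched with a hypothetical, simplified `Posting` type (not the crate's definition). The first function copies the vector and every `String` in it before consuming the copy; the second consumes the vector it already owns.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Posting {
    account: String,
    amount: i64,
}

/// Before: duplicates the vector and each `String`, then consumes the copy.
fn negate_cloning(postings: &Vec<Posting>) -> Vec<Posting> {
    postings
        .clone()
        .into_iter()
        .map(|p| Posting { amount: -p.amount, ..p })
        .collect()
}

/// After: takes ownership, so each `Posting` is moved through the iterator.
fn negate_moving(postings: Vec<Posting>) -> Vec<Posting> {
    postings
        .into_iter()
        .map(|p| Posting { amount: -p.amount, ..p })
        .collect()
}
```

This only applies where the caller no longer needs the original vector. If it does, iterating by reference with `.iter()` is the usual alternative.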

1.3 Use Rc for Metadata in Closures

  • File: crates/rustledger-parser/src/parser.rs
  • Lines: 1271, 1305, 1329, etc.
  • Change: Wrap metadata in Rc<Metadata> to avoid cloning
  • Impact: ~10% improvement

Phase 2: Collection Optimizations (Week 2)

Goal: Reduce heap allocations for small collections
Expected Impact: 15-25% faster

2.1 Add SmallVec Dependency

```toml
# crates/rustledger-core/Cargo.toml
smallvec = "1.11"
```

2.2 Convert Small Vectors

```rust
// crates/rustledger-core/src/directive.rs
pub tags: SmallVec<[InternedStr; 4]>,    // was Vec<String>
pub links: SmallVec<[InternedStr; 2]>,   // was Vec<String>
pub postings: SmallVec<[Posting; 4]>,    // was Vec<Posting>
```

2.3 Pre-allocate HashMaps

  • Add .with_capacity() calls in validation and query execution
  • Files: rustledger-validate/src/lib.rs, rustledger-query/src/executor.rs
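
A small sketch of what pre-sizing looks like, assuming a hypothetical `count_by_account` helper and a caller-supplied estimate of the number of distinct accounts. `HashMap::with_capacity` reserves space up front, so the map should not need to grow while it fills to that size.

```rust
use std::collections::HashMap;

/// Counts postings per account, sizing the map before the loop.
/// `expected_accounts` is an estimate supplied by the caller
/// (an assumption of this sketch, e.g. the number of `open` directives).
fn count_by_account<'a>(
    accounts: &[&'a str],
    expected_accounts: usize,
) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::with_capacity(expected_accounts);
    for &account in accounts {
        *counts.entry(account).or_insert(0) += 1;
    }
    counts
}
```

An estimate that is too low still works, it just falls back to normal growth; one that is far too high wastes memory, so a cheap upper bound from an earlier pass is usually the better input.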

Phase 3: String Interning (Week 3-4) ✅ DONE

Goal: Deduplicate strings across entire ledger
Result: ~6% faster, memory deduplication via Arc<str>

3.1 Extend InternedStr Usage ✅

```rust
// crates/rustledger-core/src/directive.rs
pub struct Transaction {
    pub payee: Option<InternedStr>,    // was Option<String>
    pub narration: InternedStr,        // was String
    pub tags: Vec<InternedStr>,        // was Vec<String>
    pub links: Vec<InternedStr>,       // was Vec<String>
}

pub struct Document {
    pub tags: Vec<InternedStr>,        // was Vec<String>
    pub links: Vec<InternedStr>,       // was Vec<String>
}
```

3.2 Cache Re-interning ✅

  • reintern_directives() deduplicates strings after cache load
  • Typical deduplication: 150+ strings per ledger
  • Memory savings from Arc<str> sharing
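
One possible shape for re-interning after a cache load, sketched with the standard library only. `InternedStr` is modelled here as a plain `Arc<str>` alias, and `Interner` and `reintern` are hypothetical names; the real `reintern_directives()` may work differently. The idea is that freshly deserialized strings each have their own allocation, and passing them through a pool replaces equal strings with one shared handle.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Stand-in for the crate's `InternedStr` (assumed to wrap an `Arc<str>`).
type InternedStr = Arc<str>;

#[derive(Default)]
struct Interner {
    pool: HashSet<Arc<str>>,
}

impl Interner {
    /// Returns the pooled handle for `s`, inserting it on first sight.
    fn intern(&mut self, s: &str) -> InternedStr {
        if let Some(existing) = self.pool.get(s) {
            return Arc::clone(existing);
        }
        let fresh: Arc<str> = Arc::from(s);
        self.pool.insert(Arc::clone(&fresh));
        fresh
    }
}

/// Replaces each value with its pooled handle and returns how many
/// values were already present in the pool (i.e. were deduplicated).
fn reintern(values: &mut [InternedStr], interner: &mut Interner) -> usize {
    let mut deduplicated = 0;
    for value in values.iter_mut() {
        let before = interner.pool.len();
        let shared = interner.intern(&**value);
        if interner.pool.len() == before {
            deduplicated += 1; // pool did not grow, so the string was known
        }
        *value = shared;
    }
    deduplicated
}
```

After the pass, equal strings compare pointer-equal under `Arc::ptr_eq`, and the duplicate allocations are freed as their last handles are overwritten.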

Phase 4: Parallelization (Week 5-6)

Goal: Use multiple CPU cores
Expected Impact: 2-4x faster on multi-core
Breaking Changes: None (internal)

4.1 Add Rayon Dependency

```toml
# crates/rustledger-validate/Cargo.toml
rayon = "1.8"
```

4.2 Parallel Transaction Processing

  • Interpolate transactions in parallel
  • Validate independent checks in parallel
  • Keep sorting single-threaded (required for correctness)
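
Per-transaction checks are independent of each other, which is what makes them easy to spread across cores. The sketch below uses `std::thread::scope` from the standard library as a stand-in so that it runs without dependencies; with Rayon the same shape would typically be written with a parallel iterator. `is_balanced` and `find_unbalanced` are hypothetical, and transactions are reduced to lists of integer amounts.

```rust
use std::thread;

/// Hypothetical per-transaction check: postings must sum to zero.
fn is_balanced(postings: &[i64]) -> bool {
    postings.iter().sum::<i64>() == 0
}

/// Checks transactions on up to `workers` threads and returns the indices
/// of unbalanced ones, in ledger order.
fn find_unbalanced(transactions: &[Vec<i64>], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((transactions.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = transactions
            .chunks(chunk_len)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .enumerate()
                        .filter(|(_, txn)| !is_balanced(txn))
                        .map(|(i, _)| chunk_idx * chunk_len + i)
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the output in input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because results are gathered chunk by chunk in spawn order, the output does not depend on which thread finishes first, so any ordering established before this step is preserved.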

Phase 5: Binary Cache Format (Week 5-6) ✅ DONE

Goal: Cache parsed ledgers for instant reload
Result: 2.3x faster on cache hit (30ms → 13ms)

5.1 Implement Cache Format ✅

  • File: crates/rustledger-loader/src/cache.rs
  • Format: rkyv for zero-copy deserialization
  • Cache key: SHA256 hash of file mtime + size
  • Location: ledger.beancount → ledger.beancount.cache

Custom rkyv wrappers for non-rkyv types:

  • AsDecimal - Decimal as 16-byte binary
  • AsNaiveDate - Date as i32 days since epoch
  • AsInternedStr - InternedStr as ArchivedString

5.2 Cache Invalidation ✅

  • Hash computed from all included files' mtime + size
  • Graceful fallback on cache errors
  • invalidate_cache() API for manual invalidation
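
The invalidation rule can be sketched as follows. This is an illustration under assumptions: `FileStamp`, `stamp_file`, `fingerprint`, and `cache_is_fresh` are hypothetical names, and `DefaultHasher` stands in for SHA-256 only to keep the example dependency-free. `DefaultHasher`'s algorithm is not guaranteed to stay the same across Rust releases, so it would not be a good choice for a key that is persisted on disk.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Per-file inputs to the cache key (hypothetical type).
#[derive(Clone, Copy, Debug, Hash, PartialEq)]
struct FileStamp {
    mtime_secs: u64,
    size: u64,
}

/// Reads modification time and size for one file.
fn stamp_file(path: &Path) -> io::Result<FileStamp> {
    let meta = fs::metadata(path)?;
    let mtime_secs = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?
        .as_secs();
    Ok(FileStamp { mtime_secs, size: meta.len() })
}

/// Combines the stamps of the main file and every include into one key.
fn fingerprint(stamps: &[FileStamp]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for stamp in stamps {
        stamp.hash(&mut hasher);
    }
    hasher.finish()
}

/// A cached parse is reusable only if none of its inputs changed.
fn cache_is_fresh(stored_key: u64, current: &[FileStamp]) -> bool {
    stored_key == fingerprint(current)
}
```

Keying on mtime and size avoids reading file contents on the fast path. The trade-off is that an edit which preserves both values would go unnoticed, which is where a manual invalidation call or `--no-cache` is useful.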

5.3 CLI Integration ✅

```bash
rledger check --no-cache ledger.beancount  # Skip cache
rledger check -C ledger.beancount          # Short form
rledger check ledger.beancount             # Use cache (default)
```

Phase 6: Lexer + Arena Allocator ✅ PARTIAL

Goal: Replace parser combinators with fast lexer, use arena for AST
Expected Impact: 30-50% faster parsing

6.1 Logos Lexer + Winnow Parser ✅ DONE

  • Using Logos for SIMD-accelerated tokenization
  • Using Winnow for manual recursive descent parsing
  • Replaced Chumsky parser combinators (legacy parser removed)
  • Zero-copy token stream - no allocations during lexing
  • Implemented in logos_lexer.rs and winnow_parser.rs

6.2 Bumpalo Arena for AST Nodes 🔮 FUTURE

  • Use bumpalo for AST allocation
  • Only 11 instructions per allocation (vs ~100 for malloc)
  • Mass deallocation: just reset the bump pointer
  • Perfect for phase-oriented allocation (parse → use → discard)
  • Projected: +20% parsing improvement
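
To illustrate the parse → use → discard pattern without pulling in bumpalo, here is a deliberately simplified index-based arena built on a `Vec`. It is not a bump allocator and not bumpalo's API; `Arena`, `NodeId`, and `Expr` are hypothetical. It only shows the two properties the list above relies on: allocating a node is an append, and everything is released in one call.

```rust
/// Handle to a node stored in the arena.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NodeId(usize);

/// Minimal typed arena: all nodes live in one contiguous buffer.
struct Arena<T> {
    nodes: Vec<T>,
}

impl<T> Arena<T> {
    fn with_capacity(capacity: usize) -> Self {
        Arena { nodes: Vec::with_capacity(capacity) }
    }

    /// Appends a node and returns its handle.
    fn alloc(&mut self, value: T) -> NodeId {
        self.nodes.push(value);
        NodeId(self.nodes.len() - 1)
    }

    fn get(&self, id: NodeId) -> &T {
        &self.nodes[id.0]
    }

    /// Releases every node at once but keeps the buffer for the next parse.
    fn reset(&mut self) {
        self.nodes.clear();
    }
}

/// Toy AST whose children are arena handles instead of boxes.
enum Expr {
    Num(i64),
    Add(NodeId, NodeId),
}

fn eval(arena: &Arena<Expr>, id: NodeId) -> i64 {
    match arena.get(id) {
        Expr::Num(n) => *n,
        Expr::Add(a, b) => eval(arena, *a) + eval(arena, *b),
    }
}
```

Handles from before a `reset` must not be reused afterwards; an arena that hands out references tied to its own lifetime lets the borrow checker enforce that instead.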

Phase 7: Memory-Mapped Files (Future)

Goal: Zero-copy file loading for very large ledgers
Expected Impact: 10-20% for files >100MB

7.1 Optional mmap for Large Files

  • Only enable for files > threshold (e.g., 50MB)
  • Fallback to standard read for smaller files
  • Cross-platform support (memmap2 crate)

Roadmap Summary

| Phase | Work | Status | Result |
|---|---|---|---|
| 0 | Quick wins (Arc, PGO) | ✅ Done | +29% (16% + 13%) |
| 1 | Zero-copy parsing | ✅ Done | +7% |
| 2 | SmallVec | ❌ Reverted | -27% (slower) |
| 3 | Full interning | ✅ Done | +6% |
| 4 | Parallelization (rayon) | ✅ Done | +5% |
| 5 | Binary cache (rkyv) | ✅ Done | 2.3x on cache hit |
| 6.1 | Logos + Winnow parser | ✅ Done | Replaced Chumsky |
| 6.2 | Bumpalo arena | 🔮 Future | +20% projected |
| 7 | Memory-mapped files | 🔮 Future | Large files only |

Actual Performance

Measured on 10K transaction ledgers (January 2026):

| Benchmark | rustledger | beancount | ledger (C++) | hledger |
|---|---|---|---|---|
| Validation | 35ms | 754ms | 97ms | 467ms |
| Balance report | 118ms | 1280ms | 84ms | 571ms |

Key results:

  • 22x faster than beancount for validation
  • 11x faster than beancount for balance reports
  • Competitive with ledger (C++): 2.8x faster on validation (35ms vs 97ms), 1.4x slower on balance reports (118ms vs 84ms)
  • Cache hit: ~13ms for repeated runs

Benchmark Evaluation (January 2026)

Methodology Verification

The benchmark claims have been independently verified. Key findings:

1. What each command measures:

| Tool | Command | Operation |
|---|---|---|
| rustledger | rledger check file.beancount | Parse + validate |
| beancount | bean-check file.beancount | Parse + validate (no plugins on simple files) |
| ledger | ledger -f file.ledger accounts | Parse + list accounts |
| hledger | hledger check -f file.ledger | Parse + validate |

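All commands parse the complete file. rledger check, bean-check, and hledger check also validate it; the ledger command lists accounts after parsing, so that workload is comparable rather than identical.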

2. Output equivalence verified: Both rledger check and bean-check produce the same result on test files (no errors, same directive counts).

Scaling Analysis

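| Transactions | File Size | rustledger | beancount | Speedup |
|---|---|---|---|---|
| 1K | 100 KB | 4.5ms | 149ms | 33x |
| 5K | 507 KB | 16.2ms | - | - |
| 10K | 1 MB | 30.4ms | 744ms | 24x |
| 50K | 5 MB | 147ms | - | - |
| 100K | 10 MB | 304ms | 3,099ms | 10x |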

Key insight: Speedup varies from 10x to 33x depending on file size.

Startup Overhead Analysis

The varying speedup is explained by startup overhead:

| Tool | Startup | Processing 10K |
|---|---|---|
| rustledger | ~2ms | ~28ms |
| beancount | ~100ms | ~644ms |

  • Small files (1K): Startup dominates → 33x speedup
  • Large files (100K): Pure processing dominates → 10x speedup
  • Typical files (10K): Mixed → 20-24x speedup

Scaling Behavior

Both tools exhibit O(n) scaling:

rustledger:

  • 5K → 10K: 16.2ms → 30.4ms (1.9x for 2x input) ✓
  • 10K → 50K: 30.4ms → 147ms (4.8x for 5x input) ✓
  • 50K → 100K: 147ms → 304ms (2.1x for 2x input) ✓

Throughput: ~330K transactions/second (after warmup)

beancount:

  • 1K → 10K: 149ms → 744ms (5.0x for 10x input) ✓
  • 10K → 100K: 744ms → 3,099ms (4.2x for 10x input) ✓

Throughput: ~32K transactions/second (at scale)
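
The speedup and throughput figures quoted in this section follow directly from the scaling table. A small sketch of the arithmetic, with `MEASURED`, `speedup`, `throughput`, and `measured_speedups` as hypothetical names and the numbers copied from the rows that have timings for both tools:

```rust
/// (transactions, rustledger ms, beancount ms), copied from the scaling table.
const MEASURED: [(u64, f64, f64); 3] = [
    (1_000, 4.5, 149.0),
    (10_000, 30.4, 744.0),
    (100_000, 304.0, 3_099.0),
];

/// How many times faster the run taking `fast_ms` is than the one taking `slow_ms`.
fn speedup(slow_ms: f64, fast_ms: f64) -> f64 {
    slow_ms / fast_ms
}

/// Transactions per second for `transactions` processed in `elapsed_ms`.
fn throughput(transactions: u64, elapsed_ms: f64) -> f64 {
    transactions as f64 * 1000.0 / elapsed_ms
}

/// Speedup for each measured row, in table order.
fn measured_speedups() -> Vec<f64> {
    MEASURED
        .iter()
        .map(|&(_, rustledger_ms, beancount_ms)| speedup(beancount_ms, rustledger_ms))
        .collect()
}
```

For the 100K row, `throughput(100_000, 304.0)` comes to about 329K transactions/second for rustledger and `throughput(100_000, 3_099.0)` to about 32K for beancount. `measured_speedups()` gives roughly 33.1, 24.5, and 10.2.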

Conclusion

The benchmark claims are accurate and fair:

  1. ✅ Both tools perform equivalent validation work
  2. ✅ Both exhibit linear O(n) scaling
  3. ✅ rustledger is genuinely 10-33x faster
  4. ✅ Speedup variation explained by startup overhead (2ms vs 100ms)

The "10x faster" claim is conservative (applies to 100K+ transactions). For typical ledgers (1K-10K transactions), rustledger is 20-30x faster.


Measurement Plan

Each phase should be benchmarked:

```bash
# Before/after each phase
cargo bench --bench pipeline_bench

# Nightly CI comparison (already set up)
# Results in benchmarks branch
```

Decision Points

  1. After Phase 0: Measure baseline improvement before deeper work
  2. After Phase 3: Evaluate if 12x is sufficient or continue to parallelization
  3. Phase 5 (Cache): High value for development workflows, optional for CI
  4. Phase 6-7: Only pursue if profiling shows remaining bottlenecks

Research & References

Parser Performance

  • Logos - current lexer, SIMD-accelerated DFA
  • Winnow - current parser, manual recursive descent
  • Chumsky - former parser, replaced by Winnow (removed)
