Skip to content

ADR-0003: Parser Design

Status

Accepted (Updated February 2026)

Context

The Beancount language has a relatively simple grammar but with some complexities:

  • Date-prefixed directives
  • Indentation-sensitive posting syntax
  • Metadata can appear on many directive types
  • String literals with escapes
  • Multiple number formats (with comma grouping)

Options considered:

  1. Parser generator (pest, lalrpop): Generate parser from grammar
  2. Parser combinator (nom, winnow, chumsky): Compose small parsers
  3. Hand-written recursive descent: Manual implementation

Decision

Use Logos for lexing and Winnow for parsing (parser combinators).

Lexer (Logos)

The lexer (logos_lexer.rs) uses Logos, a SIMD-accelerated lexer generator:

  • Declarative token definitions via derive macros
  • ~54x faster than hand-written character iteration
  • Produces Vec<SpannedToken> with byte offset spans

Tokens include:

  • Keywords (open, close, balance, etc.)
  • Dates, numbers, strings, accounts, currencies
  • Operators and punctuation

Parser (Winnow)

The parser (winnow_parser.rs) uses a manual token stream with winnow-style parsing:

  • Composable parsers for each directive type
  • Manual token stream for simplicity and performance
  • Error recovery continues parsing after errors
  • Span tracking propagated through all parse results

Architecture:

text
Source (&str) → Logos tokenize() → Vec<SpannedToken> → Winnow parser → Directives

Consequences

Positive

  • Logos provides excellent lexer performance (SIMD-accelerated)
  • Winnow is lightweight with minimal compile-time overhead
  • Manual token stream approach is simpler than trait-based streams
  • Type-safe parser composition catches errors at compile time
  • No external grammar DSL files to maintain

Negative

  • Manual token stream requires more boilerplate
  • Less built-in error recovery than some frameworks
  • Must handle overflow/edge cases explicitly (e.g., checked arithmetic)

Neutral

  • Parser is ~1500 lines, manageable for the grammar size
  • Error messages require tuning for user-friendliness

Notes

The parser is organized into sections:

  1. Token stream types and helpers
  2. Primitive parsers (date, number, string, account)
  3. Expression parsers (arithmetic with checked overflow)
  4. Amount and cost parsers
  5. Directive parsers (transaction, balance, open, etc.)
  6. Top-level file parser with error recovery

History

  • Original decision: Hand-written recursive descent parser
  • January 2026: Migrated to Logos + Chumsky for better performance
  • February 2026: Migrated from Chumsky to Winnow for faster compile times and simpler code