Inside Ohm's PEG-to-Wasm compiler

https://lobste.rs/rss Hits: 25
Summary

A few weeks ago, we announced the Ohm v18 beta, which involved a complete rewrite of the core parsing engine. Since then, we've implemented even more performance improvements: v18 is now more than 50x faster for real-world grammars while using about 10% of the memory.The new parsing engine works by compiling an Ohm grammar — which is a form of parsing expression grammars, or PEG — into a WebAssembly module that implements a parser. In this post, we'll dive into the technical details of how that works, and talk about some of the optimizations that made it even faster.PExpr trees​In previous versions of Ohm (up to and including v17), the parsing engine used an approach called AST interpretation. We'll briefly explain how that works; it will be useful for understanding how the WebAssembly version is different.When you instantiate a grammar with Ohm, it parses your grammar and converts it to an abstract syntax tree. You can think of this tree as a kind of program, which describes a parser for the language. The nodes of the tree are parsing expressions, or PExprs as they're called in the source code.We'll use the following grammar as an example:JSONLike { Value = Object | "true" | "false" | "null" Object = "{" Members? "}" Members = Member ("," Member)* Member = string ":" Value string = "\"" (~"\"" any)* "\""}This grammar is parsed by Ohm (using the "grammar grammar", or metagrammar, defined in ohm-grammar.ohm). The result is a Map<string, PExpr>:{ 'Value' => Alt( Apply('Object'), Term('true'), Term('false'), Term('null') ), 'Object' => Seq( Term('{'), Opt(Apply('Members')), Term('}') ), ...}You can think of each rule ('Value', 'Object', etc.) as being like a function, and the function bodies are parsing expressions. Alt, Apply, Opt, Seq, and Term are all subclasses of the abstract PExpr class, and they all have an eval method. These methods are mostly pretty small and straightforward — here's the implementation for Alt:pexprs.Alt.prototype.eval = function (state) { for...

First seen: 2026-03-24 16:33

Last seen: 2026-03-25 16:51