Implementing a Pratt parser in Rust

Following on from the previous tutorial that demonstrated how to implement a tokenizer, it is now time to implement a parser to transform the list of tokens into an Abstract Syntax Tree, hereafter referred to as an AST.

We will be using a Pratt parser design, which is a recursive descent parser that is very simple to reason with. I discovered this design a few years ago (largely thanks to this blog post) and I have been using it extensively since then in a number of programming languages. It works very nicely with functional languages such as Rust and Scala.

Following the theme of the previous tutorial we will be parsing simple math expressions such as 1+2*3. We will need to handle operator precedence correctly to interpret 1+2*3 as 1+(2*3) rather than (1+2)*3.

The Pratt parser uses the concepts of a prefix parser and in infix parser. A prefix parser is responsible for parsing the start of a statement. In our case that would be an integer. The infix parser is responsible for parsing an infix operator (in our case the infix operator would be one of the math operators: +, – , *, or /).

Before we dive into the main logic, we need some data structures to store the parsed expressions:

We’ll also need some logic for determining the precedence of the different operators. Note that this code operates on the Symbol token rather than the Op enum that will contain the parsed operator.

The top level parsing code is actually very simple to follow. First the prefix parser is used to parse a prefix expression. Then the code loops until the precedence changes, parsing the next infix operator. We’ll see where the recursion comes in shortly when we write the parse_infix method.

The parse_prefix method is very simple for our use case. There is only one supported prefix token and that is an integer.

The infix parser is slightly more involved. It gets passed the “left expression” and is then responsible for parsing the infix operator and then recursing back into the top level parse_expr method to parse the right-hand side of the binary expression.

To make the logic clearer, let’s walk through two different scenarios and show what is happening with debug output in each function.

First, let’s parse 123+456*789.

Now let’s swap the operators so we are now parsing 123*456+789.

We can see from the debug output that the logic is determined the operator precedence correctly for both of these use cases.

The full source code for this example is available from github and I encourage you to try this code out yourself to get a deeper understanding of how this works.

Leave a Reply

Your email address will not be published. Required fields are marked *