Polymorphism in Rust: Enum vs Trait + Struct

Rust isn’t an object-oriented language in the same way as C++ and Java. You cannot write classes that extend behavior from another class for example. However, Rust does support polymorphism and there are two common approaches.

For this blog post I’m going to use an example of writing a crate for generating svg (scalable vector graphics) files based on a list of shapes. Ideally, it should be possible for users of this library to add their own shapes.

The two approaches I am going to demonstrate are algebraic data types (enumerations) and structs combined with traits.

Polymorphism using Enumerations

Let’s look at enumerations first. We can define an enumeration to model all of our shapes like this (full source code for this example is available here).

It is now possible to write a function that can generate an svg file given a vector of shapes.The function can use pattern matching to determine the behavior for each type of shape.

This looks pretty clean, but a user of this crate cannot add their own shapes since enumerations do not support inheritance and there is no other mechanism to add a new shape to the existing enumeration.

Polymorphism using Traits and Structs

The second approach is to use a trait to define some behavior and use structs to represent the shapes. This time we define a separate struct for each shape like this (full source code for this example is available here).

We also define a trait. Rather than call the trait ‘Shape’ we will call it ‘SvgWriter’ since we are using the trait to define behavior rather than type.

Next, we need to provide an implementation of this trait on each of our shapes.

Finally, we can write our function to generate the svg file given a vector of shapes.

The advantage of this approach over enumerations is that a user of this crate can go ahead and define new structs to implement custom shapes. For example, the user could create a shape that combines a square and a circle:

The svg writer can now be invoked with this custom shape as follows. Note that because we’re using traits, we cannot simply add structs to the vector because one of the requirements of a vector is that all of its elements must be the same size on the stack and traits do not have a fixed size (the data types implementing the trait have a size and there could be multiple types with different sizes). To work around these, we put the data types that implement the trait into a “box”. This is effectively a fixed size pointer to the data type (it’s actually a “fat pointer” containing a pointer and a v-table, but that’s beyond the scope of this blog post). The boxed trait is often referred to as a “trait object”.

Mixing and Matching Enums, Traits, and Structs

It is possible to mix these two approaches by adding an enum variant to model a user-defined shape and have that contain a trait object, as follows:

I have used this approach before but I think it is generally best to stick to one approach or the other for a more consistent coding style and reduced complexity.

Conclusion

Enumerations make for very readable code with pattern matching but are not extensible. They make sense for internal structures that users of your crate do not need to add functionality to. Enumeration values can be passed around efficiently on the stack as well.

Traits provide more flexibility at the cost of placing objects on the heap and using dynamic dispatch instead of static dispatch. This cost is unlikely to be an issue for the majority of applications and is a price worth paying for the additional flexibility, in my opinion.

Implementing an object factory pattern in Rust using closures

Coming from a Java background, I’m used to using the object factory pattern to inject behavior into an existing component. For this tutorial we’ll use the example of building a connection pool that can be used with different types of database connections.

Disclaimer: The entire approach taken here might not be the best way of doing things in Rust at all, but I made it work with some help from the Rust IRC channel (shout out to Quxxy!). If you are a Rust guru and know of a more idiomatic way of solving this problem I would love to hear about it!

Anyway, without further ado, here is our naive connection pool class. Oops, did I just say “class”? I meant “struct”. It’s hard to undo so much Java.

This struct uses generics so we could create instances of this connection pool using different implementations of T (the connection type) although F must always be a function that has no arguments and returns a T.

The struct stores the F closure for creating new connections and has a Vec<T> to hold the connections.

Next, we implement some methods on this struct.

Ignoring the dubious logic in here (luckily, we’re not trying to build a real connection pool) the main point of interest is at line 10 where we create new connections by invoking the closure using the syntax (self.create_conn)().

Now that the generic connection pool is defined, we can go ahead and define our concrete connection struct that we want to store in the pool.

And finally, we can implement some code to create and use the connection pool.

For the closure that we pass to the ConnectionPool::new() method, we are just passing a reference to a function. It would have been possible to use closure syntax instead e.g. ConnectionPool::new(|| OracleConnection {}) but using the function is much cleaner.

Source code for this tutorial is available here.

Implementing a Pratt parser in Rust

Following on from the previous tutorial that demonstrated how to implement a tokenizer, it is now time to implement a parser to transform the list of tokens into an Abstract Syntax Tree, hereafter referred to as an AST.

We will be using a Pratt parser design, which is a recursive descent parser that is very simple to reason with. I discovered this design a few years ago (largely thanks to this blog post) and I have been using it extensively since then in a number of programming languages. It works very nicely with functional languages such as Rust and Scala.

Following the theme of the previous tutorial we will be parsing simple math expressions such as 1+2*3. We will need to handle operator precedence correctly to interpret 1+2*3 as 1+(2*3) rather than (1+2)*3.

The Pratt parser uses the concepts of a prefix parser and in infix parser. A prefix parser is responsible for parsing the start of a statement. In our case that would be an integer. The infix parser is responsible for parsing an infix operator (in our case the infix operator would be one of the math operators: +, – , *, or /).

Before we dive into the main logic, we need some data structures to store the parsed expressions:

We’ll also need some logic for determining the precedence of the different operators. Note that this code operates on the Symbol token rather than the Op enum that will contain the parsed operator.

The top level parsing code is actually very simple to follow. First the prefix parser is used to parse a prefix expression. Then the code loops until the precedence changes, parsing the next infix operator. We’ll see where the recursion comes in shortly when we write the parse_infix method.

The parse_prefix method is very simple for our use case. There is only one supported prefix token and that is an integer.

The infix parser is slightly more involved. It gets passed the “left expression” and is then responsible for parsing the infix operator and then recursing back into the top level parse_expr method to parse the right-hand side of the binary expression.

To make the logic clearer, let’s walk through two different scenarios and show what is happening with debug output in each function.

First, let’s parse 123+456*789.

Now let’s swap the operators so we are now parsing 123*456+789.

We can see from the debug output that the logic is determined the operator precedence correctly for both of these use cases.

The full source code for this example is available from github and I encourage you to try this code out yourself to get a deeper understanding of how this works.

https://github.com/KeepCalmAndLearnRust/pratt-parser

Writing a Tokenizer with Iterator and Peekable

In this tutorial we will write a low-level tokenizer for tokenizing simple math expressions like “12+34*56” using Rust’s Iterator and Peekable traits.

First, we need to declare some data structures for storing the results. Rust only has two types of data structure – struct and enum. If you’re coming from a Java or Scala background you would possibly be wanting classes with an object-oriented hierarchy, but enums are the way to go here.

Rust enumerations are a great way of modeling data structures and can be nested and recursive with the help of Box and Vec. However, in this example we will keep things simple and avoid recursion. We’ll tackle that in part two of this tutorial where we will build a parser for math expressions.

We will use the #[derive(…)] syntax to implement the Debug and PartialEq traits for these enumerations to make it easy to write unit tests.

We now need a tokenize function that can take a String as input and produce a Vec as output. There are two main approaches for doing this in Rust – we can write a standalone function that accepts a String parameter or we can define a trait and implement the trait directly on the String class.

The latter is generally the better approach since it ties functionality to a particular type, and if we want to be able tokenize other types in the future we can implement the trait for those types too.

Here is our Tokenizer trait definition:

We now use the following syntax to implement the trait for the String type:

We can get an iterator over the chars in the string by calling the chars() method, but we want the ability to peek at the next character without consuming it, so we call peekable() on the iterator to give us a Peekable trait.

We also need to prepare a vector to store the tokens:

Next, we need to loop until all characters have been processed. Here’s the starting code for this.

There are some important things to explain here. Firstly, the loop keyword is equivalent to while(true) in other languages and will loop until hitting a break or return statement (or a panic).

Next, we’re performing pattern matching on the result of it.peek(). Because this method returns an Option<&T> (an optional reference, rather than an optional value) we need to match with Some(&ch) rather than Some(ch) otherwise we start running into lifetime issues. If peek() returns Some(&ch) then we can start building a specific token and if it peek() returns None we can quit the loop since there are no characters left to process.

Here is the full implementation of the tokenize() function:

A design goal here is to keep the tokenizing code as functional as possible, so we make use of a function named my_take_while to iterate over the string collecting characters for as long as the supplied closure returns true. This is very similar to the take_while() method on Iterator with one major difference – our version only consumes characters that match, whereas the Iterator implementation consumes one extra character that does not match, making it unsuitable for our purposes.

When I started writing this tutorial I had hoped it would be much simpler and we’re already getting into closures and functional programming. However, if you stumbled across this because you are trying to implement a tokenizer in Rust, hopefully it will give you some good ideas.

The full source code for this example is available in github here:

https://github.com/KeepCalmAndLearnRust/pratt-parser

Part two of this tutorial is now available: http://keepcalmandlearnrust.com/2016/08/pratt-parser-in-rust/