How to train your Pony: Introduction

This article is the first in a series of blog posts about the Pony language’s type system, with a focus on the capability system.

Pony

Pony is a programming language which aims at making it easier to write concurrent programs. Among other things, the language promises that any program which successfully compiles cannot experience data races. The goal of this series of blog posts is to explore how it achieves this, by designing our own miniature version of the language and “training” it until it becomes (almost) as powerful as the full Pony language.

In these posts I’ll assume some familiarity with Pony. If you are new to it, I recommend you check out the tutorial from the Pony website first.

Data races

Data-races happen when two actors access the same object concurrently, and at least one of them is modifying it. The behaviour when a data race occurs is undefined, and can lead to all sorts of corruption.

If you are not familiar with actors just think of them as threads. There are some interesting differences between the two, but for the purpose of these blog posts it doesn’t matter.

Consider the simple example below. If two actors try to call the increment method on the same Counter object at the same time, it is possible that both actors read the old value, both perform the addition on that value and both write back the same result. Overall the counter would have been incremented by one, even though the method was called twice.

class Counter
  var value: U64 = 0
  fun ref increment() =>
    value = value + 1

Actually, things could be even worse than simply incrementing once instead of twice. Under the C memory model, which Pony shares, the behaviour is undefined and the resulting value could be the old one, the old value incremented once, the old value incremented twice or completely garbage. It could even have side effects beyond that object’s value.

Note that atomic operations allow safe concurrent modifications, but I won’t consider these here.

Preventing data races

In languages like C or Java, it is up to the programmer to make sure data races cannot happen, by carefully protecting accesses to shared data. In large programs, this can be a very challenging tasks and it is easy to forget about it and make mistakes. Data-races are inherently unpredictable and often hard to reproduce, making it an ever harder task to debug.

A trend in recent languages, such as Pony and Rust, is to have the compiler detect risks of data races statically and throw a compile-time error. Thanks to this you are guaranteed that a program which compiles cannot experience any data race at run-time.

However, catching data races at compile time has a big impact on the design of the language. This series of blog posts is going to cover how do we go from our goal of “programs which compile cannot cause data races” to the design of Pony’s reference capabilities.

We’ll start with a small set of very restrictive rules, and slowly build up on them until we reach Pony’s rules. Along the way we’ll prove that these rules indeed give us the guarantee that we are looking for, at first informally and maybe later touch a bit on the formal and computer verified proofs.

There’s quite a lot to do there, so let’s get started!

My Little Pony

Before we even start thinking about data races we should agree on what programming language we’re talking about, and what its semantics are.

The Pony language as implemented by the compiler is quite complicated for our purpose, with most features being irrelevant to what we want to achieve here. Instead we’ll be using our own very minimal version of it, Little Pony.

The best thing about the Pony language is definitely the infinite number of available GIFs.

Our initial version of Little Pony is going to be very simple. The language only has one object type, Counter with builtin methods increment, decrement and is_zero which work as you’d expect. Additionally it supports booleans and simple control flow structures, such as if/else and while blocks.

var x = Counter() // Create a new counter object, assign it to variable `x`
x.increment()
if x.is_zero() then
  x.increment()
end

The language supports aliasing. When making an assignment such as y = x, the object pointed at by x is not copied, only the reference to it is. The variable y starts pointing at the same object.

var x = Counter()
var y = x
y.increment()

// This is false, x and y point to the same object,
// which was incremented just before.
x.is_zero()

Finally, our language supports user-defined actors with a number of behaviours which get executed when a message is received. However unlike in the full Pony language, there is exactly one instance per defined actor type. Messages can therefore be sent to actors by simply mentioning the actor’s type and the behaviour’s name, and optionally some arguments. For example, Foo.bar(x) sends a bar message to the actor of type Foo with x as an argument.

// Reset as a service
actor Reset
  be reset(x: Counter) =>
    // Keep decrementing the counter until it reaches zero
    while not x.is_zero()
    then x.decrement()
    end

actor Main
  // Program starts here
  be main() =>
    var x = Counter()
    x.increment()
    Reset.reset(x) // Send a message to the `Reset` actor

Note that messages are sent asynchronously. In the program above, after sending the reset message, the main behaviour keeps executing without waiting for a response. Since actors are executed concurrently, reset may start executing before main has completed.

In the program below, I’ve added a second increment call in main. This call could execute while Reset is accessing the same counter object, causing a data race.

actor Main
  be main() =>
    var x = Counter()
    x.increment()
    Reset.reset(x)
    x.increment() // Data race here !

Of course we could have been more careful when writing the program and avoid this problem, but this places a heavy burden on the programmer. Instead our goal will be to define some rules that programs must follow and which can be checked statically by a compiler. These rules will guarantee that the program is free of data races.

Well-formed heaps

The first step in making our language free of data races is define a property about a program’s runtime state. The runtime state is called the heap, and the property about it well-formedness.

We want to choose a definition of well-formedness such that at a given point in time, if the heap is well-formed then no data race can happen. We will later extend this to all points in time, implying that a data race can never happen.

Examples

So how do we decide if a heap is well-formed or not? Well let’s start by looking at two examples, one heap where a data-race might happen and one where they can’t. I use diagrams to help describe the actors and objects present in a heap, using arrows from actors to objects to represent variables.

Red heaps are bad.

This first example is the kind of heap we want to prevent. Both actors have access to the same object, which could cause a data race if they both start accessing it simultaneously.

Green heaps are good.

This second heap however is perfectly fine. The two actors have access to a distinct set of objects. Even though Actor 1 can access Object 1 through two different variables, there’s no risk of data race as it can only use one of these at a time.

Notation and Definition

From these examples we can draw our first definition of well-formedness, but first let’s introduce a few notations. We’ll use the letters $\chi$ (chi) to represent a heap, $\alpha$ (alpha) for actors and $\iota$ (iota) for objects. Finally we use $\chi, \alpha \vdash \iota$ to denote that in heap $\chi$ , the actor $\alpha$ has access to object $\iota$ .

Using these notations, the definition of well-formedness we’ll use is the following:

A heap $\chi$ is well formed if and only if for all actors $\alpha_1$ and $\alpha_2$ in $\chi$ such that $\alpha_1 \neq \alpha_2$ , there does not exist an object $\iota$ such that $\chi, \alpha_1 \vdash \iota$ and $\chi, \alpha_2 \vdash \iota$ .

You can see how this achieves our goal of preventing data-races. If no object is accessible from two different actors, then there is nothing to worry about.

Preservation of Well-Formedness

We’ve now etablished what it means for a heap to be well-formed, and we can see that with a well-formed heap no data race can happen. However we still have to show that our heaps actually are well-formed.

For this, we show that our initial heap, when the program starts, is well-formed and that every execution step after that preserves its well-formedness. Our program starts with a number of actors but no objects, which makes the initial heap trivially well-formed.

The second part, proving preservation, requires more work. We need to consider every possible execution step and look at how it modifies the heap.

Uninteresting cases

There are a few operations in our language which do not affect the shape of our heaps. For example reading a counter’s value and control flow operations if and while don’t modify the heap at all, so they obviously preserve its well-formedness.

Modifying Counter objects using the increment or decrement method modifies the heap by changing the counter’s value, but the heap’s shape itself is untouched. No actor starts seeing a new object it didn’t see before, and well-formedness is preserved.

Object creation

var x = Counter()

When an actor creates an object, it starts having access to it. However, as this object is brand new, no other actor can have access to it yet. Since all other objects are untouched, well-formedness is preserved.

The dashed line represents a newly created reference.

Aliasing

// x points to an object
var y = x

By aliasing a reference, we create a new one that points to the same object. This is allowed by our definition of well-formed heaps, since both references come from the same actor.

We’ve determined that the reference x we’re aliasing from isn’t an issue, but we also need to consider existing references other than x as well.

We use the fact that the heap was well-formed before we created the alias. By the definition of well-formed heaps, no other actor had access to that object. While there could be another reference z to that object, it must come from the same actor as x, and therefore y. The resulting heap is still well-formed.

Message passing

// x points to an object
Foo.bar(x)

actor Foo
  be bar(y: Counter) => ...

The last operation we need to consider is message passing. This is actually two operations, the sending actor writes the message to a queue and the receiving one later reads the message from the queue and executes the corresponding behaviour. However for the sake of simplicity we’ll look at this as a single combined one (this way we don’t have to model the message queue).

When we pass the message with an argument, the receiving actor starts having access to the object this argument references. Because the sending actor also has a reference to it, there are now two actors which can access the same object, breaking well-formedness!

We could be tempted to fix this by modifying our language’s semantics such that sending a message consumes the argument. In other words, doing Foo.bar(x) “kills” the x variable and any attempt to reuse it results in a compile time error. This way the sender loses access to the object and well-formedness would be preserved.

Foo.bar(x)
x.increment() // error: x has been consumed

Unfortunately, the well-formedness of the original heap does not guarantee that x is the only reference to that object. Indeed, there could be another alias z which would persist even if we made x unusable.

For the moment we’ll simply disallow sending messages with arguments. It makes the language a lot less powerful by heavily limiting how actors can communicate, but at least well-formedness is preserved.

Foo.bar(x) // error: cannot send messages with arguments
Foo.baz() // ok

Conclusion

Thanks to our definition of well-formed heaps, and by showing that it gets preserved during any of our execution steps, any program written in Little Pony is guaranteed to be free of data-races!

The flip-side is that we’ve had to disallow message passing with arguments. This severly limits what kind of program you can write, making it unusable for most practical purposes.

In the next post I’ll talk about how we can improve our type system to fix this, by introducing immutable objects. Stay tuned!

If you have any questions or comments feel free to email me at paul@lietar.net. Alternatively you can share them on twitter or reddit.