On speaking to a thread... and Erlang/OTP

In his autobiography, Winston Churchill writes about his early encounter with Latin at school. The young student is puzzled that mensa (table) also means "O table". His schoolmaster explains that "O table" is the vocative form: "You would use it in speaking to a table", to which the puzzled student blurts out: "But I never do", and is warned for his impertinence.

I feel the same way while programming with threads, my use of the vocative form "O thread" being more of a plea to the thread. Ideally I would like to avoid thread programming, yet still write concurrent code for multi-core processors. In this blog post I'd like to give you some idea of how Erlang helps me to do that. I also hope to show how Erlang causes me the least astonishment, referring of course to myself as an average, even mediocre, programmer.

Not factorial, not again!

There must be a rule against using factorial programs while discussing Erlang. Until such time:

factorial(0) -> 1;
factorial(N) -> N*factorial(N-1).

Even if you don't know Erlang, this code is clear. But note that if N is a negative integer or a float, the program will recurse infinitely: Erlang handles integers without fussing about the size, so no overflow ever stops it. The program may even hang your machine; that's why it's called re-curse.

We could check for these conditions with a guard, but what happens if we never notice this weakness in the code? Then we'll find out only when someone hangs the machine with a call to factorial(-5). This example may appear trivial, but the pattern is representative of other programs where it is difficult to identify the corner cases.
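
For reference, the guard mentioned above might look like the following. This is only a minimal sketch, and the module name fac_guard is made up for illustration.

```erlang
-module(fac_guard).
-export([factorial/1]).

%% Guarded factorial: only non-negative integers match a clause.
%% The pattern 0 does not match the float 0.0, and the guard
%% rejects floats and negative integers.
factorial(0) -> 1;
factorial(N) when is_integer(N), N > 0 -> N * factorial(N - 1).
```

With the guard in place, a call such as fac_guard:factorial(-5) or fac_guard:factorial(2.5) fails straight away with a function_clause error instead of recursing.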


Separating behaviour from content

A more generic approach for handling such bugs is to separate the behaviour from the content. As a simple behaviour, we'd like to stop the function if it's taking too long. This is different from a plain timeout, where only the call to the function times out; here we want to kill the function's execution upon timeout.

This business of separating behaviour from content is somewhat like CSS/HTML in web pages - CSS provides styling, which is kept separate from the HTML content. So let's write a factorial server in Erlang, and keep the server's behaviour separate from its content. We'll start with a picture of the server pattern:


Note there's nothing Erlang-specific here; this pattern can be written in several languages. We spawn a separate thread for the server and make calls to it. If factorial(-5) takes too long, we time out and issue a 'kill' to the thread.

One could use a thread library to code this, or drop down to the OS and issue thread signals. In a more complex program we may need to save the state of a rogue thread before killing it, and then restart the server with the saved state. While all this might be a breeze for an expert programmer in C or Java, the prospect of writing fragile threaded code is very daunting to me.

"Erlang has an OS-feel to it"

There is a statement attributed to Robert Virding, one of the creators of Erlang, that Erlang has an OS-feel to it.

This is a profound observation and it means a lot of things. The way I interpret it is that Erlang does very well to hide the complexity of OS processes and threads, instead offering a simplified system of lightweight processes, messages, and signals. We still need to say "O Erlang process", but the grammar is simpler while the semantics have been proven by years of use in Erlang systems.

Here's my code for the factorial server, in Erlang:

-module(fac_serv).
-export([start/0, stop/0, fac/1]).
-define(SERVER, ?MODULE).

%% Behaviour
%% Spawn the server loop and register it under the module name.
start() ->
    register(?SERVER, spawn(fun() -> loop() end)).

fac(N) ->
    call({get_factorial, N}).

stop() ->
    call(exit).

%% Send a request and wait up to one second for the reply;
%% on timeout, kill the server process itself.
call(Request) ->
    ?SERVER ! {self(), Request},
    receive
        Response -> Response
    after
        1000 ->
            exit(whereis(?SERVER), kill),
            {timeout, server_killed}
    end.

%% Content
loop() ->
    receive
        {From, exit} ->
            From ! {?SERVER, exiting},
            exit(normal);
        {From, {get_factorial, N}} ->
            From ! {ok, factorial(N)}
    end,
    loop().

%% If run without supervision this factorial will
%%  infinitely recurse for -ve or non-integer N
factorial(0) -> 1;
factorial(N) -> N*factorial(N-1).

And here's how it works in the Erlang shell. The 1>, 2>, etc. are shell prompts, while the period "." at the end of each 'command' terminates an Erlang expression:

1> fac_serv:start().
2> fac_serv:fac(10).
3> fac_serv:fac(-10).
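
The same timeout-and-kill behaviour can be pulled out into a helper that knows nothing about factorials. What follows is a rough sketch rather than a finished library: the module name timeout_sketch and the function run/2 are hypothetical, and the content is simply whatever fun gets passed in.

```erlang
-module(timeout_sketch).
-export([run/2]).

%% Run Fun in its own process and wait up to TimeoutMs milliseconds.
%% Returns {ok, Result}, {error, Reason} if the worker crashed,
%% or {error, timeout} after killing a worker that ran too long.
run(Fun, TimeoutMs) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Ref, Fun()} end),
    receive
        {Ref, Result} ->
            %% Drop the monitor and any 'DOWN' message already queued.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    after TimeoutMs ->
        exit(Pid, kill),
        %% Wait until the worker is really gone, then discard a
        %% result that may have slipped in just before the kill.
        receive {'DOWN', MRef, process, Pid, _} -> ok end,
        receive {Ref, _Late} -> ok after 0 -> ok end,
        {error, timeout}
    end.
```

A call such as timeout_sketch:run(fun() -> 2 + 2 end, 1000) returns {ok,4}, while a fun that never finishes is killed and yields {error,timeout}.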

Erlang and OTP

Implementing a timeout for a server is not exactly the greatest of code accomplishments. Erlang can do a lot more by way of implementing behaviours for servers, finite state machines, events, and supervisors.

The canonical Erlang way to implement behaviours is what's called the OTP (Open Telecom Platform) framework. Note that using OTP is not mandatory in order to write Erlang code. However, using OTP provides two major benefits: (i) OTP has been battle-tested and proven in numerous large scale production systems, so it saves us the trouble of finding out subtle bugs that are beyond the scope of our imagination, and (ii) in a group working on a large project, OTP creates a standard programming style.
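
To give a flavour of this, here is a rough sketch of the factorial server recast as an OTP gen_server. The module name fac_gen_serv and the placeholder state no_state are hypothetical names chosen for illustration; the one-second timeout and the kill mirror the hand-written server above. Since gen_server:call/3 exits the caller on timeout, fac/1 catches that exit before killing the server.

```erlang
-module(fac_gen_serv).
-behaviour(gen_server).

%% Client API
-export([start/0, stop/0, fac/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

-define(SERVER, ?MODULE).

%% Behaviour: OTP supplies the receive loop and the call protocol.
%% start/4 (not start_link/4) keeps the caller unlinked, so killing
%% the server does not take the calling process down with it.
start() ->
    gen_server:start({local, ?SERVER}, ?MODULE, [], []).

stop() ->
    gen_server:stop(?SERVER).

fac(N) ->
    try
        gen_server:call(?SERVER, {get_factorial, N}, 1000)
    catch
        exit:{timeout, _} ->
            exit(whereis(?SERVER), kill),
            {timeout, server_killed}
    end.

%% Content: the callbacks hold the application-specific part.
init([]) ->
    {ok, no_state}.

handle_call({get_factorial, N}, _From, State) ->
    {reply, {ok, factorial(N)}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Same unguarded factorial as before.
factorial(0) -> 1;
factorial(N) -> N * factorial(N - 1).
```

In a real system the server would normally be started under a supervisor, which could restart it after the kill; that part is left out of this sketch.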

I like to think of OTP as a set of "standard openings" in chess. While one can play chess freestyle, the use of a standard opening saves us from blunders - both of the "Fool's Mate" type as well as the more subtle mistakes. OTP has been figured out for us by the Grandmasters of Erlang, so we can do well by just following the recipe... though we still need to think for ourselves as the middle game develops.

By the same token one could also properly learn thread-based distributed programming in C. That's a personal choice, but then one could also construe the vocative form of mensa, "O table".