Elixir — Where is your state?

In this post, we will look at Elixir’s Agent abstraction, which provides a way to manage state in an Actor-based, functional programming environment. We will also take a closer look at its building blocks, which are implemented in Erlang/OTP. Along the way, an implementation of TicTacToe will serve as an illustrative example.

State

So what is the issue with state? To answer this question, let’s first talk about some of the benefits of functional programming, in particular the concept of pure functions: functions whose return values are determined only by their input values, without any observable side effects. We know this concept from math: it is exactly how a mathematical function works.
As these functions never depend on any external (e.g. global) state, they are easy to test and reason about. Such functions are also safe to use in concurrent environments.
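Here is a tiny sketch of the difference. The names are made up for illustration: the first function is pure, while the second deliberately hides state in the process dictionary, which is exactly what makes such code harder to test and reason about.

```elixir
defmodule PureExample do
  # Pure: the result depends only on the argument, and calling it has no
  # side effects. The same input always yields the same output.
  def other_player("X"), do: "O"
  def other_player("O"), do: "X"

  # Impure: the result depends on state outside the function (a counter kept
  # in the calling process's dictionary), and each call changes that state.
  def impure_counter do
    count = Process.get(:counter, 0) + 1
    Process.put(:counter, count)
    count
  end
end

PureExample.other_player("X")   # "O", every single time
PureExample.impure_counter()    # 1
PureExample.impure_counter()    # 2: same (empty) input, different result
```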

Real-world IT systems, however, regularly have to maintain state, be it in an external storage system or in memory at runtime.
So how do we solve this task when all we can do is use pure functions? Databases are out of scope for this post. Instead, let us look at a way to maintain in-memory state at runtime, using a simple game as an example application.

TicTacToe Kata

For this post, I have implemented a solution to the TicTacToe kata. In this well-known game, a 3×3 grid is used to store and visualize the game’s state. Two players (named “X” and “O”) take turns, each trying to be the first to occupy three fields in a line: in one row, in one column or along one of the two diagonals. Avid readers will already know that neither player can force a win unless the opponent makes a glaring mistake. I guess nobody really ever plays this game and its main use has probably been to serve as a caricature, but it is nevertheless a good example of the state maintenance challenge.
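The winning rule translates into a short check over the eight possible lines. The following is a hypothetical sketch: the module name WinCheck and the function winner/3 are not from the author's repository. It assumes the grid is passed as three row lists containing "X", "O" or " ", matching the model described in the next section.

```elixir
defmodule WinCheck do
  # Hypothetical helper, not taken from the article's repository.
  # Returns "X" or "O" if that player owns a complete line, otherwise nil.
  def winner(row_1, row_2, row_3) do
    grid = [row_1, row_2, row_3]

    Enum.find_value(lines(), fn line ->
      case Enum.map(line, fn {r, c} -> grid |> Enum.at(r) |> Enum.at(c) end) do
        # All three fields hold the same, non-empty mark.
        [mark, mark, mark] when mark != " " -> mark
        _ -> nil
      end
    end)
  end

  # The eight winning lines as 0-based {row, column} coordinates:
  # three rows, three columns and two diagonals.
  defp lines do
    rows = for r <- 0..2, do: for(c <- 0..2, do: {r, c})
    cols = for c <- 0..2, do: for(r <- 0..2, do: {r, c})
    rows ++ cols ++ [[{0, 0}, {1, 1}, {2, 2}], [{0, 2}, {1, 1}, {2, 0}]]
  end
end

WinCheck.winner(["X", "O", " "],
                ["O", "X", " "],
                [" ", " ", "X"])   # "X" (diagonal)
```

Generating the lines once and pattern-matching on `[mark, mark, mark]` keeps the rule in one place, instead of spelling out eight separate comparisons.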

Solution Design

The implementation is available on GitHub and consists of four Elixir modules dealing with different aspects, such as UI, game logic and state handling.

Model

The game state is modeled as a map, using three lists to represent the three rows of the game’s grid, plus some additional fields such as the player who is to make the next move. For a new game, the map is initialized by the init function of the GameLogic module. Please note that ‘\\’ is Elixir’s way to denote default values for optional function arguments.

  def init(next_player \\ "X",
           row_1       \\ [" ", " ", " "],
           row_2       \\ [" ", " ", " "],
           row_3       \\ [" ", " ", " "]) do
    %{row_1: row_1, row_2: row_2, row_3: row_3,
      next_player: next_player, game_over: false, winner: nil}
  end

 
For each subsequent move, we want to update the game’s state.

But wait: Elixir maps are immutable data structures; we cannot change any of their entries in place. This is where Elixir’s Agent module comes to the rescue.
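GameLogic.move/3 itself is not shown in the post. A hypothetical pure version might look like the sketch below. It never modifies its input but returns a brand-new map, which is precisely why the new state has to be kept somewhere for the next move. The 1-based row and column numbering is an assumption, and occupancy and win checks are left out.

```elixir
defmodule MoveSketch do
  # Hypothetical pure move function; the article's GameLogic.move/3 is not
  # shown, so this is only an illustration. Rows and columns are numbered
  # 1..3 (assumption), and occupancy or win checks are left out.
  def move(state, row_num, column_num) do
    row_key = Enum.at([:row_1, :row_2, :row_3], row_num - 1)
    player = state.next_player

    # List.replace_at/3 returns a NEW list; the old row is untouched.
    new_row = List.replace_at(Map.fetch!(state, row_key), column_num - 1, player)

    # Map.put/3 likewise returns a NEW map instead of modifying `state`.
    state
    |> Map.put(row_key, new_row)
    |> Map.put(:next_player, if(player == "X", do: "O", else: "X"))
  end
end

empty_row = [" ", " ", " "]
before = %{row_1: empty_row, row_2: empty_row, row_3: empty_row,
           next_player: "X", game_over: false, winner: nil}

after_move = MoveSketch.move(before, 2, 3)
after_move.row_2   # [" ", " ", "X"]
before.row_2       # [" ", " ", " "]: the original map is unchanged
```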

Agent

A brief look into the source code for Agent quickly reveals that it is essentially an Elixir wrapper plus convenience functions around Erlang’s gen_server. As its name implies, gen_server is a generic server implementation and a set of callback definitions provided by Erlang/OTP.
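To see what "a wrapper around gen_server" means in practice, here is a hypothetical, heavily simplified agent built on Elixir's GenServer module, which itself sits on top of gen_server. TinyAgent is a made-up name, and this is not Agent's actual source code, only a sketch of the same idea.

```elixir
defmodule TinyAgent do
  # Hypothetical, heavily simplified Agent built on GenServer. The real
  # Agent offers more (names, timeouts, casts, MFA tuples), but its core
  # idea is similar: keep the state in a GenServer and run caller-supplied
  # functions against it.
  use GenServer

  # Client API: each call sends a message to the server process and waits.
  def start_link(fun), do: GenServer.start_link(__MODULE__, fun)
  def get(pid, fun), do: GenServer.call(pid, {:get, fun})
  def update(pid, fun), do: GenServer.call(pid, {:update, fun})

  # Server callbacks: they run inside the server process.
  @impl true
  def init(fun), do: {:ok, fun.()}

  @impl true
  def handle_call({:get, fun}, _from, state), do: {:reply, fun.(state), state}
  def handle_call({:update, fun}, _from, state), do: {:reply, :ok, fun.(state)}
end

{:ok, pid} = TinyAgent.start_link(fn -> 0 end)
:ok = TinyAgent.update(pid, fn count -> count + 1 end)
TinyAgent.get(pid, fn count -> count end)   # 1
```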

The code snippet below shows a loop function which, in similar forms, is a common building block of servers written in Erlang. The loop function contains a receive expression, meaning that the Erlang process running this function waits for an incoming message that matches one of the two given patterns, {call, From, Req} or {cast, Req}. In both cases, a new state (State2) is computed, and loop finally calls itself with this new state, thereby preserving it for future invocations. Because this recursive call is the last thing the function does (a tail call), the loop can run indefinitely without growing the stack. At the same time, the implementation respects the immutability requirement, since State is never changed in place. In fact, it would be impossible to do so.

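loop(Mod, State) ->
    receive
        {call, From, Req} ->
            %% Synchronous request: compute reply and new state, answer the caller.
            {Res, State2} = Mod:handle_call(Req, State),
            From ! {Mod, Res},
            loop(Mod, State2);
        {cast, Req} ->
            %% Asynchronous request: only compute the new state.
            State2 = Mod:handle_cast(Req, State),
            loop(Mod, State2)
    end.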

 
Note that receive blocks nothing but the waiting process itself: that process is suspended until a matching message arrives, while the VM keeps scheduling all other processes. In this respect, the Erlang VM can be compared to an operating system that suspends and resumes processes, rather than to a plain language runtime. The snippet above is taken from an Erlang/OTP design document.

For the remainder of this post, we can think of an Elixir Agent as implementing the loop shown above. It waits for incoming messages (e.g. an ‘update’ request carrying a function that computes the new state), and then calls itself recursively with the updated state, suspending itself until the next matching message arrives.
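The same loop can be written by hand in Elixir with spawn, send and receive. The sketch below is hypothetical (LoopServer is a made-up module) and leaves out everything a real Agent adds, such as links, timeouts and error handling. {:get, ...} plays the role of the synchronous call, and {:update, ...} that of the asynchronous cast.

```elixir
defmodule LoopServer do
  # Hypothetical hand-rolled state server that mirrors the Erlang loop:
  # {:get, ...} plays the role of a call, {:update, ...} that of a cast.
  def start(initial_state), do: spawn(fn -> loop(initial_state) end)

  # Synchronous: send a request, then wait for the tagged reply.
  def get(pid) do
    ref = make_ref()
    send(pid, {:get, self(), ref})

    receive do
      {^ref, state} -> state
    end
  end

  # Asynchronous: fire and forget, like the {cast, Req} clause.
  def update(pid, fun) do
    send(pid, {:update, fun})
    :ok
  end

  defp loop(state) do
    receive do
      {:get, caller, ref} ->
        send(caller, {ref, state})
        loop(state)

      {:update, fun} ->
        # Recurse with the new state; the old `state` value is never mutated.
        loop(fun.(state))
    end
  end
end

pid = LoopServer.start(%{moves: 0})
:ok = LoopServer.update(pid, fn s -> %{s | moves: s.moves + 1} end)
LoopServer.get(pid)   # %{moves: 1}
```

The get reply arrives after the update has been applied because Erlang guarantees that messages from one process to another are delivered in the order they were sent.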

After this brief excursion to Erlang, let’s get back to the Elixir Agent and its usage in the present example.

The solution makes use of four Agent functions, three of which are shown in the code snippet below.

defmodule Game do
  @moduledoc """
  Manages the game's state by use of an Elixir Agent.
  """

  @doc """
  Starts a new game.
  """
  def start_game(first_player) do
    initial_state = GameLogic.init(first_player)
    start_link(initial_state)
  end

  defp start_link(initial_state) do
    {:ok, pid} = Agent.start_link(fn -> initial_state end)
    pid
  end

  @doc """
  Executes a move for next player.
  """
  def move(game, row_num, column_num) do
    state = Agent.get(game, fn state -> state end)
    next_state = GameLogic.move(state, row_num, column_num)
    Agent.update(game, fn _state -> next_state end)
    next_state
  end
end

 
get expects a function that receives the current Agent state; whatever that function returns is sent back to the caller. Passing the identity function fn state -> state end, as above, therefore gets the entire current state out of the Agent.
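Because the function is executed inside the Agent process, it can also pick out just the part of the state the caller needs. Here is a small sketch with a made-up, reduced game state:

```elixir
{:ok, game} = Agent.start_link(fn -> %{next_player: "X", game_over: false} end)

# The function runs inside the Agent process; only its result is sent back.
next_player = Agent.get(game, fn state -> state.next_player end)   # "X"
whole_state = Agent.get(game, fn state -> state end)              # the whole map
```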

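update does what its name suggests: it expects a function that maps the current state to a new one, which the Agent then keeps. Since our function ignores the current state (_state) and simply returns next_state, we effectively pass the new state into the Agent.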
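One thing worth noting about Game.move/3 is that reading with get and writing with update are two separate messages. If another process updated the same Agent in between, that change would be silently overwritten. If that ever matters, Agent.get_and_update/2 does both in a single step inside the Agent. The sketch below is a hypothetical variant, not the author's code. The parameter move_fun is a stand-in for GameLogic.move/3 so that the example runs on its own.

```elixir
defmodule GameSketch do
  # Hypothetical variant of Game.move/3 (not the author's code). `move_fun`
  # stands in for GameLogic.move/3 so that the sketch runs on its own.
  def move(game, row_num, column_num, move_fun) do
    Agent.get_and_update(game, fn state ->
      next_state = move_fun.(state, row_num, column_num)
      # {value returned to the caller, new state kept by the Agent}
      {next_state, next_state}
    end)
  end
end

{:ok, game} = Agent.start_link(fn -> %{moves: []} end)
record_move = fn state, row, col -> %{state | moves: [{row, col} | state.moves]} end

GameSketch.move(game, 1, 2, record_move)   # %{moves: [{1, 2}]}
GameSketch.move(game, 3, 3, record_move)   # %{moves: [{3, 3}, {1, 2}]}
```

The trade-off is that move_fun now runs inside the Agent process, so a slow computation keeps the Agent busy for every other caller in the meantime.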

start_link is probably the most interesting function call in the snippet. It spawns a new Erlang process and links it to the calling process, which ties the lifecycles of both processes together: if one of them crashes, an exit signal is sent to the other. So in our case, a crash of the Agent would be propagated to the process that called Game.start_game. This exit signal can either remain unhandled, in which case that process crashes too and the signal propagates further along its own links, or it can be handled (by trapping exits), for example by starting a new Agent. This would allow the player to continue playing, albeit with a new, empty game state.
Resilience in the face of failures, and Erlang/OTP supervision trees as a way to achieve it, will be the subject of an upcoming post.
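Handling the exit signal requires the linked process to trap exits. The sketch below is a minimal, hypothetical illustration of the "start a new Agent" option under that assumption. Killing the Agent with Process.exit/2 merely simulates a crash.

```elixir
# Hypothetical demo, not from the article's repository. Trapping exits turns
# an exit signal from a linked process into an ordinary message.
Process.flag(:trap_exit, true)

{:ok, agent} = Agent.start_link(fn -> %{moves: 3} end)

# Simulate a crash of the Agent process.
Process.exit(agent, :kill)

replacement =
  receive do
    {:EXIT, ^agent, reason} ->
      # reason is :killed here
      IO.puts("Agent exited (#{inspect(reason)}), starting a fresh game state")
      {:ok, new_agent} = Agent.start_link(fn -> %{moves: 0} end)
      new_agent
  after
    1_000 -> raise "expected an exit signal from the linked Agent"
  end
```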

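stop, the fourth function, terminates the Agent process. It does not appear in the snippet above, but we will come across it further below.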

All of the functions mentioned above accept an optional timeout (start_link takes it as an option). This reminds us that, behind the scenes, we send a message to a separate process and await its reply. If get or update receives no reply within the timeout (5 seconds by default), the calling process exits.
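The timeout behavior is easy to observe. In the sketch below, which is an illustration rather than code from the repository, the Agent takes about 200 ms to answer while the caller is only willing to wait 50 ms. Note that the caller exits rather than raising an exception, so catching it requires catch :exit.

```elixir
{:ok, agent} = Agent.start_link(fn -> :some_state end)

result =
  try do
    # The Agent needs about 200 ms to answer, but we only wait 50 ms.
    Agent.get(agent, fn state -> Process.sleep(200); state end, 50)
  catch
    # The caller exits (it does not raise) with {:timeout, {GenServer, :call, ...}}.
    :exit, {:timeout, _details} -> :timed_out
  end

result   # :timed_out
```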

Testing

The presence of Agents does not add extra complexity to testing the solution. After all, everything in Elixir is built on processes anyway.

One aspect that required additional testing effort is the UI. To test the correctness of output that would normally be printed to the console, you can start a StringIO process and pass it to the relevant IO.write calls: IO.write takes an optional first argument, device, to which the output is written. You can find examples of this technique in all tests in test/game_printer_test.exs.
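Here is a minimal sketch of the technique, with a made-up print_row/2 function standing in for the game's printer module. The function writes to whatever device it is given, defaulting to :stdio, and the test hands it a StringIO process instead. ExUnit also ships ExUnit.CaptureIO.capture_io/1, which captures standard output without changing the function's signature.

```elixir
defmodule PrinterSketch do
  # Hypothetical printer function: writes one grid row to `device`,
  # which defaults to the console (:stdio).
  def print_row(row, device \\ :stdio) do
    IO.write(device, Enum.join(row, " | ") <> "\n")
  end
end

# In a test: capture the output in a StringIO process instead of the console.
{:ok, io} = StringIO.open("")
PrinterSketch.print_row(["X", "O", " "], io)
{_input, output} = StringIO.contents(io)
output   # "X | O |  \n"
```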

While I am not a big fan of using mocks and think that in most circumstances you can avoid using them — particularly in functional programming — I still chose to make use of mocks in a small number of tests.
I encourage you to have a look at the tests in test/tictactoe_test.exs and decide for yourself whether mocking fits your style and requirements. The library that provides this functionality is aptly named mock.

One more tool that I want to give a shout-out to is credo. It provides very useful feedback on your code, including refactoring opportunities, pointers to complex or duplicated code, warnings about common mistakes, inconsistent naming and more. It helps tremendously in developing a consistent style, and it never fails to remind me to add documentation to my public interfaces.

Beware of process leaks!

While working on this post, I noticed one gotcha that I want to make you aware of. Here is a code snippet from lib/tictactoe.ex to explain the issue:

  defp new_game(game) do
    :ok = Agent.stop(game)
    next_game = Game.start_game("X")
    process(next_game)
  end

 
My first implementation attempt lacked the call to stop the Agent in the first line of the new_game function shown above. This resulted in what you could call a process leak: the game started a new Agent and passed its PID (process identifier) into the process function, while the previous Agent simply continued to run, holding on to the state of a game that nobody would ever look at again.
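The effect is easy to reproduce in isolation. The sketch below uses bare Agents in place of Game.start_game/1, an assumption made so that the code runs on its own, and checks with Process.alive?/1 whether the old Agent is still around.

```elixir
# Stand-in for the Agent returned by Game.start_game/1 for the old game.
{:ok, old_game} = Agent.start_link(fn -> :old_game_state end)

# Leaky variant: start the next game without stopping the old one.
{:ok, _next_game} = Agent.start_link(fn -> :new_game_state end)
leaked_alive = Process.alive?(old_game)    # true: nobody uses it, but it still runs

# Fixed variant: stop the old Agent, as new_game/1 now does.
:ok = Agent.stop(old_game)
stopped_alive = Process.alive?(old_game)   # false
```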

As a consequence, your application will eventually run out of either processes or memory, whichever limit is hit first. Erlang processes are very small (a freshly spawned process takes only a few hundred machine words, i.e. a few kilobytes on a 64-bit system), and the VM can run a lot of them even on moderate hardware, so it may take a while before it runs into its limits. I can just imagine how much fun it would be to debug an application that appears to work properly and then crashes after a number of days in production, so watch out for these nasty bugs in your code. Hopefully, your monitoring would at least help you spot the problem before it’s too late.
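To get a feel for these limits on your own machine, the sketch below simulates the leak on a small scale. It reports the VM's process limit, how the process count grows, and how much memory a single idle Agent occupies. The numbers depend on your Erlang/OTP version and hardware, so no expected output is given.

```elixir
# Upper bound on the number of simultaneously existing processes in this VM.
limit = :erlang.system_info(:process_limit)

count_before = length(Process.list())

# Simulate the leak: start 1,000 Agents and never use them again.
leaked =
  for _ <- 1..1_000 do
    {:ok, pid} = Agent.start_link(fn -> nil end)
    pid
  end

count_after = length(Process.list())
{:memory, bytes} = Process.info(hd(leaked), :memory)

IO.puts("process limit: #{limit}")
IO.puts("processes: #{count_before} before, #{count_after} after")
IO.puts("memory of one leaked Agent: #{bytes} bytes")

# Clean up, as the fixed new_game/1 does for its single old Agent.
Enum.each(leaked, &Agent.stop/1)
```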

Conclusion

We have seen how Elixir Agents allow us to maintain program state at runtime. Internally, they make use of Erlang processes. These are the foundational building blocks for practically everything in Erlang and Elixir, so learning about their use is essential to making sense of any language running on the Erlang VM.

The recursive server loop pattern we looked at is widely used in Erlang-based servers, and it is useful to understand its semantics and inner workings.

On a closing note: I wish I had a good explanation for the module name ‘Agent’. Dear reader, if you know a good one, I would love to hear it!

 
