When you move from a single BEAM node to a fleet of nodes, the way processes talk to each other has to change. This article shows, step by step, how to turn a simple local application into a fault‑tolerant cluster without sacrificing the familiar GenServer abstraction.
Why Coordination Matters Across Nodes
In a single node you can safely hand a large value to a central process that serialises access to a shared resource. The call travels inside the node, so the cost is just the copy made in the VM’s memory. Across a network, however, that same copy has to be turned into a binary, shipped over TCP, and reconstructed on the other side. If the payload is a megabyte‑sized image, a video chunk, or a massive configuration map, the network round‑trip dominates the latency.
One way to avoid moving the data is to let the *caller* keep the heavy payload and only ask the cluster to grant an exclusive lock. That lock is tiny – just a token – but it guarantees that only one node may be doing the expensive work at a time.
Cluster‑wide Locks with :global.trans/2
Elixir’s :global module supplies a simple distributed lock primitive. The function :global.trans/2 takes a unique name (the lock) and a function to run while the lock is held. The function is executed in the calling process, i.e. the data never leaves the node that owns it.
```elixir
defmodule ImageProcessor do
  # The caller passes a potentially huge bitmap. We lock the whole
  # cluster while we work on it, but the bitmap itself never leaves.
  def transcode(large_image) do
    :global.trans({:image_transcode, self()}, fn ->
      # The heavy CPU work happens here, locally.
      do_transcode(large_image)
    end)
  end

  defp do_transcode(_image), do: :ok # placeholder for real work
end
```
The lock id {:image_transcode, self()} follows :global's {ResourceId, LockRequesterId} convention: the first element names the resource being locked, while the second identifies the requester. Because every caller shares the resource id :image_transcode, at most one process in the entire cluster can be inside do_transcode/1 at any moment. The network chatter is limited to the few bytes exchanged to acquire and release the lock.
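If serialising every transcode in the cluster is too coarse, the resource id can include a per-image key so that only work on the *same* image is serialised. A sketch, assuming the caller can supply a stable `image_id` (a hypothetical identifier, e.g. a content hash — not part of the original API):

```elixir
defmodule ImageProcessor.PerImage do
  # Lock per image rather than cluster-wide: two different images can be
  # transcoded concurrently, but a given image is processed by at most
  # one process in the cluster at a time.
  def transcode(image_id, large_image) do
    :global.trans({{:image_transcode, image_id}, self()}, fn ->
      do_transcode(large_image)
    end)
  end

  defp do_transcode(_image), do: :ok # placeholder for real work
end
```

Note that :global.trans/2 returns the function's result; with an explicit retry limit (the trans/4 variant) it can also return :aborted when the lock cannot be taken.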
Designing a Minimal Fault‑Tolerant Cluster
Let’s imagine a small service that stores inventory records. Clients can add items, read them, and delete them. We want a few simple guarantees:
- All nodes run the same code and expose the same HTTP API.
- Whenever a record is created on one node, the change is visible on every other node.
- If a node crashes, the cluster continues serving requests, and no data is lost.
These goals match the classic definition of a fault‑tolerant system: availability (the service stays up) and consistency (all nodes agree on the data).
From Local to Global Process Registry
Inside a single node we often use Registry to map a friendly name (e.g. :my_worker) to a PID. Registry works fine locally but cannot be queried from another node. To make a process discoverable across the whole cluster we swap Registry for :global registration.
Below is a re‑implementation of an Inventory.Server that registers itself globally:
```elixir
defmodule Inventory.Server do
  use GenServer

  # Public API -------------------------------------------------------------

  def start_link(name) do
    GenServer.start_link(__MODULE__, %{}, name: global_name(name))
  end

  # Accept either a PID (as returned by a cache lookup) or a list name.
  def add_item(server, item) when is_pid(server) do
    GenServer.call(server, {:add, item})
  end

  def add_item(name, item) do
    GenServer.call(global_name(name), {:add, item})
  end

  def list_items(server) when is_pid(server) do
    GenServer.call(server, :list)
  end

  def list_items(name) do
    GenServer.call(global_name(name), :list)
  end

  # -----------------------------------------------------------------------

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:add, item}, _from, state) do
    {:reply, :ok, Map.update(state, :items, [item], &[item | &1])}
  end

  def handle_call(:list, _from, state) do
    {:reply, Map.get(state, :items, []), state}
  end

  # Helper that builds the global tuple expected by GenServer's :name option
  defp global_name(name), do: {:global, {__MODULE__, name}}
end
```
Name‑based calls route through global_name/1. The tuple {:global, {Inventory.Server, name}} is a cluster‑wide identifier that reaches the same process regardless of which node you are on.
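The same {:global, …} naming works for any OTP process that accepts a :name option, which makes it easy to try in a single IEx session. A quick single‑node sketch (the InventoryDemo key is illustrative, not one of the article's modules; on a connected cluster the lookup works from any node):

```elixir
# Register an Agent under a global name, then resolve it from the
# cluster-wide registry.
{:ok, pid} =
  Agent.start_link(fn -> %{} end, name: {:global, {InventoryDemo, :warehouse_a}})

# :global.whereis_name/1 resolves the registered name back to the PID.
^pid = :global.whereis_name({InventoryDemo, :warehouse_a})

# The global tuple can be used anywhere a PID is accepted.
Agent.update({:global, {InventoryDemo, :warehouse_a}}, &Map.put(&1, "ABC123", 10))
```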
Looking Up Servers Efficiently
Registering a process globally forces the :global manager to synchronise with every node in the cluster, which is a serialized, chatty operation. Lookups, by contrast, are cheap: each node keeps a local replica of the registry in ETS. If we attempted a fresh registration on every request we would generate a lot of unnecessary chatter.
A common optimisation is a two‑step lookup:
- Ask :global.whereis_name/1 whether a PID is already registered for the name.
- If it is, use it directly; otherwise, start a new server.
Here’s an Inventory.Cache that embodies that pattern:
```elixir
defmodule Inventory.Cache do
  @moduledoc false

  # Returns the PID that owns the inventory for `list_name`.
  # If no process exists yet, we spawn one under the DynamicSupervisor
  # (registered under this module's name by the application).
  def server_process(list_name) do
    existing_process(list_name) || start_new_process(list_name)
  end

  defp existing_process(list_name) do
    case :global.whereis_name({Inventory.Server, list_name}) do
      :undefined -> nil
      pid -> pid
    end
  end

  defp start_new_process(list_name) do
    case DynamicSupervisor.start_child(__MODULE__, {Inventory.Server, list_name}) do
      {:ok, pid} -> pid
      # Another node won the registration race – use its process instead.
      {:error, {:already_started, pid}} -> pid
    end
  end
end
```
The call to :global.whereis_name/1 is a local ETS lookup – it does not involve any network traffic. Only when the process truly does not exist do we incur the cost of a cluster‑wide registration.
Alternative Discovery: Hash‑Based Node Selection
Even with the lazy‑lookup optimisation, every new inventory list still forces a global registration, which is a serialized operation. For very large clusters this can become a bottleneck.
One way to avoid the global lock entirely is to decide ahead of time which node will “own” a given list. A deterministic hash function can map a list name to a node, guaranteeing that all clients agree on the owner.
```elixir
defmodule Inventory.NodePicker do
  @doc """
  Returns the node that should host the given `list_name`.
  Uses a simple deterministic hash on the sorted node list.
  """
  def for_list(list_name) do
    nodes = Node.list([:this, :visible]) |> Enum.sort()
    index = :erlang.phash2(list_name, length(nodes))
    Enum.at(nodes, index)
  end
end
```
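Because :erlang.phash2/2 is deterministic, every caller computes the same owner without any coordination. A quick sketch with a fixed node list (the node names are made up; in production the list comes from Node.list/1):

```elixir
# Simulate a three-node cluster. Sorting mirrors NodePicker.for_list/1,
# so every node sees the list in the same order.
nodes = Enum.sort([:"a@host", :"b@host", :"c@host"])

owner = fn list_name ->
  Enum.at(nodes, :erlang.phash2(list_name, length(nodes)))
end

# The same name always hashes to the same node, on every caller.
true = owner.(:warehouse_a) == owner.(:warehouse_a)
```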
Now, instead of asking :global for a lock, we forward the request straight to the chosen node via :rpc.call/4:
```elixir
defmodule Inventory.DistributedCache do
  def server_process(list_name) do
    target_node = Inventory.NodePicker.for_list(list_name)
    # Remote call returns the PID on the target node.
    :rpc.call(target_node, Inventory.Cache, :server_process, [list_name])
  end
end
```
This approach eliminates the global registration step entirely, but it introduces a new set of challenges:
- Node churn: Adding or removing nodes changes the hash mapping, so the same list may migrate to a different node.
- Data migration: When a list’s owner changes, the data must be moved without downtime.
- Consistency guarantees: The hash must be consistent – usually achieved with consistent hashing – to keep movements minimal.
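The churn problem is easy to quantify: with plain modulo-style hashing, changing the node count remaps most keys. A small sketch, pure arithmetic with no cluster needed (the key names are arbitrary):

```elixir
keys = Enum.map(1..1_000, &"list-#{&1}")

# Owner index for a key, given a node count – the same scheme NodePicker uses.
owner = fn key, node_count -> :erlang.phash2(key, node_count) end

# Count how many keys change owner when the cluster grows from 3 to 4 nodes.
moved = Enum.count(keys, fn k -> owner.(k, 3) != owner.(k, 4) end)

# Roughly three quarters of the keys move; an ideal consistent-hashing
# ring would move only about one quarter (the share of the new node).
IO.puts("#{moved} of #{length(keys)} keys changed owner")
```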
For many small‑to‑medium services the global registration route is acceptable because it’s simple and correct. When scale demands it, libraries such as Syn or Swarm already implement these ideas.
Replicating State Across the Cluster
Even with a perfectly coordinated process registry, a single node crash wipes the data stored in that node’s memory. To survive crashes we must replicate the state.
Simple “write‑everywhere” Replication
One naive but robust strategy is to broadcast every write operation to every node. The assumption is that the cluster is small enough that the extra network traffic is tolerable, and the simplicity outweighs the cost.
We start with an Inventory.Database module that knows how to store a key/value pair locally. Then we add a thin wrapper that spreads the call to the whole cluster using :rpc.multicall/4.
```elixir
defmodule Inventory.Database do
  @moduledoc false

  # Store locally – the "real" implementation.
  def store_local(key, value) do
    :ets.insert(:inventory_table, {key, value})
    :ok
  end

  # Public API – writes everywhere.
  def store(key, value) do
    nodes = Node.list([:this, :visible])
    # Fire the same call on every node, including the caller. Real code
    # should inspect the results and the list of unreachable nodes.
    {_results, _bad_nodes} = :rpc.multicall(nodes, __MODULE__, :store_local, [key, value])
    :ok
  end

  # Reads are cheap – we just look at the local copy.
  def get(key) do
    case :ets.lookup(:inventory_table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  # Helper to ensure the table exists. An ETS table dies with the process
  # that created it, so call this from a long-lived process on every node.
  def init! do
    :ets.new(:inventory_table, [:named_table, :public, :set])
    :ok
  end
end
```
Key points of this design:
- The write path contacts all nodes via :rpc.multicall. If one node is down, the call still succeeds on the rest, and the caller receives a list of results indicating which nodes responded.
- The read path assumes eventual consistency: we simply read from the local ETS table, trusting that recent writes have already been replicated.
- Because :rpc uses a call (not a cast), the writer receives an acknowledgment from each node, allowing us to react if a node fails to store the data.
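Acting on that acknowledgment is straightforward: :rpc.multicall/4 returns a {results, bad_nodes} tuple, so a stricter write path can surface partial failures instead of discarding them. A generic sketch (the error shape is our own convention, not part of :rpc):

```elixir
defmodule ClusterWrite do
  # Run an MFA on this node plus every visible node and report failures.
  def multicall_or_error(mod, fun, args) do
    nodes = Node.list([:this, :visible])

    {results, bad_nodes} = :rpc.multicall(nodes, mod, fun, args)

    # Nodes that were reachable but crashed inside the call come back
    # as {:badrpc, reason} entries in `results`.
    failed = Enum.filter(results, &match?({:badrpc, _}, &1))

    case {failed, bad_nodes} do
      {[], []} -> {:ok, results}
      _ -> {:error, %{badrpc: failed, unreachable: bad_nodes}}
    end
  end
end
```

A replicated store would call this with its local write function and decide whether partial success is acceptable or needs a retry.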
Limitations and Future Directions
The “write‑everywhere” scheme works for small clusters and simple data models but has drawbacks:
- Network overhead: Every write multiplies by the number of nodes.
- Consistency window: A node that is down (or partitioned away) while a write is broadcast misses the update entirely and serves stale reads after it rejoins.
- Scalability: As the key space grows, replicating the entire table on every node becomes memory‑intensive.
More sophisticated approaches include:
- Leader‑based replication: One node acts as primary, replicating log entries to followers (Raft, etc.).
- Sharding with replication factor: Each key lives on a subset of nodes (e.g., three out of five), reducing storage while preserving fault tolerance.
- CRDTs (Conflict‑Free Replicated Data Types): Allow concurrent updates without coordination, merging automatically.
Libraries such as Swarm or Syn already implement many of these patterns, letting you focus on business logic rather than low‑level clustering code.
Putting It All Together – A Mini‑Project Walkthrough
Below is a quick “starter kit” that combines the pieces we’ve discussed. The goal is not to be production‑ready but to illustrate how the concepts intertwine.
```elixir
# ----------------------------------------------------------------------
# 1. Application entry point – starts the supervision tree on each node
# ----------------------------------------------------------------------
defmodule Inventory.Application do
  use Application

  def start(_type, _args) do
    # Create the ETS table here rather than in a Task child: an ETS table
    # dies with its owner, and a Task process exits as soon as its function
    # returns. The process running this callback lives for the whole
    # application, so the table survives.
    Inventory.Database.init!()

    children = [
      {DynamicSupervisor, name: Inventory.Cache, strategy: :one_for_one}
    ]

    opts = [strategy: :one_for_one, name: Inventory.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
```
```elixir
# ----------------------------------------------------------------------
# 2. API façade – a thin HTTP layer (e.g. Plug) would call these functions
# ----------------------------------------------------------------------
defmodule Inventory.API do
  # Public entry points used by controllers / routers

  def add_item(list_name, item) do
    pid = Inventory.Cache.server_process(list_name)
    Inventory.Server.add_item(pid, item)
  end

  def list_items(list_name) do
    pid = Inventory.Cache.server_process(list_name)
    Inventory.Server.list_items(pid)
  end

  # Demonstrates persistence: each add also writes to the replicated DB
  def persist_item(item_id, data) do
    Inventory.Database.store(item_id, data)
  end

  def fetch_item(item_id), do: Inventory.Database.get(item_id)
end
```
```elixir
# ----------------------------------------------------------------------
# 3. Example usage from an IEx session
# ----------------------------------------------------------------------
# iex> Inventory.API.add_item(:warehouse_a, %{sku: "ABC123", qty: 10})
# :ok
# iex> Inventory.API.list_items(:warehouse_a)
# [%{sku: "ABC123", qty: 10}]
# iex> Inventory.API.persist_item(:abc123, %{price: 12.5, location: "A1"})
# :ok
# iex> Inventory.API.fetch_item(:abc123)
# {:ok, %{price: 12.5, location: "A1"}}
```
The flow is:
- Discovery: Inventory.Cache.server_process/1 either finds an existing Inventory.Server or spawns one, using the lazy global lookup.
- Serialization: All calls to a particular list go through the same GenServer, guaranteeing that mutations are ordered.
- Replication: Every call to Inventory.Database.store/2 propagates the data to every node, protecting against node loss.
Common Pitfalls and How to Dodge Them
- Expecting unconnected nodes to see each other's registrations: :global is part of the kernel application and starts automatically on every node, but it can only synchronise names among nodes that are actually connected. Make sure nodes join the cluster (Node.connect/1, or a library such as libcluster) before relying on global lookups.
- Lock granularity too coarse: Locking an entire resource (e.g. "process any image") can become a bottleneck. Consider sharding the lock (e.g. per file hash prefix) or using a read‑write lock.
- Ignoring network partitions: Our examples assume a well‑behaved network. In real deployments, partitions may cause "split brain". Monitoring node up/down events with :net_kernel.monitor_nodes/1, or relying on external consensus services (etcd, Consul), helps detect partitions.
- Unbounded ETS tables: Replicating data to every node inflates memory usage. Periodically prune or archive stale keys, or move to a disc‑backed store (Mnesia, RocksDB) for large datasets.
- Assuming calls always succeed: :rpc.multicall returns both a list of results and a list of unreachable nodes; some nodes may have timed out. Handle the error case to avoid silent data loss.
Summary – Key Takeaways
- Use :global.trans/2 for cheap cluster‑wide exclusive access when you must keep a large payload on the caller.
- Replace local Registry with :global registration to make a process discoverable across nodes.
- Optimize look‑ups by first checking :global.whereis_name/1 before spawning new processes.
- When scalability demands it, consider deterministic hash‑based node assignment to avoid global registration contention.
- Simple "write‑everywhere" replication provides fault tolerance for small clusters; more advanced patterns exist for larger systems.
- Always keep an eye on the trade‑off between consistency, availability, and network overhead – the classic CAP considerations.
Armed with these primitives you can now stitch together robust, distributed services in Elixir without abandoning the beloved GenServer model. The next step is to experiment: add real networked clients, simulate node failures, and watch how your cluster gracefully keeps serving data.