Once your application has left the comfort of the REPL and is running on a server, the focus shifts from writing code to understanding the behavior of that code while it is alive. In a highly concurrent, distributed environment the usual “print‑and‑see” approach quickly becomes insufficient. This article walks you through the essential techniques for inspecting, debugging and profiling a production‑grade Elixir system without resorting to fragile step‑by‑step debuggers.
Why Observability Matters
- Fault tolerance is a guarantee, not a hope. Even the most carefully designed OTP trees can crash under unexpected load or data.
- Resource consumption grows unpredictably. Memory leaks, excessive reductions, or CPU‑bound loops may only surface after hours of traffic.
- Distributed nodes complicate the picture. A problem that appears on one node may be caused by a message sent from another.
Being able to answer “what went wrong?” quickly is the difference between a brief outage and a prolonged service degradation.
1. Debugging in a Concurrent World
Classic line‑by‑line debuggers assume a single thread of execution. When hundreds of processes run in parallel, stopping one of them freezes only a tiny slice of the system, leaving the rest blind to the pause. Instead of halting execution, we rely on instrumentation that records state without changing program flow.
1.1 IO.inspect/2 – A Quick‑and‑Dirty Probe
The simplest way to peek at a value is to wrap the expression in IO.inspect/2. Because it returns the original value, you can insert it anywhere without altering the surrounding pipeline.
defmodule Warehouse.Inventory do
  # Computes the total number of items across all categories.
  # Adding an inspect will reveal the intermediate map.
  def total_counts(items) do
    items
    |> Enum.reduce(%{}, fn {category, qty}, acc ->
      Map.update(acc, category, qty, &(&1 + qty))
    end)
    |> IO.inspect(label: "Inventory after aggregation")
  end
end
When total_counts/1 runs, you’ll see a nicely labeled output in the console, helping you verify that the aggregation behaves as expected.
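IO.inspect/2 also accepts the usual Inspect options, which helps when the value is large. A small self-contained sketch (using a throwaway list rather than the inventory data above):

```elixir
# :label tags each output line; :limit truncates long collections so
# the console stays readable. IO.inspect returns its input, so the
# pipeline itself is unchanged.
1..100
|> Enum.to_list()
|> IO.inspect(label: "raw", limit: 5)
|> Enum.map(&(&1 * 2))
|> IO.inspect(label: "doubled", limit: 5)
|> Enum.sum()
```

Other options worth knowing include pretty: true and charlists: :as_lists.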
1.2 IEx.pry/0 – Live Interaction from the Shell
For a more interactive experience, IEx.pry/0 temporarily hands control of the current process over to the IEx shell. While the process is paused, you can examine the local bindings and evaluate arbitrary expressions in their context. Note that pry only takes effect when the code runs under IEx (for example via iex -S mix).
defmodule ChatServer do
  # IEx.pry/0 is a macro, so the module must require IEx first.
  require IEx

  def handle_message(%{user: user, text: text} = msg) do
    # Pause execution whenever a message contains the word "debug"
    if String.contains?(text, "debug"), do: IEx.pry()
    broadcast(user, text)
    {:ok, msg}
  end
end
When a client sends a “debug” message, the server process stops at the pry point, and the IEx session gains access to the local bindings (user, text, msg). You can evaluate arbitrary Elixir code against them to explore the current state, then call respawn() to leave the pry session.
1.3 Automated Tests – Your First Line of Defense
Unit and integration tests surface many bugs before they ever reach production. A well‑structured test suite reduces the need for ad‑hoc debugging in the field.
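As a sketch, here is what a minimal ExUnit test for the total_counts/1 function from section 1.1 might look like (the Inventory module is repeated, minus the inspect call, so the example is self-contained):

```elixir
ExUnit.start()

# The Inventory module from section 1.1, repeated here without the
# IO.inspect probe so this script stands on its own.
defmodule Warehouse.Inventory do
  def total_counts(items) do
    Enum.reduce(items, %{}, fn {category, qty}, acc ->
      Map.update(acc, category, qty, &(&1 + qty))
    end)
  end
end

defmodule Warehouse.InventoryTest do
  use ExUnit.Case, async: true

  test "sums quantities per category" do
    items = [{:books, 3}, {:toys, 2}, {:books, 1}]
    assert Warehouse.Inventory.total_counts(items) == %{books: 4, toys: 2}
  end
end
```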
2. Structured Logging with Logger
In production you should replace informal IO.inspect calls with the robust Logger framework. Logger supports multiple back‑ends, log levels, metadata and runtime configuration changes.
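The runtime-configuration point is easy to demonstrate: the log level can be raised or lowered on a live node without a redeploy, for example from a remote shell during an incident.

```elixir
require Logger

# Lower the threshold while investigating…
Logger.configure(level: :debug)
Logger.debug("now visible")

# …and restore it when you are done.
Logger.configure(level: :info)
```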
2.1 Setting Up a Logger Backend for JSON Logs
Suppose you run a financial transaction service and need to ship logs to an external log aggregation system. You can pull in a JSON logging library such as logger_json (or write your own backend; the LoggerJSONBackend module below stands in for whichever you choose) and configure it in config/runtime.exs:
import Config

config :logger,
  level: :info,
  backends: [{LoggerJSONBackend, :json}]

config :logger, :json,
  metadata: [:request_id, :module],
  format: "$time $metadata $message\n"
Then, throughout your code, emit structured events:
defmodule Payments.Gateway do
  require Logger

  def charge(user_id, amount) do
    # Generate the id once so both log lines share it and can be
    # correlated later. UUID.uuid4/0 comes from the uuid package.
    request_id = UUID.uuid4()

    Logger.info("Attempting charge",
      request_id: request_id,
      user_id: user_id,
      amount: amount
    )

    # …charge logic…

    Logger.info("Charge successful",
      request_id: request_id,
      user_id: user_id,
      amount: amount
    )
  end
end
The resulting JSON lines can be ingested by tools like Elastic, Splunk, or Grafana Loki, making post‑mortem analysis far more efficient.
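One refinement worth knowing: instead of passing request_id to every Logger call, you can attach it once as process metadata with Logger.metadata/1; Logger then includes it in every subsequent message from that process, provided the backend's :metadata list contains :request_id. A minimal sketch:

```elixir
require Logger

# Attach the identifier once per request-handling process. The id is
# derived from :erlang.unique_integer/1 here purely to keep the
# example dependency-free; UUID.uuid4() works just as well.
request_id = "req-#{:erlang.unique_integer([:positive])}"
Logger.metadata(request_id: request_id)

Logger.info("Attempting charge", user_id: 42, amount: 1999)
Logger.info("Charge successful", user_id: 42, amount: 1999)
```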
2.2 Custom Backend Example – Sending Logs to a Remote HTTP Endpoint
Below is a minimal custom backend that forwards each log entry to a remote webhook. This is useful for alerting systems that expect HTTP payloads.
defmodule RemoteLogBackend do
  @behaviour :gen_event

  def init(_args) do
    # :httpc ships with OTP but requires the :inets application
    # (and :ssl for HTTPS endpoints) to be started.
    {:ok, %{url: "https://log-collector.example.com/ingest"}}
  end

  # Match on the group leader to ignore events forwarded from other
  # nodes, as the built-in backends do.
  def handle_event({level, gl, {Logger, msg, ts, meta}}, state) when node(gl) == node() do
    body =
      Jason.encode!(%{
        level: level,
        timestamp: inspect(ts),
        message: IO.chardata_to_string(msg),
        # Metadata values are arbitrary terms; inspect/1 makes them
        # safely JSON-encodable.
        metadata: Map.new(meta, fn {k, v} -> {k, inspect(v)} end)
      })

    # :httpc expects the URL and content type as charlists.
    :httpc.request(:post, {to_charlist(state.url), [], 'application/json', body}, [], [])
    {:ok, state}
  end

  def handle_event(_, state), do: {:ok, state}
  def handle_call(_, state), do: {:ok, :ok, state}
  def handle_info(_, state), do: {:ok, state}
  def code_change(_, state, _), do: {:ok, state}
  def terminate(_, _), do: :ok
end
Register the backend in your configuration:
config :logger, backends: [:console, RemoteLogBackend]
3. Interacting with a Running Node
One of the most powerful features of the BEAM VM is the ability to attach to a live node and execute code against it. Two common patterns are:
- Remote shells (iex --name … --cookie …) for ad‑hoc inspection.
- Running :observer or custom GUI tools that visualize system state.
3.1 Attaching a Remote Shell
Start your production service (e.g., a catalog app) as a release:
_build/prod/rel/catalog/bin/catalog start
Open a hidden IEx node that shares the same cookie and attaches a remote shell to the running service; the --remsh flag makes every expression evaluate on the target node rather than on your local one:
iex --hidden --name monitor@127.0.0.1 --cookie secret_cookie --remsh catalog@127.0.0.1
From this shell you can query process information:
iex(catalog@127.0.0.1)1> :erlang.system_info(:process_count)
iex(catalog@127.0.0.1)2> Process.list() |> Enum.take(5)
iex(catalog@127.0.0.1)3> Process.info(self(), :dictionary)
These calls give you an instant snapshot of the VM’s health without stopping the service.
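You can build small helpers on top of these primitives. The module below (an illustrative sketch; the NodeDoctor name is made up) ranks processes by resident memory, which is usually the first question during an incident:

```elixir
defmodule NodeDoctor do
  # Returns the n processes currently using the most memory, as
  # {pid, bytes, registered_name} tuples (the name is [] when the
  # process is unregistered).
  def top_by_memory(n \\ 5) do
    Process.list()
    |> Enum.map(fn pid ->
      case Process.info(pid, [:memory, :registered_name]) do
        # The process may have exited between list/0 and info/2.
        nil -> nil
        info -> {pid, info[:memory], info[:registered_name]}
      end
    end)
    |> Enum.reject(&is_nil/1)
    |> Enum.sort_by(fn {_pid, memory, _name} -> -memory end)
    |> Enum.take(n)
  end
end

NodeDoctor.top_by_memory() |> IO.inspect(label: "Top 5 by memory")
```

The same shape works for :message_queue_len or :reductions; swap the key in the Process.info/2 call.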
3.2 Using :observer Across Nodes
The built‑in :observer GUI provides a visual overview of process counts, memory usage, and ETS tables. To monitor a remote node, the target must include the :runtime_tools application. Add it to mix.exs:
defmodule Catalog.MixProject do
  # …project/0 and deps/0 omitted…

  def application do
    [
      extra_applications: [:logger, :runtime_tools]
    ]
  end
end
After recompiling the release, start the observer on the monitor node:
iex --hidden --name observer@127.0.0.1 --cookie secret_cookie -S mix
iex(observer@127.0.0.1)1> :observer.start()
In the observer window, select Nodes → catalog@127.0.0.1. You’ll now see live charts of memory, CPU, and process mailboxes, all without installing any extra tooling on the production host.
3.3 Web‑Based Observability – Wobserver
When a graphical environment isn’t available (e.g., on a headless VM), a web‑based observer like Wobserver can be added to your release. It runs a small Phoenix endpoint and mirrors the functionality of :observer over HTTP.
# In mix.exs
defp deps do
  [
    {:wobserver, "~> 0.2"},
    # …other deps…
  ]
end

# Wobserver runs as its own OTP application and starts automatically
# once it is part of the release, so your application callback needs
# no extra call (and must still return the supervisor's {:ok, pid}):
def start(_type, _args) do
  children = [...]
  opts = [strategy: :one_for_one, name: MyApp.Supervisor]
  Supervisor.start_link(children, opts)
end
Now you can point a browser at http://host:4001/ (Wobserver's default port, configurable in its application environment) and explore the same metrics you’d get from the desktop observer.
4. Tracing Execution Flow
Tracing gives you visibility into the exact sequence of function calls, message passes, and state changes. BEAM provides two major tracing facilities:
- :sys.trace/2 – Low‑overhead per‑process tracing.
- :dbg / :erlang.trace/3 – System‑wide tracing with pattern matching.
4.1 Tracing a Single GenServer with :sys.trace/2
Imagine a Cache.Server GenServer that stores user profiles. To watch every call it receives, you can enable tracing from a remote shell:
iex(monitor@127.0.0.1)1> pid = GenServer.whereis(:user_cache)
iex(monitor@127.0.0.1)2> :sys.trace(pid, true)
Now each incoming handle_call/3 and outgoing reply prints to the shell:
*DBG* :user_cache got call {:fetch, 42} from #PID<0.123.0>
*DBG* :user_cache sent {:ok, %User{id: 42, name: "Alice"}} to #PID<0.123.0>
When you’re done, turn tracing off with :sys.trace(pid, false). Remember that tracing adds I/O overhead; use it sparingly on a production node.
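The :sys module offers more than tracing: :sys.get_state/1 snapshots the internal state of any OTP-compliant process, and :sys.replace_state/2 patches it in place. A self-contained sketch using a throwaway Agent:

```elixir
# Any OTP-compliant process works; an Agent is the smallest example.
{:ok, pid} = Agent.start_link(fn -> %{hits: 0} end)

# Read the state without sending an application-level message.
IO.inspect(:sys.get_state(pid), label: "before")

# Patch the state directly. Occasionally handy mid-incident,
# dangerous as a habit, since it bypasses the process's own API.
:sys.replace_state(pid, fn state -> %{state | hits: state.hits + 1} end)
IO.inspect(:sys.get_state(pid), label: "after")
```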
4.2 System‑Wide Tracing with :dbg
For more comprehensive analysis—e.g., tracking all calls to the Catalog.Product module across many processes—use the :dbg library. First, start a dedicated tracer node:
iex --name tracer@127.0.0.1 --cookie secret_cookie --hidden
Then configure the tracer:
iex(tracer@127.0.0.1)1> :dbg.tracer()
iex(tracer@127.0.0.1)2> :dbg.n(:"catalog@127.0.0.1")
iex(tracer@127.0.0.1)3> :dbg.p(:all, [:call])
iex(tracer@127.0.0.1)4> :dbg.tp(Catalog.Product, [{:_, [], [{:return_trace}]}])
Explanation of the commands:
- :dbg.tracer() – Starts the tracing engine on the current node.
- :dbg.n/1 – Adds the target node (the running service) to the set of traced nodes.
- :dbg.p(:all, [:call]) – Requests tracing of :call events for all processes.
- :dbg.tp/2 – Sets a trace pattern for every function in Catalog.Product; the {:return_trace} action in the match spec makes return values show up in the output as well.
When a client requests a product price, the tracer node prints lines like:
(<1234.567.0>) call 'Elixir.Catalog.Product':price(%Product{id: 99, name: "Laptop", price: 1299})
(<1234.567.0>) returned from 'Elixir.Catalog.Product':price/1 -> 1299
When you have collected enough data, stop tracing with:
iex(tracer@127.0.0.1)5> :dbg.stop_clear()
4.3 The Recon Library – A Handy Toolbox
Recon aggregates many common tracing, inspection and statistics functions into a single dependency. For example, :recon.proc_count/2 ranks processes by an attribute such as reductions, memory or message-queue length, and :recon_trace.calls/2 traces calls with a built-in rate limiter so a careless trace pattern cannot flood a production node.
defmodule Diagnostics do
  @moduledoc false

  # Ranks the remote node's processes by reduction count, a rough
  # proxy for CPU usage, via Recon's :recon.proc_count/2.
  def busiest(node) do
    :rpc.call(node, :recon, :proc_count, [:reductions, 10])
    |> Enum.map(fn {pid, reductions, info} ->
      "#{inspect(pid)} → #{reductions} reductions (#{inspect(info)})"
    end)
  end
end
Running Diagnostics.busiest(:"catalog@127.0.0.1") from a remote shell gives you a quick view of CPU hotspots without any manual tracing setup.
5. Benchmarking and Profiling
Understanding performance involves measuring execution time and identifying hot paths. Elixir ships with a few simple tools, while the Erlang ecosystem supplies dedicated profilers.
5.1 Quick Timing with :timer.tc/1
Wrap a function call with :timer.tc/1 to receive the elapsed microseconds along with the function’s return value.
{time_us, _result} = :timer.tc(fn ->
  Catalog.search("smartphone")
end)

IO.puts("Search took #{time_us / 1_000} ms")
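A single :timer.tc sample is noisy; averaging a few runs gives a steadier number. A small illustrative helper (the Timing module is made up for this sketch, not part of the standard library):

```elixir
defmodule Timing do
  # Runs `fun` `n` times and returns the average elapsed time in
  # milliseconds. Not a replacement for a real benchmark tool, but
  # good enough for a first impression in a shell.
  def avg_ms(fun, n \\ 10) do
    total_us =
      Enum.reduce(1..n, 0, fn _, acc ->
        {us, _result} = :timer.tc(fun)
        acc + us
      end)

    total_us / n / 1_000
  end
end

Timing.avg_ms(fn -> Enum.sum(1..100_000) end) |> IO.inspect(label: "avg ms")
```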
5.2 Structured Benchmarks with Benchee
For more systematic micro‑benchmarks, add Benchee to your dependencies in mix.exs (e.g. {:benchee, "~> 1.0", only: :dev}) and define the suite in a plain script file, since Benchee is driven by a direct call to Benchee.run/2 rather than a behaviour:
# bench/search_bench.exs
Benchee.run(%{
  "search_by_name" => fn -> Catalog.search("camera") end,
  "search_by_category" => fn -> Catalog.search_by(:electronics) end
})
Running mix run bench/search_bench.exs prints a nicely formatted table with average runtimes and standard deviations, plus memory usage when the memory_time option is enabled.
5.3 Profiling with cprof, eprof, fprof
The Erlang VM ships three built‑in profilers, each with a different focus:
- cprof – Counts the number of function calls.
- eprof – Measures execution time per function.
- fprof – Provides a detailed call‑graph with time and memory breakdown.
Invoke them through their Mix tasks, passing the expression to profile with -e:
mix profile.cprof -e "Catalog.search(\"camera\")"
mix profile.eprof -e "Catalog.search(\"camera\")"
mix profile.fprof -e "Catalog.search(\"camera\")"
Each task runs the expression and prints its report straight to the terminal; run mix help profile.eprof (and friends) to see the available sorting and filtering switches.
6. Common Pitfalls to Avoid
- Leaving IO.inspect in production. It writes to standard output, bypasses log rotation, and can reveal sensitive data.
- Running tracing continuously. Traces generate a lot of output and can degrade throughput. Enable them only for the duration of an investigation.
- Neglecting OTP supervision trees. Directly killing a process without a supervisor can lead to orphaned workers.
- Hard‑coding node names and cookies. Store them in environment variables or configuration files; otherwise deployments become fragile.
- Not including :runtime_tools in releases. Without it you won’t be able to start :observer or remote tracing on the production node.
7. Summary
- Debugging concurrent Elixir systems relies on instrumentation (IO.inspect, IEx.pry) and exhaustive test suites, not step‑by‑step breakpoints.
- Replace ad‑hoc prints with structured Logger calls, possibly routing logs to JSON back‑ends or remote HTTP collectors.
- Remote shells and :observer (or web‑based alternatives like Wobserver) give you live access to VM metrics, process info, and memory statistics.
- Tracing (:sys.trace/2, :dbg) lets you watch function calls and message flow, but must be used judiciously to avoid performance impact.
- Benchmarking with :timer.tc or Benchee and profiling with cprof/eprof/fprof are essential for spotting bottlenecks.
- Avoid common mistakes such as leaving debug prints, over‑tracing, and forgetting required OTP applications in releases.
Armed with these tools and practices, you’ll be able to keep a finger on the pulse of any Elixir system, diagnose failures swiftly, and maintain performance even as your application scales across many nodes.