“Design a URL shortener.”
If you’ve ever done a system design interview, you’ve heard this one. It’s the canonical opener. The interviewer leans back, you grab a marker, and you start drawing boxes: a web server here, a database there, a cache layer, a load balancer, maybe a CDN. That set of boxes, plus a conversation about how you’d scale them, is the expected answer.
But what if the answer was just… a few dozen lines of Elixir?
I want to walk through the escalating requirements the way an interview works, where each follow-up is supposed to be harder than the last. On the BEAM, they’re not. They’re boring.
I’ll show actual code for each round. System design interviews don’t ask for that: they want boxes and arrows, not implementations. But showing the code is the point. When the implementation is this short, drawing an architecture diagram feels like overkill.
Let’s go.
Round 1: “Design a basic URL shortener”
An ETS table holds the mapping. A Phoenix endpoint handles requests. That’s it.
```elixir
defmodule Shortener do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def init(_) do
    :ets.new(:urls, [:set, :public, :named_table, read_concurrency: true])
    {:ok, nil}
  end

  def shorten(url) do
    code = :crypto.strong_rand_bytes(4) |> Base.url_encode64(padding: false)
    :ets.insert(:urls, {code, url})
    code
  end

  def lookup(code) do
    case :ets.lookup(:urls, code) do
      [{^code, url}] -> url
      [] -> nil
    end
  end
end
```
ETS (Erlang Term Storage) is an in-memory key-value store built into the runtime. The GenServer exists only to own the table and keep it alive. The actual reads and writes go directly to ETS from whatever process calls them: no single-process bottleneck, no serialization. With read_concurrency: true, reads happen in parallel across all CPU cores with minimal lock contention.
```elixir
# In your Phoenix router
get "/:code", RedirectController, :show
post "/shorten", RedirectController, :create

# Controller
def show(conn, %{"code" => code}) do
  case Shortener.lookup(code) do
    nil -> send_resp(conn, 404, "Not found")
    url -> redirect(conn, external: url)
  end
end

def create(conn, %{"url" => url}) do
  code = Shortener.shorten(url)
  json(conn, %{short_url: "https://sho.rt/#{code}"})
end
```
That’s a working URL shortener. No database, no cache layer, no external dependencies.
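For a quick sanity check, an IEx session might look roughly like this (the short code is random bytes, so the exact value will differ):

```elixir
iex> {:ok, _pid} = Shortener.start_link([])
iex> code = Shortener.shorten("https://example.com/some/very/long/path")
"gkdT3Q"   # random, yours will differ
iex> Shortener.lookup(code)
"https://example.com/some/very/long/path"
iex> Shortener.lookup("nope")
nil
```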
Boxes on our architecture diagram: 1. Next question?
Round 2: “Handle 10,000 requests per second”
This is where the interviewer expects you to start sweating. “Tell me about your caching strategy. How do you handle connection pooling? What about async workers?”
There’s nothing to change. The implementation from Round 1 already handles this.
Phoenix handles 100k+ concurrent connections on a single server. That’s not marketing: Chris McCord demonstrated 2 million simultaneous WebSocket connections on a single box back in 2015. For a redirect service (small payloads, fast responses), 10k/s is a rounding error.
And the ETS table we’re already using? Benchmarks routinely show millions of reads per second. Every Phoenix request handler reads directly from the table in parallel. There’s no bottleneck process, no lock contention, no cache layer to add.
In Python, this is where you’d introduce Redis, connection pools, and a conversation about async workers. In Node.js, you’d talk about clustering across cores. Here, we just… move on.
Boxes on our architecture diagram: still 1. Next?
Round 3: “Add click analytics”
The interviewer nods. “OK, but now I want to track how many times each short URL is clicked. Timestamps, referrers, the whole thing.”
In a traditional stack, you’d write click events to a database, probably asynchronously so the redirect doesn’t wait on the write. If you’re worried about write volume, you might batch them or add Redis as a buffer. Either way, it’s another piece of infrastructure to think about.
On the BEAM, you add another ETS table:
```elixir
# In your Shortener.init/1, create a second table
:ets.new(:clicks, [:bag, :public, :named_table, write_concurrency: true])

# In your redirect controller
def show(conn, %{"code" => code}) do
  case Shortener.lookup(code) do
    nil ->
      send_resp(conn, 404, "Not found")

    url ->
      :ets.insert(:clicks, {code, DateTime.utc_now(), get_req_header(conn, "referer")})
      redirect(conn, external: url)
  end
end

# Query stats for a given code
:ets.lookup(:clicks, "abc123")
```
That’s it. The :bag table type allows multiple rows per key, so every click appends a new entry. With write_concurrency: true, concurrent inserts from different request handlers don’t block each other. The :ets.insert call takes microseconds: the redirect doesn’t slow down at all. No message queue. No separate analytics pipeline. No Kafka.
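If the interviewer wants aggregate numbers rather than raw rows, a small helper module is enough. A sketch (Shortener.Stats and its function names are mine, not part of the design above):

```elixir
defmodule Shortener.Stats do
  # Total clicks for a code. :ets.select_count/2 counts matching rows
  # inside the table without copying them into the calling process.
  def click_count(code) do
    :ets.select_count(:clicks, [{{code, :_, :_}, [], [true]}])
  end

  # Raw click rows since a given time, e.g. for a "last 24 hours" view.
  def clicks_since(code, %DateTime{} = since) do
    :ets.lookup(:clicks, code)
    |> Enum.filter(fn {_code, ts, _referer} -> DateTime.compare(ts, since) != :lt end)
  end
end
```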
Could you do this with Redis? Sure, Redis is fast enough. The difference isn’t performance: it’s that ETS is built into the runtime. There’s no separate service to deploy, no network hop, no connection pool to manage, no Redis cluster to monitor at 3am when it runs out of memory.
Boxes on our architecture diagram: still 1.
Round 4: “Make it distributed across multiple nodes”
Now the interviewer is getting serious. “Your service is so successful it needs to run on multiple machines. How do you distribute the data? How do nodes discover each other? How do you route requests to the right node?”
In most stacks, this means introducing shared infrastructure: a load balancer in front of multiple app servers, a centralized database they all talk to, and some form of service discovery. And once traffic outgrows a single database, you’re also dealing with database replication, read replicas, or sharding. The application distribution and the data distribution are two separate problems to solve.
On the BEAM, you swap ETS for Mnesia. Mnesia is essentially distributed ETS: it’s a database built into OTP that replicates tables across nodes automatically.
```elixir
defmodule Shortener do
  def setup_mnesia(nodes) do
    :mnesia.create_schema(nodes)
    :mnesia.start()

    :mnesia.create_table(:urls, [
      attributes: [:code, :url],
      disc_copies: nodes
    ])

    :mnesia.create_table(:clicks, [
      attributes: [:id, :code, :timestamp, :referer],
      type: :bag,
      disc_copies: nodes
    ])
  end

  def shorten(url) do
    code = :crypto.strong_rand_bytes(4) |> Base.url_encode64(padding: false)
    :mnesia.transaction(fn -> :mnesia.write({:urls, code, url}) end)
    code
  end

  def lookup(code) do
    case :mnesia.dirty_read({:urls, code}) do
      [{:urls, ^code, url}] -> url
      [] -> nil
    end
  end
end
```
The disc_copies: nodes option tells Mnesia to replicate the table to every node in the list. When you write a URL on node 1, it’s available on node 2 automatically. Reads with dirty_read go straight to the local copy, no network hop, same speed as ETS. The click analytics table comes along for free.
When a new node joins, you add it to the Mnesia cluster and it syncs the existing data. No separate database to deploy. No Redis cluster. No replication configuration. The distribution layer is the same one Ericsson built for replicating phone switch state across redundant hardware.
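Concretely, “add it to the cluster” is a handful of calls on the new node once it can see the others. Roughly (the node name is made up, and the exact bootstrap sequence depends on your setup):

```elixir
# On the freshly started node, already connected via Node.connect/1 or libcluster
:mnesia.start()

# Point Mnesia at an existing member; the schema and table definitions sync from there
{:ok, _} = :mnesia.change_config(:extra_db_nodes, [:"shortener@node1"])

# Keep local disc copies so this node serves reads from its own replica
:mnesia.change_table_copy_type(:schema, node(), :disc_copies)
:mnesia.add_table_copy(:urls, node(), :disc_copies)
:mnesia.add_table_copy(:clicks, node(), :disc_copies)
```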
A caveat: Mnesia was designed for configuration and session data, not as a general-purpose database. It has table size limits and network partition recovery can require manual intervention. For a URL shortener’s working set, that’s fine. For billions of rows, you’d bring in Postgres. But the point stands: for the scale this interview question is asking about, the built-in tool handles it.
Boxes on our architecture diagram: 3 (three nodes running the same Elixir app).
Distinct infrastructure components: still 1.
Round 5: “What happens when a node goes down?”
The interviewer’s favorite follow-up. “One of your three nodes just died. What happens to the URLs it was serving? How do clients get rerouted? How long is the downtime?”
In a traditional system, this is where you talk about health checks, circuit breakers, DNS failover, session draining, and maybe Kubernetes pod restarts. You draw a timeline showing detection (30s), failover (15s), and recovery (60s). Total: maybe two minutes of degraded service if everything goes right.
On the BEAM:
```elixir
defmodule Shortener.Application do
  use Application

  def start(_type, _args) do
    children = [
      Shortener,
      ShortenerWeb.Endpoint
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```
That’s it. That’s the disaster recovery plan.
When a node goes down, the other nodes keep serving. Mnesia detects the missing node via heartbeat and continues operating on the surviving replicas. When the node comes back, Mnesia re-syncs its local copy automatically. The supervisor tree ensures the application restarts cleanly. There’s no special failure-handling code because failure handling is the default behavior.
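You can watch this from a remote console on any surviving node; Mnesia reports which replicas it currently sees (node names illustrative):

```elixir
iex> :mnesia.system_info(:running_db_nodes)
[:"shortener@node1", :"shortener@node3"]   # node2 just dropped out
iex> Shortener.lookup("gkdT3Q")            # reads keep coming from the local replica
"https://example.com/some/very/long/path"
```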
The “let it crash” philosophy means you don’t write defensive code against node failures. You write your supervision tree (which you needed anyway to start your processes) and the runtime handles the rest. Processes crash, supervisors restart them, life goes on.
In a traditional setup, you’d configure health checks, set up your load balancer to stop routing to the dead node, and hope your database handles the failover gracefully. It’s not rocket science, but it’s work you have to think about and get right. On the BEAM, it’s the default.
Distinct infrastructure components: still 1.
Round 6: “Rate limit abusive users”
Final boss. “Someone is hammering your service with millions of shortening requests. How do you rate limit them without affecting legitimate users?”
Most frameworks have rate limiting middleware that handles this. If you’re on a single server, an in-memory counter works. If you’re distributed (and we are, since Round 4), you typically reach for Redis so all nodes share the same counters.
On the BEAM, you spawn a process per user:
```elixir
defmodule RateLimiter do
  use GenServer

  def allow?(ip) do
    pid = get_or_start(ip)
    GenServer.call(pid, :check)
  end

  # Each limiter registers itself under its IP so get_or_start/1 can find it
  def start_link(ip) do
    GenServer.start_link(__MODULE__, ip,
      name: {:via, Registry, {Shortener.Registry, {:rate, ip}}}
    )
  end

  defp get_or_start(ip) do
    case Registry.lookup(Shortener.Registry, {:rate, ip}) do
      [{pid, _}] ->
        pid

      [] ->
        case DynamicSupervisor.start_child(Shortener.TrackerSup, {__MODULE__, ip}) do
          {:ok, pid} -> pid
          # Another request won the race and started it first
          {:error, {:already_started, pid}} -> pid
        end
    end
  end

  def init(ip) do
    # Auto-cleanup: if no requests for 60s, this process dies
    {:ok, %{ip: ip, count: 0, window_start: System.monotonic_time(:second)}, 60_000}
  end

  def handle_call(:check, _from, state) do
    now = System.monotonic_time(:second)
    state = maybe_reset_window(state, now)

    if state.count >= 100 do
      {:reply, :rate_limited, state, 60_000}
    else
      {:reply, :ok, %{state | count: state.count + 1}, 60_000}
    end
  end

  def handle_info(:timeout, state), do: {:stop, :normal, state}

  defp maybe_reset_window(state, now) do
    if now - state.window_start >= 60 do
      %{state | count: 0, window_start: now}
    else
      state
    end
  end
end
```
Each IP address gets its own process, registered under its IP in a Registry. The process tracks request counts, resets on a time window, and automatically terminates after 60 seconds of inactivity (the 60_000 timeout in each return tuple delivers a :timeout message, and handle_info/2 stops the process). No external infrastructure. No Redis. No Lua scripts.
Because each rate limiter is an isolated process, a burst of traffic from one IP creates exactly one process for that IP. It doesn’t affect the rate limiting of any other IP. There’s no shared counter to contend over, no distributed lock to acquire.
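Wiring it in takes two extra children in the Round 5 supervision tree and a check in the controller. A sketch, reusing the Shortener.Registry and Shortener.TrackerSup names from the code above:

```elixir
# In Shortener.Application, alongside the existing children
children = [
  Shortener,
  {Registry, keys: :unique, name: Shortener.Registry},
  {DynamicSupervisor, name: Shortener.TrackerSup, strategy: :one_for_one},
  ShortenerWeb.Endpoint
]

# In the controller
def create(conn, %{"url" => url}) do
  ip = conn.remote_ip |> :inet.ntoa() |> to_string()

  case RateLimiter.allow?(ip) do
    :ok -> json(conn, %{short_url: "https://sho.rt/#{Shortener.shorten(url)}"})
    :rate_limited -> send_resp(conn, 429, "Too many requests")
  end
end
```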
A caveat: since we’re distributed (Round 4), these rate limiters are node-local. An IP hitting all three nodes gets separate counters on each. In practice this is fine as a first line of defense: a load balancer distributes traffic roughly evenly, so a 100/min limit per node means ~300/min across the cluster. If you need globally precise limits, you’d coordinate via Mnesia or a shared counter. But the per-node approach handles the common case without any additional infrastructure.
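If you did need cluster-wide limits, the Mnesia you already have can do it: a RAM-replicated counter table and :mnesia.dirty_update_counter/3 for atomic increments. A sketch (the table and function names are mine):

```elixir
# Created once, alongside the other tables in setup_mnesia/1
:mnesia.create_table(:rate_counters, [
  attributes: [:key, :count],
  ram_copies: nodes
])

# Increment and check the shared counter for the current one-minute window
def allow_global?(ip, limit \\ 300) do
  window = div(System.os_time(:second), 60)
  count = :mnesia.dirty_update_counter(:rate_counters, {ip, window}, 1)
  if count <= limit, do: :ok, else: :rate_limited
end
```

The tradeoff is a replicated write on every request instead of a local message send, and old window rows need periodic cleanup, which is why the per-node version is the sensible default.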
Distinct infrastructure components after six rounds of escalation: 1.
The pattern
Every round followed the same arc. The interviewer introduces a requirement that’s supposed to be hard. The expected answer involves new infrastructure: a cache, a queue, a service mesh, a distributed lock. On the BEAM, the answer is usually “a process” or “the runtime already does that.”
Here’s the mapping:
| Interview requirement | Traditional answer | BEAM answer |
|---|---|---|
| Basic key-value storage | Database + cache | ETS table |
| High concurrency | Redis, connection pools, async workers | Already handled (ETS + scheduler) |
| Async analytics | Kafka/Redis Streams + consumer service | ETS :bag table (microsecond writes) |
| Distribution across nodes | Shared database, Redis cluster, service discovery | Mnesia (distributed ETS, built into OTP) |
| Node failure recovery | K8s health checks, circuit breakers, failover routing | Supervisor trees + Mnesia replication |
| Per-user rate limiting | Redis + Lua scripts | One process per user |
Look at the right column: we still have all the boxes. There’s a key-value cache (ETS), a distributed database (Mnesia), an analytics store (ETS bag), a per-user rate limiting service (processes). The architecture diagram isn’t simpler because we eliminated components. It’s simpler because they all live inside the runtime instead of running as separate infrastructure. And that has compounding benefits:
- Fewer network hops. ETS reads are in-process memory access. Mnesia reads go to a local replica. Writes do cross the network for replication, but reads (the vast majority of traffic for a URL shortener) stay local. Compare that to a round-trip to Redis or Postgres on every request.
- Single deployment artifact. One release to build, ship, and roll back. Not five services with their own versions, configs, and dependency trees.
- Unified failure handling. Supervisors manage everything: your web server, your data layer, your rate limiters. You don’t need separate health checks and alerting for Redis, Kafka, and your app.
- No serialization overhead. Data stays as native Erlang terms in memory. No JSON encoding between your app and a cache, no protocol buffers between services.
- One place to look. One set of logs, one metrics pipeline, one tracing system. When something goes wrong at 3am, you’re not correlating timestamps across five different services to figure out what happened.
What this means
The punchline isn’t “use Elixir for everything.” Plenty of problems don’t need what the BEAM offers. If your URL shortener is a weekend project that serves 100 requests a day, use whatever you want.
The point is this: when your runtime was designed for telecom infrastructure (millions of concurrent calls, five-nines uptime, hot upgrades, distributed across data centers), a URL shortener isn’t an interesting system design problem. It’s a trivial one. Every “hard” follow-up question maps to something the runtime already does.
And that reframes the whole exercise. The interesting system design questions on the BEAM aren’t “how do you scale this?” or “what happens when a node dies?” Those are solved. The interesting questions become: “What would you build if scaling and fault tolerance weren’t the hard parts? What system would you design if the infrastructure just… worked?”
That’s a much more fun interview.
George Guimaraes builds agentic commerce infrastructure at New Generation. Previously: Principal Engineer at a unicorn fintech, co-founder of Plataformatec (acqui-hired by Nubank).