Full Stack AI with Elixir: San Francisco Global Elixir Meetup Talk

My talk at the San Francisco Global Elixir Meetup on building full-stack AI solutions in Elixir: multiple chains, in-memory RAG, MCP clients, and neural networks in your supervision tree.

San Francisco skyline at night
Photo by Chris Briggs on Unsplash

Last September, I gave a talk at the San Francisco Global Elixir Meetup, part of the GEM (Global Elixir Meetups) initiative, hosted at the New Generation office in San Francisco. The topic: building full-stack AI solutions entirely within the BEAM ecosystem.

AI agents are just HTTP servers (that take longer to respond)

An AI agent takes a user prompt, calls an LLM, executes tools in a loop, and returns a result. The whole cycle takes seconds, not the microseconds we used to brag about with Phoenix. This shift to multi-second agentic workflows changes how we think about runtimes. Python struggles with the GIL; JavaScript has decent async I/O; Go has good concurrency. But the BEAM was built for exactly this: millions of concurrent, long-lived processes doing lots of I/O, with fault tolerance baked in.
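Stripped of any framework, that loop is small. Here's a conceptual sketch; `call_llm/1` and `run_tool/2` are hypothetical stand-ins for your LLM client and tool dispatcher, not a real API.

```elixir
defmodule AgentLoop do
  def run(messages, turns_left \\ 10)

  def run(_messages, 0), do: {:error, :too_many_turns}

  def run(messages, turns_left) do
    case call_llm(messages) do
      # The model asked for a tool: execute it and feed the result back in.
      {:tool_call, name, args} ->
        result = run_tool(name, args)
        run(messages ++ [{:tool_result, name, result}], turns_left - 1)

      # The model produced a final answer: we're done.
      {:answer, text} ->
        {:ok, text}
    end
  end

  # Hypothetical stubs so the sketch compiles; swap in a real LLM client
  # and tool dispatcher.
  defp call_llm(_messages), do: {:answer, "stub"}
  defp run_tool(_name, _args), do: %{}
end
```

Each run of this loop is just an ordinary BEAM process that spends most of its life waiting on network I/O, which is exactly what the runtime is good at.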

Meanwhile, AI agent frameworks in Python are reinventing the actor model, message passing, and supervision trees that Erlang/OTP has had for decades.

Multiple chains, not one giant agent

One big agent with a massive system prompt and dozens of tools doesn't work well: the LLM gets confused about which tools to use. The fix is to split the work into small, focused chains. One handles money transfers, another image generation, another web search via Perplexity.

A neat pattern is the "personality chain": the first chain does the actual work and returns structured output, then a second chain (no tools, just a rewriting prompt) makes the answer warmer and more conversational. The result feels like talking to a helpful friend instead of reading bullet points.
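A rough sketch of that two-stage shape, with a hypothetical `complete/2` helper standing in for whatever LLM client you use:

```elixir
defmodule Assistant.PersonalityChain do
  # Stage 1: a focused chain (with tools) returns structured output.
  # Stage 2: a tool-less chain rewrites that output in a friendly voice.
  def answer(prompt) do
    with {:ok, structured} <- work_chain(prompt),
         {:ok, friendly} <- personality_chain(structured) do
      {:ok, friendly}
    end
  end

  defp work_chain(prompt) do
    complete("You are a banking assistant. Reply with JSON only.", prompt)
  end

  defp personality_chain(structured) do
    complete(
      "Rewrite this data as a warm, conversational reply. Do not add facts.",
      structured
    )
  end

  # Hypothetical stub so the sketch compiles; swap in a real client call.
  defp complete(_system, _input), do: {:ok, "stub"}
end
```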

In-memory RAG without a vector database

Using Bumblebee and Nx, you can load an embedding model as an Elixir serving, chunk your knowledge base with text_chunker, build an HNSW index, and store everything in :persistent_term. No pgvector, no Pinecone, no external service. Reads are near-zero latency because everything lives in BEAM memory.
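A compressed sketch of what that looks like, assuming the bumblebee, nx, text_chunker, and hnswlib Hex packages. The embedding model and its 384-dim output are just an illustrative pairing, and the exact function options may drift between versions, so treat this as the shape rather than copy-paste.

```elixir
defmodule KB do
  @model "sentence-transformers/all-MiniLM-L6-v2"
  @dim 384

  def build(documents) do
    {:ok, model_info} = Bumblebee.load_model({:hf, @model})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, @model})
    serving = Bumblebee.Text.text_embedding(model_info, tokenizer)

    chunks =
      documents
      |> Enum.flat_map(&TextChunker.split/1)
      |> Enum.map(& &1.text)

    {:ok, index} = HNSWLib.Index.new(:cosine, @dim, length(chunks))

    # Labels follow insertion order, so chunk i gets label i.
    for chunk <- chunks do
      %{embedding: emb} = Nx.Serving.run(serving, chunk)
      HNSWLib.Index.add_items(index, Nx.new_axis(emb, 0))
    end

    # Everything lives in BEAM memory; reads are near-zero latency.
    :persistent_term.put(:kb_serving, serving)
    :persistent_term.put(:kb_index, index)
    :persistent_term.put(:kb_chunks, List.to_tuple(chunks))
  end

  def search(query, k \\ 3) do
    %{embedding: emb} = Nx.Serving.run(:persistent_term.get(:kb_serving), query)

    {:ok, labels, _distances} =
      HNSWLib.Index.knn_query(:persistent_term.get(:kb_index), Nx.new_axis(emb, 0), k: k)

    chunks = :persistent_term.get(:kb_chunks)
    labels |> Nx.to_flat_list() |> Enum.map(&elem(chunks, &1))
  end
end
```

`KB.build/1` runs once at boot (or whenever the knowledge base changes), and `KB.search/2` is what your chains call to pull context into the prompt.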

Tradeoff: you need a GPU (or a CPU-friendly model), and RAM usage grows with your knowledge base. But for many use cases, it's a remarkably simple architecture with zero moving parts.

MCP clients for team autonomy

With MCP (Model Context Protocol), the agent acts as a client while other teams expose their functionality as MCP servers. A payments team can ship new tools to the AI agent without the agent team touching any code. Elixir is a natural fit since MCP connections (especially SSE) are persistent and one-to-one, exactly the workload the BEAM handles effortlessly. Anubis MCP by Zoey provides a clean abstraction for both clients and servers in Elixir.
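For flavor, here's roughly what the client side can look like. The Anubis module names, options, and transport tuple below are my assumptions from skimming its docs, not verified API, so check the package before relying on them.

```elixir
# Assumed Anubis MCP client API -- treat names and options as placeholders.
defmodule MyApp.PaymentsMCP do
  use Anubis.Client,
    name: "my-app",
    version: "1.0.0",
    protocol_version: "2024-11-05"
end

# In the application supervisor: one long-lived child per MCP connection,
# exactly the kind of persistent, one-to-one workload the BEAM likes.
children = [
  {MyApp.PaymentsMCP, transport: {:sse, base_url: "https://payments.internal/mcp"}}
]
```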

Neural networks in your supervision tree

You can run a Hugging Face model for NSFW image classification as a Bumblebee serving inside your application's supervision tree. It sits alongside your Repo and Endpoint. No separate ML infrastructure, no Ray, no Vertex AI. With Elixir's distribution, GPU-bound work (neural networks, RAG indexing) goes on specific nodes while non-GPU work (LiveView dashboards, MCP, APIs) runs on cheaper machines.
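Concretely, the supervision-tree part is standard Bumblebee plus Nx.Serving wiring. The Hugging Face repo below is just an illustrative pick, not necessarily the one from the talk.

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Illustrative model choice; any image-classification repo wires up the same way.
    {:ok, model_info} = Bumblebee.load_model({:hf, "Falconsai/nsfw_image_detection"})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "Falconsai/nsfw_image_detection"})

    serving = Bumblebee.Vision.image_classification(model_info, featurizer)

    children = [
      MyApp.Repo,
      # The classifier is just another supervised child, next to Repo and Endpoint.
      {Nx.Serving, serving: serving, name: MyApp.NSFWClassifier, batch_size: 4},
      MyAppWeb.Endpoint
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

Anywhere in the app (or across the cluster), callers hit it with `Nx.Serving.batched_run(MyApp.NSFWClassifier, image)` and get back a map of predictions, with batching handled for free.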

When I explain this to Python developers, they're genuinely confused. In Elixir, it's just another module.

