Full Stack AI with Elixir: San Francisco Global Elixir Meetup Talk

My talk at the San Francisco Global Elixir Meetup on building full-stack AI solutions in Elixir: multiple chains, in-memory RAG, MCP clients, and neural networks in your supervision tree.

San Francisco skyline at night
Photo by Chris Briggs on Unsplash

Last September, I gave a talk at the San Francisco Global Elixir Meetup, part of the GEM (Global Elixir Meetups) initiative, hosted at the New Generation office in San Francisco. The topic: building full-stack AI solutions entirely within the BEAM ecosystem.

AI agents are just HTTP servers (that take longer to respond)

An AI agent takes a user prompt, calls an LLM, executes tools in a loop, and returns a result. The whole cycle takes seconds, not the microseconds we used to brag about with Phoenix. This shift to multi-second agentic workflows changes how we think about runtimes. Python struggles with the GIL; JavaScript has decent async I/O; Go has good concurrency. But the BEAM was built for exactly this: millions of concurrent, long-lived processes doing lots of I/O, with fault tolerance baked in.
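The loop above fits in a few lines of Elixir. This is only a sketch: `call_llm/2` and `execute_tool/1` are hypothetical placeholders for your LLM client and tool dispatcher, not a real API.

```elixir
defmodule MyApp.AgentLoop do
  # Minimal agent loop: call the LLM, execute any requested tool,
  # feed the result back into the conversation, repeat until the
  # model returns a final answer.
  #
  # `call_llm/2` and `execute_tool/1` are placeholders for whatever
  # LLM client and tool registry you actually use.
  def run(messages, tools) do
    case call_llm(messages, tools) do
      {:tool_call, call} ->
        result = execute_tool(call)
        run(messages ++ [{:tool_result, call, result}], tools)

      {:final_answer, text} ->
        text
    end
  end
end
```

Because each conversation can run in its own supervised process (for example via `Task.Supervisor.start_child/2`), a loop that takes ten seconds never blocks anything else on the node.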

Meanwhile, AI agent frameworks in Python are reinventing the actor model, message passing, and supervision trees that Erlang/OTP has had for decades.

Multiple chains, not one giant agent

One big agent with a massive system prompt and dozens of tools doesn't work well: the LLM gets confused about which tools to use. The fix is splitting into small, focused chains. One handles money transfers, another image generation, another web search via Perplexity.
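The dispatch can be as simple as a cheap classification step followed by a case statement. All module names here are hypothetical, sketching the shape of the routing rather than any particular library:

```elixir
defmodule MyApp.Dispatcher do
  # Hypothetical router: a small, fast classification call decides
  # the intent, and each intent maps to a focused chain with only
  # the tools and system prompt it needs.
  def handle(prompt) do
    case MyApp.Router.classify(prompt) do
      :money_transfer -> MyApp.Chains.MoneyTransfer.run(prompt)
      :image          -> MyApp.Chains.ImageGeneration.run(prompt)
      :web_search     -> MyApp.Chains.WebSearch.run(prompt)
    end
  end
end
```

Each chain sees a short system prompt and a handful of tools, which keeps the LLM from getting lost in a catalog of dozens.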

A neat pattern: a "personality chain" where the first chain does the actual work and returns structured output, then a second chain (no tools, just a rewriting prompt) makes it warmer and conversational. The result feels like talking to a helpful friend instead of reading bullet points.
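In code, the personality pattern is just two chains composed in sequence. Again, the module names are illustrative stand-ins:

```elixir
defmodule MyApp.FriendlyAnswer do
  # Hypothetical two-stage pipeline: the first chain has the tools
  # and returns structured output; the second has no tools, only a
  # rewriting prompt that makes the result warm and conversational.
  def answer(prompt) do
    with {:ok, structured} <- MyApp.Chains.Work.run(prompt),
         {:ok, friendly} <- MyApp.Chains.Personality.rewrite(structured) do
      {:ok, friendly}
    end
  end
end
```

Separating the two also means you can tune the personality prompt without risking the correctness of the tool-using chain.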

In-memory RAG without a vector database

Using Bumblebee and Nx, you can load an embedding model as an Nx.Serving, chunk your knowledge base with text_chunker, build an HNSW index, and store everything in :persistent_term. No pgvector, no Pinecone, no external service. Reads are near-zero latency because everything lives in BEAM memory.
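A sketch of the indexing side, assuming the `bumblebee`, `text_chunker`, and `hnswlib` packages; the model name, dimensions, and capacity are illustrative:

```elixir
# Load an embedding model as an Nx.Serving (model choice is illustrative;
# all-MiniLM-L6-v2 produces 384-dimensional embeddings).
repo = {:hf, "sentence-transformers/all-MiniLM-L6-v2"}
{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
serving = Bumblebee.Text.text_embedding(model_info, tokenizer)

# Chunk the knowledge base and build an in-memory HNSW index.
chunks = TextChunker.split(knowledge_base_text)
{:ok, index} = HNSWLib.Index.new(:cosine, 384, 100_000)

for chunk <- chunks do
  %{embedding: emb} = Nx.Serving.run(serving, chunk.text)
  HNSWLib.Index.add_items(index, Nx.new_axis(emb, 0))
end

# Everything lives in BEAM memory; reads skip the process mailbox entirely.
:persistent_term.put({:rag, :index}, index)
:persistent_term.put({:rag, :chunks}, chunks)
```

At query time you embed the question the same way, run a k-NN query against the index, and look the winning chunks up by position, with no network hop anywhere.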

Tradeoff: you need a GPU (or a CPU-friendly model) and RAM grows with your knowledge base. But for many use cases, it's a remarkably simple architecture with zero moving parts.

MCP clients for team autonomy

With MCP (Model Context Protocol), the agent acts as a client while other teams expose their functionality as MCP servers. A payments team can ship new tools to the AI agent without the agent team touching any code. Elixir is a natural fit since MCP connections (especially SSE) are persistent and one-to-one, exactly the workload the BEAM handles effortlessly. Anubis MCP by Zoey provides a clean abstraction for both clients and servers in Elixir.
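The wiring is ordinary OTP: each MCP connection is one long-lived supervised process. This is illustrative only; `PaymentsMCPClient` and the option names stand in for a client module (for example, one built with Anubis MCP) whose exact API depends on the library:

```elixir
# Illustrative supervision wiring: one child per MCP server the agent
# talks to. Module names and options are hypothetical; URLs are placeholders.
children = [
  {PaymentsMCPClient, transport: {:sse, url: "https://payments.internal/mcp"}},
  {SearchMCPClient, transport: {:sse, url: "https://search.internal/mcp"}}
]

Supervisor.start_link(children, strategy: :one_for_one)
```

If the payments connection drops, its process restarts on its own; the rest of the agent never notices.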

Neural networks in your supervision tree

You can run a Hugging Face model for NSFW image classification as a Bumblebee serving inside your application's supervision tree. It sits alongside your Repo and Endpoint. No separate ML infrastructure, no Ray, no Vertex AI. With Elixir's distribution, GPU-bound work (neural networks, RAG indexing) goes on specific nodes while non-GPU work (LiveView dashboards, MCP, APIs) runs on cheaper machines.
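A sketch of what that looks like, assuming Bumblebee and Nx; the model name and options are illustrative:

```elixir
# Load a Hugging Face image-classification model (model name illustrative).
repo = {:hf, "Falconsai/nsfw_image_detection"}
{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, featurizer} = Bumblebee.load_featurizer(repo)

serving = Bumblebee.Vision.image_classification(model_info, featurizer, top_k: 1)

# The serving sits in the supervision tree next to Repo and Endpoint.
children = [
  MyApp.Repo,
  {Nx.Serving, serving: serving, name: MyApp.NSFWClassifier, batch_size: 8},
  MyAppWeb.Endpoint
]
```

Callers then use `Nx.Serving.batched_run(MyApp.NSFWClassifier, image_tensor)`, and Nx batches concurrent requests for the GPU automatically.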

When I explain this to Python developers, they're genuinely confused. In Elixir, it's just another module.


George Guimarães builds agentic commerce infrastructure at New Generation. Previously: Principal Engineer at a unicorn fintech, co-founder of Plataformatec (acqui-hired by Nubank).

