Last September, I gave a talk at the San Francisco Global Elixir Meetup, part of the GEM (Global Elixir Meetups) initiative, hosted at the New Generation office in San Francisco. The topic: building full-stack AI solutions entirely within the BEAM ecosystem.
AI agents are just HTTP servers (that take longer to respond)
An AI agent takes a user prompt, calls an LLM, executes tools in a loop, and returns a result. The whole cycle takes seconds, not the microseconds we used to brag about with Phoenix. This shift to multi-second agentic workflows changes how we think about runtimes. Python struggles with the GIL, JavaScript has decent async I/O, Go has good concurrency. But the BEAM was built for exactly this: millions of concurrent, long-lived processes doing lots of I/O with fault tolerance baked in.
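In Elixir, that loop is small enough to sketch in a few lines. This is illustrative only; `LLM.chat/2` and `Tools.execute/1` are hypothetical helpers, not a specific library's API:

```elixir
defmodule MyAgent.Loop do
  @moduledoc """
  The core agent loop: send the conversation to the LLM, execute any tool
  calls it requests, append the results, and repeat until it answers in
  plain text. `LLM.chat/2` and `Tools.execute/1` are hypothetical helpers.
  """

  def run(messages, tools, turns_left \\ 10)

  def run(_messages, _tools, 0), do: {:error, :too_many_turns}

  def run(messages, tools, turns_left) do
    case LLM.chat(messages, tools) do
      # The model wants tools: run them, feed the results back, loop again.
      {:tool_calls, calls} ->
        results = Enum.map(calls, &Tools.execute/1)
        run(messages ++ results, tools, turns_left - 1)

      # The model produced its final answer.
      {:text, answer} ->
        {:ok, answer}
    end
  end
end
```

Each conversation can live in its own lightweight process, so thousands of these multi-second loops can be in flight at once without blocking anything else.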
Meanwhile, AI agent frameworks in Python are reinventing the actor model, message passing, and supervision trees that Erlang/OTP has had for decades.
Multiple chains, not one giant agent
One big agent with a massive system prompt and dozens of tools doesn't work well: the LLM gets confused about which tools to use. The fix is splitting the agent into small, focused chains. One handles money transfers, another image generation, another web search via Perplexity.
A neat pattern: a "personality chain," where the first chain does the actual work and returns structured output, and a second chain (no tools, just a rewriting prompt) rewrites that output into something warmer and more conversational. The result feels like talking to a helpful friend instead of reading bullet points.
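A rough sketch of that two-pass flow; `WorkChain.run/1` and `LLM.complete/1` are hypothetical stand-ins rather than any particular library's API:

```elixir
defmodule MyAgent.Personality do
  # Pass 1: a focused, tool-using chain does the work and returns
  # structured data. Pass 2: a tool-free chain with a rewrite prompt
  # turns that data into friendly prose. `WorkChain.run/1` and
  # `LLM.complete/1` are hypothetical helpers.

  @rewrite_prompt """
  Rewrite the structured result below as a warm, conversational reply.
  Keep every fact. Do not add new information.
  """

  def reply(user_message) do
    with {:ok, structured} <- WorkChain.run(user_message) do
      LLM.complete(@rewrite_prompt <> "\n\n" <> inspect(structured))
    end
  end
end
```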
In-memory RAG without a vector database
Using Bumblebee and Nx, you can load an embedding model as an Nx.Serving, chunk your knowledge base with text_chunker, build an HNSW index, and store everything in :persistent_term. No pgvector, no Pinecone, no external service. Reads are near-zero latency because everything lives in BEAM memory.
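A condensed sketch of that setup, assuming the hnswlib Elixir bindings (`HNSWLib.Index`) and Bumblebee's text embedding serving; the model is one possible choice, and chunking (e.g. with text_chunker) is left out for brevity:

```elixir
defmodule MyApp.RAG do
  @model "sentence-transformers/all-MiniLM-L6-v2"
  @dim 384

  # Build the whole index at boot: embed each chunk, add it to an HNSW
  # index, and stash everything in :persistent_term for fast reads.
  def build(chunks) when is_list(chunks) do
    {:ok, model_info} = Bumblebee.load_model({:hf, @model})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, @model})
    serving = Bumblebee.Text.text_embedding(model_info, tokenizer)

    {:ok, index} = HNSWLib.Index.new(:cosine, @dim, length(chunks))

    for chunk <- chunks do
      %{embedding: vector} = Nx.Serving.run(serving, chunk)
      HNSWLib.Index.add_items(index, Nx.new_axis(vector, 0))
    end

    # Everything lives in BEAM memory; no external vector store.
    :persistent_term.put({__MODULE__, :serving}, serving)
    :persistent_term.put({__MODULE__, :index}, index)
    :persistent_term.put({__MODULE__, :chunks}, List.to_tuple(chunks))
  end

  # Embed the query, find the k nearest chunks, return their text.
  def search(query, k \\ 3) do
    serving = :persistent_term.get({__MODULE__, :serving})
    index = :persistent_term.get({__MODULE__, :index})
    chunks = :persistent_term.get({__MODULE__, :chunks})

    %{embedding: vector} = Nx.Serving.run(serving, query)
    {:ok, labels, _dists} =
      HNSWLib.Index.knn_query(index, Nx.new_axis(vector, 0), k: k)

    labels
    |> Nx.to_flat_list()
    |> Enum.map(&elem(chunks, &1))
  end
end
```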
Tradeoff: you need a GPU (or a CPU-friendly model) and RAM grows with your knowledge base. But for many use cases, it's a remarkably simple architecture with zero moving parts.
MCP clients for team autonomy
With MCP (Model Context Protocol), the agent acts as a client while other teams expose their functionality as MCP servers. A payments team can ship new tools to the AI agent without the agent team touching any code. Elixir is a natural fit since MCP connections (especially SSE) are persistent and one-to-one, exactly the workload the BEAM handles effortlessly. Anubis MCP by Zoey provides a clean abstraction for both clients and servers in Elixir.
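The shape of it, with a hypothetical `MyApp.MCP.Client` wrapper standing in for whichever client module you configure (the exact Anubis MCP options will differ):

```elixir
# One long-lived, one-to-one MCP connection per team-owned server,
# supervised like anything else in the app.
children = [
  MyAppWeb.Endpoint,
  {MyApp.MCP.Client, name: :payments, url: "https://payments.internal/mcp"},
  {MyApp.MCP.Client, name: :search, url: "https://search.internal/mcp"}
]

Supervisor.start_link(children, strategy: :one_for_one)

# At request time the agent asks each connection what tools it currently
# exposes, so the payments team can ship a new tool without touching this app.
tools = Enum.flat_map([:payments, :search], &MyApp.MCP.Client.list_tools/1)
```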
Neural networks in your supervision tree
You can run a Hugging Face model for NSFW image classification as a Bumblebee serving inside your application's supervision tree. It sits alongside your Repo and Endpoint. No separate ML infrastructure, no Ray, no Vertex AI. With Elixir's distribution, GPU-bound work (neural networks, RAG indexing) goes on specific nodes while non-GPU work (LiveView dashboards, MCP, APIs) runs on cheaper machines.
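A minimal sketch of that wiring; the model id below is a placeholder for whichever Hugging Face classifier you pick:

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Placeholder model id; any image-classification model from the Hub works.
    repo = {:hf, "my-org/nsfw-image-classifier"}
    {:ok, model_info} = Bumblebee.load_model(repo)
    {:ok, featurizer} = Bumblebee.load_featurizer(repo)

    serving = Bumblebee.Vision.image_classification(model_info, featurizer)

    children = [
      MyApp.Repo,
      # The classifier is just another supervised child, next to Repo/Endpoint.
      {Nx.Serving, serving: serving, name: MyApp.NSFWClassifier, batch_size: 8},
      MyAppWeb.Endpoint
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end

# Anywhere in the app:
#   Nx.Serving.batched_run(MyApp.NSFWClassifier, image_tensor)
```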
When I explain this to Python developers, they're genuinely confused. In Elixir, it's just another module.