Alike: Semantic Similarity Testing with the Wave Operator

Test LLM outputs by meaning, not by characters. A wave operator for semantic similarity assertions in Elixir, powered by local embeddings.


Testing LLM outputs is frustrating. The model might say "The return policy is 30 days" or "You have thirty days to return items" or "Returns are accepted within a month of purchase." All correct, all different strings. Traditional assertions break immediately.

I built Alike to solve this with a single operator: <~>. It compares two strings by meaning, not by characters. It runs locally, costs nothing per assertion, and feels like a natural ExUnit extension.

The wave operator

import Alike.WaveOperator

assert "The quick brown fox jumps over the lazy dog"
       <~> "A fast auburn fox leaps over a sleeping canine"

That's it. Two sentences that mean the same thing, verified by semantic embeddings. No regex gymnastics, no substring hacks. If the meaning matches, the assertion passes.

How it works

Under the hood, Alike runs two models locally via Bumblebee:

  1. Sentence embeddings: converts both strings to 384-dimensional vectors using all-MiniLM-L6-v2, then computes cosine similarity
  2. Natural Language Inference (NLI): classifies the relationship as entailment, contradiction, or neutral to catch logical conflicts that raw similarity would miss
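To make the first step concrete, here is a minimal sketch of cosine similarity in plain Elixir. It is not Alike's code: the module name CosineSketch is hypothetical, and it works on ordinary lists of floats instead of the tensors the embedding model would produce.

```elixir
defmodule CosineSketch do
  @moduledoc "Illustrative only: cosine similarity over plain lists of numbers."

  # Dot product of two equal-length vectors.
  defp dot(a, b) do
    a
    |> Enum.zip(b)
    |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
  end

  # Euclidean (L2) norm of a vector.
  defp norm(v), do: :math.sqrt(dot(v, v))

  # Cosine of the angle between a and b.
  # Close to 1.0 means the vectors point the same way; 0.0 means they are orthogonal.
  # Raises ArithmeticError if either vector is all zeros (division by zero).
  def similarity(a, b) when length(a) == length(b) do
    dot(a, b) / (norm(a) * norm(b))
  end
end

# Identical direction vs. orthogonal direction.
IO.inspect(CosineSketch.similarity([1.0, 0.0], [1.0, 0.0]))
IO.inspect(CosineSketch.similarity([1.0, 0.0], [0.0, 1.0]))
```

With real embeddings the same arithmetic runs over 384 numbers per sentence, and paraphrases tend to land close together, which is why their score comes out high.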

Inference needs no API keys and no network calls, and costs nothing per assertion. The models download once on first run (~460MB total, stored in ~/.cache/bumblebee/); subsequent runs skip the download and load them from that cache.

Catching contradictions

NLI is what makes Alike more than a similarity score. Suppose your LLM says "The price is $10" when it should have said "The price is $50". Both sentences are about prices, so their cosine similarity can still be high, and a similarity-only check would pass and hide the bug. Alike catches this by flagging the pair as a contradiction:

{:ok, result} = Alike.classify("The price is $10", "The price is $50")
result.label
# => "contradiction"

This matters for testing correctness, not just relevance.

Beyond the operator

For more control, use the functions directly:

# Get raw similarity score (0.0 to 1.0)
{:ok, score} = Alike.similarity("machine learning", "artificial intelligence")
# score => 0.70

# Check with custom threshold
Alike.alike?("fast car", "quick automobile", threshold: 0.6)
# => true

Thresholds are configurable globally or per-call. The defaults (0.45 similarity, 0.8 contradiction confidence) work well for most cases, but you can tune them for your domain.
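To show how the two thresholds could interact, here is a rough sketch of a pass/fail decision. It is an assumption about the logic, not Alike's actual implementation: the module DecisionSketch, the :confidence key on the NLI result, and the :contradiction_threshold option are all hypothetical. Only the threshold: option, the "contradiction" label, and the two default values come from the examples above.

```elixir
defmodule DecisionSketch do
  @moduledoc "Illustrative only: one way to combine a similarity score with an NLI verdict."

  # Defaults quoted in the article.
  @similarity_threshold 0.45
  @contradiction_threshold 0.8

  # `score` is a similarity score. `nli` is a hypothetical map such as
  # %{label: "contradiction", confidence: 0.93}; the :confidence key is an assumption.
  def alike?(score, nli, opts \\ []) do
    sim_min = Keyword.get(opts, :threshold, @similarity_threshold)
    # :contradiction_threshold is a made-up option name for this sketch.
    contra_min = Keyword.get(opts, :contradiction_threshold, @contradiction_threshold)

    # A confident contradiction vetoes the match, however similar the wording is.
    contradicted? = nli.label == "contradiction" and nli.confidence >= contra_min

    score >= sim_min and not contradicted?
  end
end

# High similarity, but a confident contradiction: the check fails.
IO.inspect(DecisionSketch.alike?(0.82, %{label: "contradiction", confidence: 0.93}))
# Same similarity with an entailment verdict: the check passes.
IO.inspect(DecisionSketch.alike?(0.82, %{label: "entailment", confidence: 0.93}))
```

Under this reading, raising the similarity threshold makes tests stricter about wording, while lowering the contradiction threshold makes them quicker to reject answers that conflict with the expected one.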

Where it fits

Alike powers the semantic assertions in Tribunal, my LLM evaluation framework for Elixir. When you write assert_semantic_similar response, "expected meaning" in a Tribunal test case, Alike is doing the heavy lifting underneath.

But it's useful on its own too. Any test where you're comparing natural language output benefits from semantic comparison: chatbot responses, translation quality, summarization accuracy, NLP pipeline validation.

# In your mix.exs (test only)
{:alike, "~> 0.3.0", only: :test}

Check it out on GitHub and HexDocs.
