Comparison Evaluators

Comparison evaluators in LangChain measure the relative quality of two different chain or LLM outputs. These evaluators are helpful for comparative analyses, such as A/B testing between two language models or comparing different versions of the same model. They can also be useful for tasks like generating preference scores for AI-assisted reinforcement learning.

These evaluators inherit from the PairwiseStringEvaluator or LLMPairwiseStringEvaluator class, which provides a comparison interface for two strings: typically, the outputs from two different prompts or models, or two versions of the same model. In essence, a comparison evaluator performs an evaluation on a pair of strings and returns a dictionary containing the evaluation score and other relevant details.

To create a custom comparison evaluator, inherit from the PairwiseStringEvaluator or LLMPairwiseStringEvaluator abstract classes exported from langchain/evaluation and override the _evaluateStringPairs method.

Here's a summary of the key methods and properties of a comparison evaluator:

  • _evaluateStringPairs: Evaluates the pair of output strings. Override this method when creating custom evaluators.
  • requiresInput: This property indicates whether this evaluator requires an input string.
  • requiresReference: This property specifies whether this evaluator requires a reference label.
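To make the interface concrete, here is a standalone sketch of a custom comparison evaluator. Note this is illustrative only: the class and type definitions below are hypothetical stand-ins that mirror the documented method and property names, rather than the actual base classes exported from langchain/evaluation, and the length-based scoring logic is a toy example.

```typescript
// Hypothetical input/result shapes mirroring the documented interface.
interface PairwiseStringEvalInput {
  prediction: string;   // output from the first model/chain
  predictionB: string;  // output from the second model/chain
  input?: string;       // original prompt, when requiresInput is true
  reference?: string;   // reference label, when requiresReference is true
}

interface EvalResult {
  score: number;      // e.g. 1 if the first output is preferred, 0 otherwise
  value: string;      // a short verdict, e.g. "A" or "B"
  reasoning?: string; // optional explanation of the score
}

// Stand-in for the PairwiseStringEvaluator abstract class.
abstract class PairwiseStringEvaluatorSketch {
  requiresInput = false;
  requiresReference = false;
  abstract _evaluateStringPairs(
    args: PairwiseStringEvalInput
  ): Promise<EvalResult>;
}

// Toy custom evaluator: prefers the output whose length is
// closer to the reference label's length.
class LengthDistanceEvaluator extends PairwiseStringEvaluatorSketch {
  requiresReference = true;

  async _evaluateStringPairs({
    prediction,
    predictionB,
    reference,
  }: PairwiseStringEvalInput): Promise<EvalResult> {
    const ref = reference ?? "";
    const distA = Math.abs(prediction.length - ref.length);
    const distB = Math.abs(predictionB.length - ref.length);
    const aWins = distA <= distB;
    return {
      score: aWins ? 1 : 0,
      value: aWins ? "A" : "B",
      reasoning: `length distance A=${distA}, B=${distB}`,
    };
  }
}

// Usage: evaluate a pair of outputs against a reference.
const evaluator = new LengthDistanceEvaluator();
evaluator
  ._evaluateStringPairs({
    prediction: "four",
    predictionB: "a much longer answer",
    reference: "tiny",
  })
  .then((result) => console.log(result.value)); // "A"
```

A real LLM-backed comparison evaluator would follow the same shape, but _evaluateStringPairs would prompt a judge model to pick the preferred output instead of comparing lengths.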

Detailed information about creating custom evaluators and the available built-in comparison evaluators is provided in the following sections.
