Fallbacks

When working with language models, you may often encounter issues from the underlying APIs, e.g. rate limits or downtime. Therefore, as you move your LLM applications into production it becomes more and more important to have contingencies for errors. That's why we've introduced the concept of fallbacks.

Crucially, fallbacks can be applied not only on the LLM level but on the whole runnable level. This is important because often times different models require different prompts. So if your call to OpenAI fails, you don't just want to send the same prompt to Anthropic - you probably want want to use e.g. a different prompt template.

Handling LLM API errors

This is maybe the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API could be down, you could have hit a rate limit, or any number of things.

IMPORTANT: By default, many of LangChain's LLM wrappers catch errors and retry. You will most likely want to turn those off when working with fallbacks. Otherwise the first wrapper will keep on retrying rather than failing.

tip

See this section for general instructions on installing integration packages.

npm
Yarn
pnpm

npm install @langchain/anthropic @langchain/openai

yarn add @langchain/anthropic @langchain/openai

pnpm add @langchain/anthropic @langchain/openai

import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";

// Use a fake model name that will always throw an error
const fakeOpenAIModel = new ChatOpenAI({
  model: "potato!",
  maxRetries: 0,
});

const anthropicModel = new ChatAnthropic({});

const modelWithFallback = fakeOpenAIModel.withFallbacks({
  fallbacks: [anthropicModel],
});

const result = await modelWithFallback.invoke("What is your name?");

console.log(result);

/*
  AIMessage {
    content: ' My name is Claude. I was created by Anthropic.',
    additional_kwargs: {}
  }
*/

API Reference:

ChatOpenAI from @langchain/openai
ChatAnthropic from @langchain/anthropic

Fallbacks for RunnableSequences

We can also create fallbacks for sequences, that are sequences themselves. Here we do that with two different models: ChatOpenAI and then normal OpenAI (which does not use a chat model). Because OpenAI is NOT a chat model, you likely want a different prompt.

import { ChatOpenAI, OpenAI } from "@langchain/openai";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate, PromptTemplate } from "@langchain/core/prompts";

const chatPrompt = ChatPromptTemplate.fromMessages<{ animal: string }>([
  [
    "system",
    "You're a nice assistant who always includes a compliment in your response",
  ],
  ["human", "Why did the {animal} cross the road?"],
]);

// Use a fake model name that will always throw an error
const fakeOpenAIChatModel = new ChatOpenAI({
  model: "potato!",
  maxRetries: 0,
});

const prompt =
  PromptTemplate.fromTemplate(`Instructions: You should always include a compliment in your response.

Question: Why did the {animal} cross the road?

Answer:`);

const openAILLM = new OpenAI({});

const outputParser = new StringOutputParser();

const badChain = chatPrompt.pipe(fakeOpenAIChatModel).pipe(outputParser);

const goodChain = prompt.pipe(openAILLM).pipe(outputParser);

const chain = badChain.withFallbacks({
  fallbacks: [goodChain],
});

const result = await chain.invoke({
  animal: "dragon",
});

console.log(result);

/*
  I don't know, but I'm sure it was an impressive sight. You must have a great imagination to come up with such an interesting question!
*/

API Reference:

ChatOpenAI from @langchain/openai
OpenAI from @langchain/openai
StringOutputParser from @langchain/core/output_parsers
ChatPromptTemplate from @langchain/core/prompts
PromptTemplate from @langchain/core/prompts

Handling long inputs

One of the big limiting factors of LLMs in their context window. Sometimes you can count and track the length of prompts before sending them to an LLM, but in situations where that is hard/complicated you can fallback to a model with longer context length.

import { ChatOpenAI } from "@langchain/openai";

// Use a model with a shorter context window
const shorterLlm = new ChatOpenAI({
  model: "gpt-3.5-turbo",
  maxRetries: 0,
});

const longerLlm = new ChatOpenAI({
  model: "gpt-3.5-turbo-16k",
});

const modelWithFallback = shorterLlm.withFallbacks({
  fallbacks: [longerLlm],
});

const input = `What is the next number: ${"one, two, ".repeat(3000)}`;

try {
  await shorterLlm.invoke(input);
} catch (e) {
  // Length error
  console.log(e);
}

const result = await modelWithFallback.invoke(input);

console.log(result);

/*
  AIMessage {
    content: 'The next number is one.',
    name: undefined,
    additional_kwargs: { function_call: undefined }
  }
*/

API Reference:

ChatOpenAI from @langchain/openai

Fallback to a better model

Often times we ask models to output format in a specific format (like JSON). Models like GPT-3.5 can do this okay, but sometimes struggle. This naturally points to fallbacks - we can try with a faster and cheaper model, but then if parsing fails we can use GPT-4.

import { z } from "zod";
import { OpenAI, ChatOpenAI } from "@langchain/openai";
import { StructuredOutputParser } from "langchain/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";

const prompt = PromptTemplate.fromTemplate(
  `Return a JSON object containing the following value wrapped in an "input" key. Do not return anything else:\n{input}`
);

const badModel = new OpenAI({
  maxRetries: 0,
  model: "gpt-3.5-turbo-instruct",
});

const normalModel = new ChatOpenAI({
  model: "gpt-4",
});

const outputParser = StructuredOutputParser.fromZodSchema(
  z.object({
    input: z.string(),
  })
);

const badChain = prompt.pipe(badModel).pipe(outputParser);

const goodChain = prompt.pipe(normalModel).pipe(outputParser);

try {
  const result = await badChain.invoke({
    input: "testing0",
  });
} catch (e) {
  console.log(e);
  /*
  OutputParserException [Error]: Failed to parse. Text: "

  { "name" : " Testing0 ", "lastname" : " testing ", "fullname" : " testing ", "role" : " test ", "telephone" : "+1-555-555-555 ", "email" : " testing@gmail.com ", "role" : " test ", "text" : " testing0 is different than testing ", "role" : " test ", "immediate_affected_version" : " 0.0.1 ", "immediate_version" : " 1.0.0 ", "leading_version" : " 1.0.0 ", "version" : " 1.0.0 ", "finger prick" : " no ", "finger prick" : " s ", "text" : " testing0 is different than testing ", "role" : " test ", "immediate_affected_version" : " 0.0.1 ", "immediate_version" : " 1.0.0 ", "leading_version" : " 1.0.0 ", "version" : " 1.0.0 ", "finger prick" :". Error: SyntaxError: Unexpected end of JSON input
*/
}

const chain = badChain.withFallbacks({
  fallbacks: [goodChain],
});

const result = await chain.invoke({
  input: "testing",
});

console.log(result);

/*
  { input: 'testing' }
*/

API Reference:

OpenAI from @langchain/openai
ChatOpenAI from @langchain/openai
StructuredOutputParser from langchain/output_parsers
PromptTemplate from @langchain/core/prompts

Fallbacks

Handling LLM API errors​

API Reference:

Fallbacks for RunnableSequences​

API Reference:

Handling long inputs​

API Reference:

Fallback to a better model​

API Reference:

Help us out by providing feedback on this documentation page:

Handling LLM API errors

Fallbacks for RunnableSequences

Handling long inputs

Fallback to a better model