
Conversational Retrieval QA

info

Looking for the older, non-LCEL version? Click here.

A common requirement for retrieval-augmented generation chains is support for followup questions. Followup questions can contain references to past chat history (e.g. "What did Biden say about Justice Breyer?", followed by "Was that nice?"), which makes them ill-suited to direct retriever similarity search.

To support followups, you can add an additional step prior to retrieval that combines the chat history (either explicitly passed in or retrieved from the provided memory) and the question into a standalone question. The chain then performs the standard retrieval steps: looking up relevant documents with the retriever and passing those documents, along with the question, into a question answering chain to return a response.
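
That rephrasing step can be written as a small chain of its own. Below is a minimal sketch, assuming a chat model such as ChatOpenAI; the prompt wording and variable names here are illustrative rather than anything fixed by the library:

import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const condenseQuestionPrompt = PromptTemplate.fromTemplate(
  `Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

CHAT HISTORY: {chatHistory}

FOLLOWUP QUESTION: {question}

Standalone question:`
);

// Compose prompt -> model -> string output parser with LCEL
const condenseQuestionChain = condenseQuestionPrompt
  .pipe(new ChatOpenAI({}))
  .pipe(new StringOutputParser());

const standaloneQuestion = await condenseQuestionChain.invoke({
  chatHistory:
    "Human: What did Biden say about Justice Breyer?\nAI: He thanked him for his service.",
  question: "Was that nice?",
});
// e.g. "Was the president's statement about Justice Breyer complimentary?"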

To create a conversational question-answering chain, you will need a retriever. In the below example, we will create one from a vector store, which can be created from embeddings.

npm install @langchain/openai @langchain/community
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import * as fs from "fs";
import { formatDocumentsAsString } from "langchain/util/document";
import { PromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

/* Initialize the LLM to use to answer the question */
const model = new ChatOpenAI({});
/* Load in the file we want to do question answering over */
const text = fs.readFileSync("state_of_the_union.txt", "utf8");
/* Split the text into chunks */
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
const docs = await textSplitter.createDocuments([text]);
/* Create the vectorstore */
const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings());
const retriever = vectorStore.asRetriever();

const formatChatHistory = (
  human: string,
  ai: string,
  previousChatHistory?: string
) => {
  const newInteraction = `Human: ${human}\nAI: ${ai}`;
  if (!previousChatHistory) {
    return newInteraction;
  }
  return `${previousChatHistory}\n\n${newInteraction}`;
};

/**
 * Create a prompt template for generating an answer based on context and
 * a question.
 *
 * Chat history will be an empty string if it's the first question.
 *
 * inputVariables: ["chatHistory", "context", "question"]
 */
const questionPrompt = PromptTemplate.fromTemplate(
  `Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
CONTEXT: {context}
----------------
CHAT HISTORY: {chatHistory}
----------------
QUESTION: {question}
----------------
Helpful Answer:`
);

const chain = RunnableSequence.from([
  {
    question: (input: { question: string; chatHistory?: string }) =>
      input.question,
    chatHistory: (input: { question: string; chatHistory?: string }) =>
      input.chatHistory ?? "",
    context: async (input: { question: string; chatHistory?: string }) => {
      const relevantDocs = await retriever.getRelevantDocuments(input.question);
      const serialized = formatDocumentsAsString(relevantDocs);
      return serialized;
    },
  },
  questionPrompt,
  model,
  new StringOutputParser(),
]);

const questionOne = "What did the president say about Justice Breyer?";

const resultOne = await chain.invoke({
  question: questionOne,
});

console.log({ resultOne });
/**
 * {
 *   resultOne: 'The president thanked Justice Breyer for his service and described him as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court.'
 * }
 */

const resultTwo = await chain.invoke({
  // Pass the previous turn (human question first, then AI answer) as chat history
  chatHistory: formatChatHistory(questionOne, resultOne),
  question: "Was it nice?",
});

console.log({ resultTwo });
/**
 * {
 *   resultTwo: "Yes, the president's description of Justice Breyer was positive."
 * }
 */


Here's an explanation of each step in the RunnableSequence.from() call above:

  • The first element of the sequence is an object whose question key extracts the main input, the question the user asks.
  • The chatHistory key resolves to a string of all previous exchanges (human and AI) concatenated together. This helps the model understand the context of the question.
  • The context key fetches relevant documents from the loaded content (in this case the State of the Union speech). It calls the getRelevantDocuments method on the retriever, passing the user's question as the query, then passes the returned documents to the formatDocumentsAsString util, which joins them with newlines and returns a single string. A standalone sketch of how this input-mapping object behaves follows this list.
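
Here is that sketch: the object literal behaves like a RunnableMap, where each property receives the same input object and the results are combined into a single object for the next step. This is a minimal, standalone illustration, not part of the example above:

import { RunnableMap } from "@langchain/core/runnables";

const inputMap = RunnableMap.from({
  question: (input: { question: string; chatHistory?: string }) => input.question,
  chatHistory: (input: { question: string; chatHistory?: string }) =>
    input.chatHistory ?? "",
});

console.log(await inputMap.invoke({ question: "Hi there" }));
// { question: "Hi there", chatHistory: "" }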

After getting and formatting all inputs we pipe them through the following operations:

  • questionPrompt - the prompt template that is passed to the model in the next step. Behind the scenes it takes the inputs outlined above and formats them into the proper spots in our template (a standalone sketch of this appears after the list).
  • The formatted prompt with context then gets passed to the LLM and a response is generated.
  • Finally, we pipe the result of the LLM call to an output parser which formats the response into a readable string.
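
As a minimal standalone sketch (using a placeholder context string rather than the real retrieved documents), this is roughly what the prompt template produces once the mapped inputs are filled in:

const formattedPrompt = await questionPrompt.format({
  context: "Tonight, I'd like to honor Justice Stephen Breyer...",
  chatHistory: "",
  question: "What did the president say about Justice Breyer?",
});
console.log(formattedPrompt);
// "Use the following pieces of context to answer the question at the end. ..."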

Using this RunnableSequence, we can pass questions and chat history to the model for informed conversational question answering.

Built-in Memory

Here's a customization example using a faster LLM to generate the standalone question and a slower, more comprehensive LLM for the final answer. It uses a built-in memory object and also returns the referenced source documents. Because the chain returns multiple values (the answer and the source documents), we must set inputKey and outputKey on the memory instance to let it know which values to store.
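
To illustrate why those keys matter, here is a minimal standalone sketch (with assumed values, separate from the example below) of how BufferMemory uses inputKey and outputKey to pick which fields to store when a chain returns more than one value:

import { BufferMemory } from "langchain/memory";

const demoMemory = new BufferMemory({
  memoryKey: "chatHistory",
  inputKey: "question",
  outputKey: "text",
});

await demoMemory.saveContext(
  { question: "What did the president say about Justice Breyer?" },
  // Only the value under "text" is stored; "sourceDocuments" is ignored by memory
  { text: "He thanked him for his service.", sourceDocuments: [] }
);

console.log(await demoMemory.loadMemoryVariables({}));
// { chatHistory: "Human: What did the president say about Justice Breyer?\nAI: He thanked him for his service." }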

import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { LLMChain } from "langchain/chains";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { BufferMemory } from "langchain/memory";
import * as fs from "fs";
import { formatDocumentsAsString } from "langchain/util/document";
import { Document } from "@langchain/core/documents";
import { PromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { BaseMessage } from "@langchain/core/messages";

const text = fs.readFileSync("state_of_the_union.txt", "utf8");

const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
const docs = await textSplitter.createDocuments([text]);

const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings());
const retriever = vectorStore.asRetriever();

const memory = new BufferMemory({
  memoryKey: "chatHistory",
  inputKey: "question", // The key for the input to the chain
  outputKey: "text", // The key for the final conversational output of the chain
  returnMessages: true, // If using with a chat model (e.g. gpt-3.5 or gpt-4)
});

const serializeChatHistory = (chatHistory: Array<BaseMessage>): string =>
  chatHistory
    .map((chatMessage) => {
      if (chatMessage._getType() === "human") {
        return `Human: ${chatMessage.content}`;
      } else if (chatMessage._getType() === "ai") {
        return `Assistant: ${chatMessage.content}`;
      } else {
        return `${chatMessage.content}`;
      }
    })
    .join("\n");

/**
 * Create two prompt templates, one for answering questions, and one for
 * generating questions.
 */
const questionPrompt = PromptTemplate.fromTemplate(
  `Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------
CONTEXT: {context}
----------
CHAT HISTORY: {chatHistory}
----------
QUESTION: {question}
----------
Helpful Answer:`
);
const questionGeneratorTemplate = PromptTemplate.fromTemplate(
  `Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
----------
CHAT HISTORY: {chatHistory}
----------
FOLLOWUP QUESTION: {question}
----------
Standalone question:`
);

// Initialize fast and slow LLMs, along with chains for each
const fasterModel = new ChatOpenAI({
  model: "gpt-3.5-turbo",
});
const fasterChain = new LLMChain({
  llm: fasterModel,
  prompt: questionGeneratorTemplate,
});

const slowerModel = new ChatOpenAI({
  model: "gpt-4",
});
const slowerChain = new LLMChain({
  llm: slowerModel,
  prompt: questionPrompt,
});

const performQuestionAnswering = async (input: {
  question: string;
  chatHistory: Array<BaseMessage> | null;
  context: Array<Document>;
}): Promise<{ result: string; sourceDocuments: Array<Document> }> => {
  let newQuestion = input.question;
  // Serialize context and chat history into strings
  const serializedDocs = formatDocumentsAsString(input.context);
  const chatHistoryString = input.chatHistory
    ? serializeChatHistory(input.chatHistory)
    : null;

  if (chatHistoryString) {
    // Call the faster chain to generate a new question
    const { text } = await fasterChain.invoke({
      chatHistory: chatHistoryString,
      context: serializedDocs,
      question: input.question,
    });

    newQuestion = text;
  }

  const response = await slowerChain.invoke({
    chatHistory: chatHistoryString ?? "",
    context: serializedDocs,
    question: newQuestion,
  });

  // Save the chat history to memory
  await memory.saveContext(
    {
      question: input.question,
    },
    {
      text: response.text,
    }
  );

  return {
    result: response.text,
    sourceDocuments: input.context,
  };
};

const chain = RunnableSequence.from([
  {
    // Pipe the question through unchanged
    question: (input: { question: string }) => input.question,
    // Fetch the chat history, and return the history or null if not present
    chatHistory: async () => {
      const savedMemory = await memory.loadMemoryVariables({});
      const hasHistory = savedMemory.chatHistory.length > 0;
      return hasHistory ? savedMemory.chatHistory : null;
    },
    // Fetch relevant context based on the question
    context: async (input: { question: string }) =>
      retriever.getRelevantDocuments(input.question),
  },
  performQuestionAnswering,
]);

const resultOne = await chain.invoke({
  question: "What did the president say about Justice Breyer?",
});
console.log({ resultOne });
/**
 * {
 *   resultOne: {
 *     result: "The president thanked Justice Breyer for his service and described him as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court.",
 *     sourceDocuments: [...]
 *   }
 * }
 */

const resultTwo = await chain.invoke({
  question: "Was he nice?",
});
console.log({ resultTwo });
/**
 * {
 *   resultTwo: {
 *     result: "Yes, the president's description of Justice Breyer was positive.",
 *     sourceDocuments: [...]
 *   }
 * }
 */


Streaming

You can also stream results from the chain. This is useful if you want to stream the chain's output to a client, or pipe it into another chain.
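
For example, here is a minimal sketch of forwarding the streamed chunks to an HTTP client as they are generated. It assumes a chain like the one constructed below and a Fetch-style request handler, neither of which is part of the example itself:

async function handler(request: Request): Promise<Response> {
  const { question } = await request.json();
  const stream = await chain.stream({ question });
  const encoder = new TextEncoder();
  // chain.stream() yields string chunks; wrap them in a web ReadableStream
  // so they can be sent as the response body.
  const body = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(encoder.encode(chunk));
      }
      controller.close();
    },
  });
  return new Response(body, { headers: { "Content-Type": "text/plain" } });
}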

import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import * as fs from "fs";
import { formatDocumentsAsString } from "langchain/util/document";
import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { RunnableSequence } from "@langchain/core/runnables";

/* Initialize the LLM & set streaming to true */
const model = new ChatOpenAI({
  streaming: true,
});
/* Load in the file we want to do question answering over */
const text = fs.readFileSync("state_of_the_union.txt", "utf8");
/* Split the text into chunks */
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
const docs = await textSplitter.createDocuments([text]);
/* Create the vectorstore */
const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings());
const retriever = vectorStore.asRetriever();

/**
 * Create a prompt template for generating an answer based on context and
 * a question.
 *
 * Chat history will be an empty string if it's the first question.
 *
 * inputVariables: ["chatHistory", "context", "question"]
 */
const questionPrompt = PromptTemplate.fromTemplate(
  `Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------
CONTEXT: {context}
----------
CHAT HISTORY: {chatHistory}
----------
QUESTION: {question}
----------
Helpful Answer:`
);

const chain = RunnableSequence.from([
  {
    question: (input: { question: string; chatHistory?: string }) =>
      input.question,
    chatHistory: (input: { question: string; chatHistory?: string }) =>
      input.chatHistory ?? "",
    context: async (input: { question: string; chatHistory?: string }) => {
      const relevantDocs = await retriever.getRelevantDocuments(input.question);
      const serialized = formatDocumentsAsString(relevantDocs);
      return serialized;
    },
  },
  questionPrompt,
  model,
  new StringOutputParser(),
]);

const stream = await chain.stream({
  question: "What did the president say about Justice Breyer?",
});

let streamedResult = "";
for await (const chunk of stream) {
  streamedResult += chunk;
  console.log(streamedResult);
}
/**
 * The
 * The president
 * The president honored
 * The president honored Justice
 * The president honored Justice Stephen
 * The president honored Justice Stephen B
 * The president honored Justice Stephen Brey
 * The president honored Justice Stephen Breyer
 * The president honored Justice Stephen Breyer,
 * The president honored Justice Stephen Breyer, a
 * The president honored Justice Stephen Breyer, a retiring
 * The president honored Justice Stephen Breyer, a retiring Justice
 * The president honored Justice Stephen Breyer, a retiring Justice of
 * The president honored Justice Stephen Breyer, a retiring Justice of the
 * The president honored Justice Stephen Breyer, a retiring Justice of the United
 * The president honored Justice Stephen Breyer, a retiring Justice of the United States
 * The president honored Justice Stephen Breyer, a retiring Justice of the United States Supreme
 * The president honored Justice Stephen Breyer, a retiring Justice of the United States Supreme Court
 * The president honored Justice Stephen Breyer, a retiring Justice of the United States Supreme Court,
 * The president honored Justice Stephen Breyer, a retiring Justice of the United States Supreme Court, for
 * The president honored Justice Stephen Breyer, a retiring Justice of the United States Supreme Court, for his
 * The president honored Justice Stephen Breyer, a retiring Justice of the United States Supreme Court, for his service
 * The president honored Justice Stephen Breyer, a retiring Justice of the United States Supreme Court, for his service.
 */
