
Conversational RAG

Prerequisites

This guide assumes familiarity with the following concepts:

In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of “memory” of past questions and answers, and some logic for incorporating those into its current thinking.

In this guide we focus on adding logic for incorporating historical messages. Further details on chat history management are covered here.

We will cover two approaches:

  1. Chains, in which we always execute a retrieval step;
  2. Agents, in which we give an LLM discretion over whether and how to execute a retrieval step (or multiple steps).

For the external knowledge source, we will use the same LLM Powered Autonomous Agents blog post by Lilian Weng from the RAG tutorial.

Setup

Dependencies

We’ll use an OpenAI chat model and embeddings and a Memory vector store in this walkthrough, but everything shown here works with any ChatModel or LLM, Embeddings, and VectorStore or Retriever.

We’ll use the following packages:

npm install --save langchain @langchain/openai @langchain/community @langchain/core @langchain/langgraph cheerio

We need to set the environment variable OPENAI_API_KEY:

export OPENAI_API_KEY=YOUR_KEY

LangSmith

Many of the applications you build with LangChain will contain multiple steps with multiple LLM calls. As these applications get more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.

Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:

export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=YOUR_KEY

# Reduce tracing latency if you are not in a serverless environment
# export LANGCHAIN_CALLBACKS_BACKGROUND=true
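
Before running the examples, it can help to fail fast when a required variable is missing. The helper below is a minimal sketch and not part of LangChain: `missingEnvVars` is a hypothetical name, and it takes the environment as a plain object so that it can be called with `process.env` in Node.

```typescript
// Hypothetical helper (not a LangChain API): report which required
// environment variables are unset or empty.
type Env = Record<string, string | undefined>;

function missingEnvVars(required: string[], env: Env): string[] {
  return required.filter((varName) => {
    const value = env[varName];
    return value === undefined || value.trim() === "";
  });
}

// Example with an in-memory environment; in Node you would pass process.env.
const exampleEnv: Env = { OPENAI_API_KEY: "YOUR_KEY", LANGCHAIN_TRACING_V2: "" };
const missingVars = missingEnvVars(
  ["OPENAI_API_KEY", "LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY"],
  exampleEnv
);
// Here the two LangSmith variables are reported as missing.
console.log(missingVars);
```

Only OPENAI_API_KEY is required for this guide; the LangSmith variables matter only if you want traces.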

Chains

Let’s first revisit the Q&A app we built over the LLM Powered Autonomous Agents blog post by Lilian Weng in the RAG tutorial.

Pick your chat model:

Install dependencies

yarn add @langchain/openai 

Add environment variables

OPENAI_API_KEY=your-api-key

Instantiate the model

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});

import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";

// 1. Load, chunk and index the contents of the blog to create a retriever.
const loader = new CheerioWebBaseLoader(
  "https://lilianweng.github.io/posts/2023-06-23-agent/",
  {
    selector: ".post-content, .post-title, .post-header",
  }
);
const docs = await loader.load();

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const splits = await textSplitter.splitDocuments(docs);
const vectorstore = await MemoryVectorStore.fromDocuments(
  splits,
  new OpenAIEmbeddings()
);
const retriever = vectorstore.asRetriever();

// 2. Incorporate the retriever into a question-answering chain.
const systemPrompt =
  "You are an assistant for question-answering tasks. " +
  "Use the following pieces of retrieved context to answer " +
  "the question. If you don't know the answer, say that you " +
  "don't know. Use three sentences maximum and keep the " +
  "answer concise." +
  "\n\n" +
  "{context}";

const prompt = ChatPromptTemplate.fromMessages([
  ["system", systemPrompt],
  ["human", "{input}"],
]);

const questionAnswerChain = await createStuffDocumentsChain({
  llm,
  prompt,
});

const ragChain = await createRetrievalChain({
  retriever,
  combineDocsChain: questionAnswerChain,
});

const response = await ragChain.invoke({
  input: "What is Task Decomposition?",
});
console.log(response.answer);

Task decomposition involves breaking down large and complex tasks into smaller, more manageable subgoals or steps. This approach helps agents or models efficiently handle intricate tasks by simplifying them into easier components. Task decomposition can be achieved through techniques like Chain of Thought, Tree of Thoughts, or by using task-specific instructions and human input.

Note that we have used the built-in chain constructors createStuffDocumentsChain and createRetrievalChain, so that the basic ingredients to our solution are:

  1. retriever;
  2. prompt;
  3. LLM.

This will simplify the process of incorporating chat history.
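
To make the roles of these three ingredients concrete, the following sketch mimics what a "stuff documents" step does conceptually: the retrieved documents are concatenated and substituted into the `{context}` slot of the prompt before the model is called. It is a simplified illustration, not the implementation of createStuffDocumentsChain; `Doc`, `stuffDocuments`, and `fillTemplate` are hypothetical names, and the blank-line separator is an assumption.

```typescript
interface Doc {
  pageContent: string;
}

// Join the retrieved documents into one context string.
function stuffDocuments(docs: Doc[], separator = "\n\n"): string {
  return docs.map((d) => d.pageContent).join(separator);
}

// Replace {slot} markers in a template with the given values.
// Slots without a value are left untouched.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (slot: string, key: string) => {
    const value = values[key];
    return value === undefined ? slot : value;
  });
}

// Stand-in for what the retriever would return (made-up excerpts).
const retrievedDocs: Doc[] = [
  { pageContent: "Chunk about task decomposition." },
  { pageContent: "Chunk about planning." },
];

const filledSystemText = fillTemplate(
  "Use the retrieved context to answer.\n\n{context}",
  { context: stuffDocuments(retrievedDocs) }
);
console.log(filledSystemText);
```

The filled system text, together with the user's `{input}`, is what the LLM finally receives.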

Adding chat history

The chain we have built uses the input query directly to retrieve relevant context. But in a conversational setting, the user query might require conversational context to be understood. For example, consider this exchange:

Human: “What is Task Decomposition?”

AI: “Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model.”

Human: “What are common ways of doing it?”

In order to answer the second question, our system needs to understand that “it” refers to “Task Decomposition.”

We’ll need to update two things about our existing app:

  1. Prompt: Update our prompt to support historical messages as an input.
  2. Contextualizing questions: Add a sub-chain that takes the latest user question and reformulates it in the context of the chat history. This can be thought of simply as building a new “history aware” retriever. Whereas before we had:
    • query -> retriever
      Now we will have:
    • (query, conversation history) -> LLM -> rephrased query -> retriever

Contextualizing the question

First we’ll need to define a sub-chain that takes historical messages and the latest user question, and reformulates the question if it refers to anything in the chat history.

We’ll use a prompt that includes a MessagesPlaceholder variable under the name “chat_history”. This allows us to pass in a list of Messages to the prompt using the “chat_history” input key, and these messages will be inserted after the system message and before the human message containing the latest question.
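
The effect of the placeholder can be pictured as a simple list splice. The sketch below only illustrates the resulting message order and is not LangChain's implementation; the `Message` type and the `buildMessages` function are hypothetical.

```typescript
type Role = "system" | "human" | "ai";

interface Message {
  role: Role;
  content: string;
}

// Build the final message list: system message, then the chat history
// (where the placeholder sits), then the latest human question.
function buildMessages(
  system: string,
  chatHistory: Message[],
  input: string
): Message[] {
  const systemMessage: Message = { role: "system", content: system };
  const humanMessage: Message = { role: "human", content: input };
  return [systemMessage].concat(chatHistory, [humanMessage]);
}

const pastMessages: Message[] = [
  { role: "human", content: "What is Task Decomposition?" },
  { role: "ai", content: "Breaking a complex task into smaller steps." },
];

const promptMessages = buildMessages(
  "Reformulate the latest question as a standalone question.",
  pastMessages,
  "What are common ways of doing it?"
);
console.log(promptMessages.map((m) => m.role).join(" -> "));
// system -> human -> ai -> human
```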

Note that we leverage a helper function createHistoryAwareRetriever for this step, which manages the case where chat_history is empty, and otherwise applies prompt.pipe(llm).pipe(new StringOutputParser()).pipe(retriever) in sequence.

createHistoryAwareRetriever constructs a chain that accepts keys input and chat_history as input, and has the same output schema as a retriever.

import { createHistoryAwareRetriever } from "langchain/chains/history_aware_retriever";
import { MessagesPlaceholder } from "@langchain/core/prompts";

const contextualizeQSystemPrompt =
  "Given a chat history and the latest user question " +
  "which might reference context in the chat history, " +
  "formulate a standalone question which can be understood " +
  "without the chat history. Do NOT answer the question, " +
  "just reformulate it if needed and otherwise return it as is.";

const contextualizeQPrompt = ChatPromptTemplate.fromMessages([
  ["system", contextualizeQSystemPrompt],
  new MessagesPlaceholder("chat_history"),
  ["human", "{input}"],
]);

const historyAwareRetriever = await createHistoryAwareRetriever({
  llm,
  retriever,
  rephrasePrompt: contextualizeQPrompt,
});

This chain prepends a rephrasing of the input query to our retriever, so that the retrieval incorporates the context of the conversation.
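
The branching behavior described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the library's code: `historyAwareRetrieve`, `fakeRephrase`, and `fakeRetrieve` are hypothetical, and the stand-ins are synchronous to keep the sketch short, whereas real model and retriever calls are asynchronous.

```typescript
interface Turn {
  role: "human" | "ai";
  content: string;
}

type Rephraser = (chatHistory: Turn[], input: string) => string;
type Retriever = (query: string) => string[];

// With no history the question is already standalone and goes straight
// to the retriever; otherwise it is rephrased first.
function historyAwareRetrieve(
  chatHistory: Turn[],
  input: string,
  rephrase: Rephraser,
  retrieve: Retriever
): string[] {
  const query = chatHistory.length === 0 ? input : rephrase(chatHistory, input);
  return retrieve(query);
}

// Stand-ins for the LLM and the vector store retriever (assumptions).
const seenQueries: string[] = [];
const fakeRephrase: Rephraser = (_chatHistory, input) =>
  input.replace("doing it", "doing Task Decomposition");
const fakeRetrieve: Retriever = (query) => {
  seenQueries.push(query);
  return ["doc matching: " + query];
};

historyAwareRetrieve([], "What is Task Decomposition?", fakeRephrase, fakeRetrieve);
historyAwareRetrieve(
  [
    { role: "human", content: "What is Task Decomposition?" },
    { role: "ai", content: "Breaking a task into smaller steps." },
  ],
  "What are common ways of doing it?",
  fakeRephrase,
  fakeRetrieve
);
console.log(seenQueries);
```

The first call reaches the retriever unchanged, while the follow-up is rewritten so that "it" is resolved before retrieval.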

Now we can build our full QA chain. This is as simple as updating the retriever to be our new historyAwareRetriever.

Again, we will use createStuffDocumentsChain to generate a questionAnswerChain2 with input keys context, chat_history, and input: it accepts the retrieved context alongside the conversation history and query to generate an answer. A more detailed explanation is available here.

We build our final ragChain2 with createRetrievalChain. This chain applies the historyAwareRetriever and questionAnswerChain2 in sequence, retaining intermediate outputs such as the retrieved context for convenience. It has input keys input and chat_history, and includes input, chat_history, context, and answer in its output.

const qaPrompt = ChatPromptTemplate.fromMessages([
  ["system", systemPrompt],
  new MessagesPlaceholder("chat_history"),
  ["human", "{input}"],
]);

const questionAnswerChain2 = await createStuffDocumentsChain({
  llm,
  prompt: qaPrompt,
});

const ragChain2 = await createRetrievalChain({
  retriever: historyAwareRetriever,
  combineDocsChain: questionAnswerChain2,
});
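
As a rough mental model of how the retrieval chain assembles its output, the sketch below runs a retrieval step and an answering step in sequence and returns the inputs together with the intermediate context. It is a simplified, synchronous illustration with hypothetical names (`runRetrievalChain`, `ChainInput`, `ChainOutput`), not the code behind createRetrievalChain.

```typescript
interface ChainInput {
  input: string;
  chat_history: string[];
}

interface ChainOutput extends ChainInput {
  context: string[];
  answer: string;
}

// Run retrieval first, then answering, and return the inputs together
// with the intermediate context and the final answer.
function runRetrievalChain(
  inputs: ChainInput,
  retrieveDocs: (inputs: ChainInput) => string[],
  answerFrom: (inputs: ChainInput, context: string[]) => string
): ChainOutput {
  const context = retrieveDocs(inputs);
  const answer = answerFrom(inputs, context);
  return {
    input: inputs.input,
    chat_history: inputs.chat_history,
    context: context,
    answer: answer,
  };
}

// Stand-ins for historyAwareRetriever and questionAnswerChain2.
const demoOutput = runRetrievalChain(
  { input: "What is Task Decomposition?", chat_history: [] },
  (inputs) => ["excerpt relevant to: " + inputs.input],
  (_inputs, context) => "Answer drawn from " + context.length + " excerpt(s)."
);
console.log(Object.keys(demoOutput).sort().join(", "));
// answer, chat_history, context, input
```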

Let’s try this. Below we ask a question and a follow-up question that requires contextualization to return a sensible response. Because our chain includes a "chat_history" input, the caller needs to manage the chat history. We can achieve this by appending input and output messages to a list:

import { BaseMessage, HumanMessage, AIMessage } from "@langchain/core/messages";

let chatHistory: BaseMessage[] = [];

const question = "What is Task Decomposition?";
const aiMsg1 = await ragChain2.invoke({
  input: question,
  chat_history: chatHistory,
});
chatHistory = chatHistory.concat([
  new HumanMessage(question),
  new AIMessage(aiMsg1.answer),
]);

const secondQuestion = "What are common ways of doing it?";
const aiMsg2 = await ragChain2.invoke({
  input: secondQuestion,
  chat_history: chatHistory,
});

console.log(aiMsg2.answer);

Common ways of doing Task Decomposition include:
1. Using simple prompting with an LLM, such as asking it to outline steps or subgoals for a task.
2. Employing task-specific instructions, like "Write a story outline" for writing a novel.
3. Incorporating human inputs for guidance.
Additionally, advanced approaches like Chain of Thought (CoT) and Tree of Thoughts (ToT) can further refine the process, and using an external classical planner with PDDL (as in LLM+P) is another option.

Stateful management of chat history

Here we’ve gone over how to add application logic for incorporating historical outputs, but we’re still manually updating the chat history and inserting it into each input. In a real Q&A application we’ll want some way of persisting chat history and some way of automatically inserting and updating it.

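For this we can use:

  • BaseChatMessageHistory: stores chat history.
  • RunnableWithMessageHistory: wraps a chain and a BaseChatMessageHistory, injecting the chat history into inputs and updating it after each invocation.

For a detailed walkthrough of how to use these classes together to create a stateful conversational chain, head to the How to add message history (memory) LCEL page.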

Instances of RunnableWithMessageHistory manage the chat history for you. They accept a config with a key ("sessionId" by default) that specifies what conversation history to fetch and prepend to the input, and append the output to the same conversation history. Below is an example:

import { RunnableWithMessageHistory } from "@langchain/core/runnables";
import { ChatMessageHistory } from "langchain/stores/message/in_memory";

const demoEphemeralChatMessageHistoryForChain = new ChatMessageHistory();

const conversationalRagChain = new RunnableWithMessageHistory({
  runnable: ragChain2,
  getMessageHistory: (_sessionId) => demoEphemeralChatMessageHistoryForChain,
  inputMessagesKey: "input",
  historyMessagesKey: "chat_history",
  outputMessagesKey: "answer",
});

const result1 = await conversationalRagChain.invoke(
  { input: "What is Task Decomposition?" },
  { configurable: { sessionId: "abc123" } }
);
console.log(result1.answer);

Task Decomposition involves breaking down complicated tasks into smaller, more manageable subgoals. Techniques such as the Chain of Thought (CoT) and Tree of Thoughts extend this by decomposing problems into multiple thought steps and exploring multiple reasoning possibilities at each step. LLMs can perform task decomposition using simple prompts, task-specific instructions, or human inputs, and some approaches like LLM+P involve using external classical planners.

const result2 = await conversationalRagChain.invoke(
  { input: "What are common ways of doing it?" },
  { configurable: { sessionId: "abc123" } }
);
console.log(result2.answer);

Common ways of doing task decomposition include:

1. Using simple prompting with an LLM, such as "Steps for XYZ.\n1." or "What are the subgoals for achieving XYZ?"
2. Utilizing task-specific instructions, like "Write a story outline." for writing a novel.
3. Incorporating human inputs to guide and refine the decomposition process.

Additionally, the LLM+P approach utilizes an external classical planner, involving PDDL to describe and plan complex tasks.
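
The bookkeeping that a history-managing wrapper takes off your hands can be sketched as follows. This is a hedged illustration, not the RunnableWithMessageHistory implementation: `withMessageHistory`, `getHistory`, and `echoChain` are hypothetical, the store is a plain in-memory object, and the chain is synchronous for brevity.

```typescript
interface StoredMessage {
  role: "human" | "ai";
  content: string;
}

type Chain = (args: { input: string; chat_history: StoredMessage[] }) => {
  answer: string;
};

// One message list per session id (in-memory only).
const sessions: Record<string, StoredMessage[]> = {};

function getHistory(sessionId: string): StoredMessage[] {
  let stored = sessions[sessionId];
  if (stored === undefined) {
    stored = [];
    sessions[sessionId] = stored;
  }
  return stored;
}

// Wrap a chain so that history is fetched before the call and the new
// question/answer pair is appended afterwards.
function withMessageHistory(chain: Chain) {
  return function invoke(input: string, sessionId: string): { answer: string } {
    const stored = getHistory(sessionId);
    // Pass a copy so the chain sees the history as it was before this turn.
    const result = chain({ input: input, chat_history: stored.slice() });
    stored.push({ role: "human", content: input });
    stored.push({ role: "ai", content: result.answer });
    return result;
  };
}

// Stand-in chain that only reports how much history it received.
const echoChain: Chain = (args) => ({
  answer: "history length " + args.chat_history.length,
});

const chat = withMessageHistory(echoChain);
const firstTurn = chat("What is Task Decomposition?", "abc123");
const secondTurn = chat("What are common ways of doing it?", "abc123");
const otherSession = chat("Hello", "another-session");
console.log(firstTurn.answer, "|", secondTurn.answer, "|", otherSession.answer);
```

The second call in session "abc123" sees the two messages from the first turn, while a different session id starts from an empty history.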

Tying it together

For convenience, we tie together all of the necessary steps in a single code cell:

import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from "@langchain/core/prompts";
import { createHistoryAwareRetriever } from "langchain/chains/history_aware_retriever";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { RunnableWithMessageHistory } from "@langchain/core/runnables";
import { ChatMessageHistory } from "langchain/stores/message/in_memory";
import { BaseChatMessageHistory } from "@langchain/core/chat_history";

const llm2 = new ChatOpenAI({ model: "gpt-3.5-turbo", temperature: 0 });

// Construct retriever
const loader2 = new CheerioWebBaseLoader(
  "https://lilianweng.github.io/posts/2023-06-23-agent/",
  {
    selector: ".post-content, .post-title, .post-header",
  }
);

const docs2 = await loader2.load();

const textSplitter2 = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const splits2 = await textSplitter2.splitDocuments(docs2);
const vectorstore2 = await MemoryVectorStore.fromDocuments(
  splits2,
  new OpenAIEmbeddings()
);
const retriever2 = vectorstore2.asRetriever();

// Contextualize question
const contextualizeQSystemPrompt2 =
  "Given a chat history and the latest user question " +
  "which might reference context in the chat history, " +
  "formulate a standalone question which can be understood " +
  "without the chat history. Do NOT answer the question, " +
  "just reformulate it if needed and otherwise return it as is.";

const contextualizeQPrompt2 = ChatPromptTemplate.fromMessages([
  ["system", contextualizeQSystemPrompt2],
  new MessagesPlaceholder("chat_history"),
  ["human", "{input}"],
]);

const historyAwareRetriever2 = await createHistoryAwareRetriever({
  llm: llm2,
  retriever: retriever2,
  rephrasePrompt: contextualizeQPrompt2,
});

// Answer question
const systemPrompt2 =
  "You are an assistant for question-answering tasks. " +
  "Use the following pieces of retrieved context to answer " +
  "the question. If you don't know the answer, say that you " +
  "don't know. Use three sentences maximum and keep the " +
  "answer concise." +
  "\n\n" +
  "{context}";

const qaPrompt2 = ChatPromptTemplate.fromMessages([
  ["system", systemPrompt2],
  new MessagesPlaceholder("chat_history"),
  ["human", "{input}"],
]);

// Use llm2, the model defined in this cell, so the cell is self-contained.
const questionAnswerChain3 = await createStuffDocumentsChain({
  llm: llm2,
  prompt: qaPrompt2,
});

const ragChain3 = await createRetrievalChain({
  retriever: historyAwareRetriever2,
  combineDocsChain: questionAnswerChain3,
});

// Statefully manage chat history
const store2: Record<string, BaseChatMessageHistory> = {};

function getSessionHistory2(sessionId: string): BaseChatMessageHistory {
  if (!(sessionId in store2)) {
    store2[sessionId] = new ChatMessageHistory();
  }
  return store2[sessionId];
}

const conversationalRagChain2 = new RunnableWithMessageHistory({
  runnable: ragChain3,
  getMessageHistory: getSessionHistory2,
  inputMessagesKey: "input",
  historyMessagesKey: "chat_history",
  outputMessagesKey: "answer",
});

// Example usage
const query2 = "What is Task Decomposition?";

for await (const s of await conversationalRagChain2.stream(
  { input: query2 },
  { configurable: { sessionId: "unique_session_id" } }
)) {
  console.log(s);
  console.log("----");
}

{ input: 'What is Task Decomposition?' }
----
{ chat_history: [] }
----
{
context: [
Document {
pageContent: 'Fig. 1. Overview of a LLM-powered autonomous agent system.\n' +
'Component One: Planning#\n' +
'A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\n' +
'Task Decomposition#\n' +
'Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\n' +
'Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.',
metadata: [Object],
id: undefined
},
Document {
pageContent: 'Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\n' +
'Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\n' +
'Self-Reflection#',
metadata: [Object],
id: undefined
},
Document {
pageContent: 'Planning\n' +
'\n' +
'Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\n' +
'Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n' +
'\n' +
'\n' +
'Memory\n' +
'\n' +
'Short-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\n' +
'Long-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\n' +
'\n' +
'\n' +
'Tool use\n' +
'\n' +
'The agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.',
metadata: [Object],
id: undefined
},
Document {
pageContent: 'Resources:\n' +
'1. Internet access for searches and information gathering.\n' +
'2. Long Term memory management.\n' +
'3. GPT-3.5 powered Agents for delegation of simple tasks.\n' +
'4. File output.\n' +
'\n' +
'Performance Evaluation:\n' +
'1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n' +
'2. Constructively self-criticize your big-picture behavior constantly.\n' +
'3. Reflect on past decisions and strategies to refine your approach.\n' +
'4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.',
metadata: [Object],
id: undefined
}
]
}
----
{ answer: '' }
----
{ answer: 'Task' }
----
{ answer: ' decomposition' }
----
{ answer: ' involves' }
----
{ answer: ' breaking' }
----
{ answer: ' down' }
----
{ answer: ' a' }
----
{ answer: ' complex' }
----
{ answer: ' task' }
----
{ answer: ' into' }
----
{ answer: ' smaller' }
----
{ answer: ' and' }
----
{ answer: ' more' }
----
{ answer: ' manageable' }
----
{ answer: ' sub' }
----
{ answer: 'goals' }
----
{ answer: ' or' }
----
{ answer: ' steps' }
----
{ answer: '.' }
----
{ answer: ' This' }
----
{ answer: ' process' }
----
{ answer: ' allows' }
----
{ answer: ' an' }
----
{ answer: ' agent' }
----
{ answer: ' or' }
----
{ answer: ' model' }
----
{ answer: ' to' }
----
{ answer: ' efficiently' }
----
{ answer: ' handle' }
----
{ answer: ' intricate' }
----
{ answer: ' tasks' }
----
{ answer: ' by' }
----
{ answer: ' dividing' }
----
{ answer: ' them' }
----
{ answer: ' into' }
----
{ answer: ' simpler' }
----
{ answer: ' components' }
----
{ answer: '.' }
----
{ answer: ' Task' }
----
{ answer: ' decomposition' }
----
{ answer: ' can' }
----
{ answer: ' be' }
----
{ answer: ' achieved' }
----
{ answer: ' through' }
----
{ answer: ' techniques' }
----
{ answer: ' like' }
----
{ answer: ' Chain' }
----
{ answer: ' of' }
----
{ answer: ' Thought' }
----
{ answer: ',' }
----
{ answer: ' Tree' }
----
{ answer: ' of' }
----
{ answer: ' Thoughts' }
----
{ answer: ',' }
----
{ answer: ' or' }
----
{ answer: ' by' }
----
{ answer: ' using' }
----
{ answer: ' task' }
----
{ answer: '-specific' }
----
{ answer: ' instructions' }
----
{ answer: '.' }
----
{ answer: '' }
----
{ answer: '' }
----
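
The streamed output above arrives as many small objects, most of which carry a fragment of the answer. A minimal sketch of how those fragments could be reassembled is shown below; `StreamChunk` and `collectAnswer` are hypothetical names, and the chunk shape is inferred from the printed output rather than taken from a documented type.

```typescript
// Shape of the streamed chunks shown above: each chunk carries one key.
interface StreamChunk {
  input?: string;
  chat_history?: unknown[];
  context?: unknown[];
  answer?: string;
}

// Concatenate the `answer` fragments and ignore the other keys.
function collectAnswer(chunks: StreamChunk[]): string {
  let answer = "";
  for (const chunk of chunks) {
    if (chunk.answer !== undefined) {
      answer += chunk.answer;
    }
  }
  return answer;
}

const sampleChunks: StreamChunk[] = [
  { input: "What is Task Decomposition?" },
  { chat_history: [] },
  { context: [] },
  { answer: "" },
  { answer: "Task" },
  { answer: " decomposition" },
  { answer: " involves" },
  { answer: "" },
];
console.log(collectAnswer(sampleChunks)); // Task decomposition involves
```

With a real stream, the same accumulation can be done inside the `for await` loop instead of over an array.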

Agents

Agents leverage the reasoning capabilities of LLMs to make decisions during execution. Using agents allows you to offload some discretion over the retrieval process. Although their behavior is less predictable than chains, they offer some advantages in this context:

  • Agents generate the input to the retriever directly, without necessarily needing us to explicitly build in contextualization, as we did above;
  • Agents can execute multiple retrieval steps in service of a query, or refrain from executing a retrieval step altogether (e.g., in response to a generic greeting from a user).
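
The two points above can be illustrated with a toy control loop in which a decision function, standing in for the LLM, chooses between calling the retriever and answering directly. Everything here is a hypothetical sketch (`runAgent`, `toyDecide`, `toyRetrieve`, and the step limit are assumptions) and not how LangGraph implements its agents.

```typescript
type AgentStep =
  | { kind: "tool_call"; query: string }
  | { kind: "final"; content: string };

// The "model" sees the user question plus all tool results so far and
// decides what to do next.
type Decide = (question: string, toolResults: string[]) => AgentStep;

function runAgent(
  question: string,
  decide: Decide,
  retrieve: (query: string) => string,
  maxSteps = 5
): { answer: string; retrievals: number } {
  const toolResults: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const next = decide(question, toolResults);
    if (next.kind === "final") {
      return { answer: next.content, retrievals: toolResults.length };
    }
    toolResults.push(retrieve(next.query));
  }
  return {
    answer: "Stopped after reaching the step limit.",
    retrievals: toolResults.length,
  };
}

// Toy policy standing in for the LLM: greet without retrieval, otherwise
// retrieve once and then answer from the result.
const toyDecide: Decide = (question, toolResults) => {
  if (question.toLowerCase().indexOf("hi") === 0) {
    return { kind: "final", content: "Hello!" };
  }
  if (toolResults.length === 0) {
    return { kind: "tool_call", query: question };
  }
  return { kind: "final", content: "Based on: " + toolResults.join(" | ") };
};

const toyRetrieve = (query: string): string => "excerpt for '" + query + "'";

const greeting = runAgent("Hi! I'm bob", toyDecide, toyRetrieve);
const lookup = runAgent("What is Task Decomposition?", toyDecide, toyRetrieve);
console.log(greeting.retrievals, lookup.retrievals); // 0 1
```

A real agent makes the same kind of choice at each step, but the decision comes from the model's tool-calling output rather than from hand-written rules.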

Retrieval tool

Agents can access “tools” and manage their execution. In this case, we will convert our retriever into a LangChain tool to be wielded by the agent:

import { createRetrieverTool } from "langchain/tools/retriever";

const tool = createRetrieverTool(retriever, {
  name: "blog_post_retriever",
  description:
    "Searches and returns excerpts from the Autonomous Agents blog post.",
});
const tools = [tool];

Tools are LangChain Runnables, and implement the usual interface:

console.log(await tool.invoke({ query: "task decomposition" }));

Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.
Self-Reflection#

Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.

(3) Task execution: Expert models execute on the specific tasks and log results.
Instruction:

With the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.

Resources:
1. Internet access for searches and information gathering.
2. Long Term memory management.
3. GPT-3.5 powered Agents for delegation of simple tasks.
4. File output.

Performance Evaluation:
1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
2. Constructively self-criticize your big-picture behavior constantly.
3. Reflect on past decisions and strategies to refine your approach.
4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.
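
The output above is simply the text of the retrieved chunks joined together. A minimal sketch of what wrapping a retriever as a tool amounts to is given below; `SimpleTool`, `makeRetrieverTool`, and the keyword-matching retriever are hypothetical stand-ins, and the blank-line separator is an assumption based on the printed output.

```typescript
interface Excerpt {
  pageContent: string;
}

interface SimpleTool {
  name: string;
  description: string;
  invoke: (args: { query: string }) => string;
}

// Wrap a retriever function as a tool: the tool takes a { query } argument
// and returns the retrieved excerpts as one string.
function makeRetrieverTool(
  retrieve: (query: string) => Excerpt[],
  toolName: string,
  description: string
): SimpleTool {
  return {
    name: toolName,
    description: description,
    invoke: (args) =>
      retrieve(args.query)
        .map((doc) => doc.pageContent)
        .join("\n\n"),
  };
}

// Toy keyword retriever over two excerpts (stand-in for a vector store).
const corpus: Excerpt[] = [
  { pageContent: "Task decomposition can be done by simple prompting." },
  { pageContent: "Long-term memory often uses an external vector store." },
];
const keywordRetrieve = (query: string): Excerpt[] =>
  corpus.filter(
    (doc) => doc.pageContent.toLowerCase().indexOf(query.toLowerCase()) !== -1
  );

const sketchTool = makeRetrieverTool(
  keywordRetrieve,
  "blog_post_retriever",
  "Searches and returns excerpts from the Autonomous Agents blog post."
);
console.log(sketchTool.invoke({ query: "task decomposition" }));
// Task decomposition can be done by simple prompting.
```

The name and description matter in practice because the model reads them when deciding whether to call the tool.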

Agent constructor

Now that we have defined the tools and the LLM, we can create the agent. We will be using LangGraph to construct the agent. Currently we are using a high-level interface to construct the agent, but the nice thing about LangGraph is that this high-level interface is backed by a low-level, highly controllable API in case you want to modify the agent logic.

import { createReactAgent } from "@langchain/langgraph/prebuilt";

const agentExecutor = createReactAgent({ llm, tools });

We can now try it out. Note that so far it is not stateful (we still need to add memory):

const query = "What is Task Decomposition?";

for await (const s of await agentExecutor.stream({
  messages: [new HumanMessage(query)],
})) {
  console.log(s);
  console.log("----");
}

{
agent: {
messages: [
AIMessage {
"id": "chatcmpl-ABABtUmgD1ZlOHZd0nD9TR8yb3mMe",
"content": "",
"additional_kwargs": {
"tool_calls": [
{
"id": "call_dWxEY41mg9VSLamVYHltsUxL",
"type": "function",
"function": "[Object]"
}
]
},
"response_metadata": {
"tokenUsage": {
"completionTokens": 19,
"promptTokens": 66,
"totalTokens": 85
},
"finish_reason": "tool_calls",
"system_fingerprint": "fp_3537616b13"
},
"tool_calls": [
{
"name": "blog_post_retriever",
"args": {
"query": "Task Decomposition"
},
"type": "tool_call",
"id": "call_dWxEY41mg9VSLamVYHltsUxL"
}
],
"invalid_tool_calls": [],
"usage_metadata": {
"input_tokens": 66,
"output_tokens": 19,
"total_tokens": 85
}
}
]
}
}
----
{
tools: {
messages: [
ToolMessage {
"content": "Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\n\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. 
Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\nSelf-Reflection#\n\n(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory\n\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\n\n\nTool use\n\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.",
"name": "blog_post_retriever",
"additional_kwargs": {},
"response_metadata": {},
"tool_call_id": "call_dWxEY41mg9VSLamVYHltsUxL"
}
]
}
}
----
{
agent: {
messages: [
AIMessage {
"id": "chatcmpl-ABABuSj5FHmHFdeR2Pv7Cxcmq5aQz",
"content": "Task Decomposition is a technique that allows an agent to break down a complex task into smaller, more manageable subtasks or steps. The primary goal is to simplify the task to ensure efficient execution and better understanding. \n\n### Methods in Task Decomposition:\n1. **Chain of Thought (CoT)**:\n - **Description**: This technique involves instructing the model to “think step by step” to decompose hard tasks into smaller ones. It transforms large tasks into multiple manageable tasks, enhancing the model's performance and providing insight into its thinking process. \n - **Example**: When given a complex problem, the model outlines sequential steps to reach a solution.\n\n2. **Tree of Thoughts**:\n - **Description**: This extends CoT by exploring multiple reasoning possibilities at each step. The problem is decomposed into multiple thought steps, with several thoughts generated per step, forming a sort of decision tree.\n - **Example**: For a given task, the model might consider various alternative actions at each stage, evaluating each before proceeding.\n\n3. **LLM with Prompts**:\n - **Description**: Basic task decomposition can be done via simple prompts like \"Steps for XYZ\" or \"What are the subgoals for achieving XYZ?\" This can also be guided by task-specific instructions or human inputs when necessary.\n - **Example**: Asking the model to list the subgoals for writing a novel might produce an outline broken down into chapters, character development, and plot points.\n\n4. **LLM+P**:\n - **Description**: This approach involves outsourcing long-term planning to an external classical planner using Planning Domain Definition Language (PDDL). 
The task is translated into a PDDL problem by the model, planned using classical planning tools, and then translated back into natural language.\n - **Example**: In robotics, translating a task into PDDL and then using a domain-specific planner to generate a sequence of actions.\n\n### Applications:\n- **Planning**: Helps an agent plan tasks by breaking them into clear, manageable steps.\n- **Self-Reflection**: Allows agents to reflect and refine their actions, learning from past mistakes to improve future performance.\n- **Memory**: Utilizes short-term memory for immediate context and long-term memory for retaining and recalling information over extended periods.\n- **Tool Use**: Enables the agent to call external APIs for additional information or capabilities not inherent in the model.\n\nIn essence, task decomposition leverages various methodologies to simplify complex tasks, ensuring better performance, improved reasoning, and effective task execution.",
"additional_kwargs": {},
"response_metadata": {
"tokenUsage": {
"completionTokens": 522,
"promptTokens": 821,
"totalTokens": 1343
},
"finish_reason": "stop",
"system_fingerprint": "fp_e375328146"
},
"tool_calls": [],
"invalid_tool_calls": [],
"usage_metadata": {
"input_tokens": 821,
"output_tokens": 522,
"total_tokens": 1343
}
}
]
}
}
----

LangGraph comes with built-in persistence, so we don’t need to use ChatMessageHistory. Instead, we can pass a checkpointer directly to our LangGraph agent:

import { MemorySaver } from "@langchain/langgraph";

const memory = new MemorySaver();

const agentExecutorWithMemory = createReactAgent({
llm,
tools,
checkpointSaver: memory,
});

This is all we need to construct a conversational RAG agent.
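To build intuition for what the checkpointer does, here is a toy sketch of per-thread persistence. It is not how MemorySaver is implemented. `ToyCheckpointer` and `toyAgentTurn` are hypothetical names made up for illustration, and the "agent" just reports how many messages it can see instead of calling a model. The point is only that state is saved and restored by `thread_id`.

```typescript
// Toy illustration only: NOT LangGraph's MemorySaver implementation.
type ChatMessage = { role: "user" | "assistant"; content: string };

class ToyCheckpointer {
  // One saved message list per thread_id.
  private threads = new Map<string, ChatMessage[]>();

  // Return a copy of the saved history (empty for an unseen thread).
  load(threadId: string): ChatMessage[] {
    return [...(this.threads.get(threadId) ?? [])];
  }

  // Overwrite the saved history for this thread.
  save(threadId: string, messages: ChatMessage[]): void {
    this.threads.set(threadId, [...messages]);
  }
}

// Hypothetical stand-in for one agent turn: restore history, append the
// user message and a canned reply, then persist the updated history.
function toyAgentTurn(
  checkpointer: ToyCheckpointer,
  threadId: string,
  userText: string
): ChatMessage[] {
  const history = checkpointer.load(threadId);
  history.push({ role: "user", content: userText });
  history.push({
    role: "assistant",
    content: `I can see ${history.length} message(s) in this thread.`,
  });
  checkpointer.save(threadId, history);
  return history;
}

const toyMemory = new ToyCheckpointer();
toyAgentTurn(toyMemory, "abc123", "Hi! I'm bob");
const sameThread = toyAgentTurn(toyMemory, "abc123", "What's my name?");
const otherThread = toyAgentTurn(toyMemory, "xyz789", "What's my name?");

console.log(sameThread.length); // 4: the earlier turn was restored
console.log(otherThread.length); // 2: a new thread_id starts empty
```

In the same way, reusing one `thread_id` across calls to the real agent continues a conversation, while a new `thread_id` starts a fresh one.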

Let’s observe its behavior. Note that if we input a query that does not require a retrieval step, the agent does not execute one:

const config = { configurable: { thread_id: "abc123" } };

for await (const s of await agentExecutorWithMemory.stream(
{ messages: [new HumanMessage("Hi! I'm bob")] },
config
)) {
console.log(s);
console.log("----");
}
{
agent: {
messages: [
AIMessage {
"id": "chatcmpl-ABACGc1vDPUSHYN7YVkuUMwpKR20P",
"content": "Hello, Bob! How can I assist you today?",
"additional_kwargs": {},
"response_metadata": {
"tokenUsage": {
"completionTokens": 12,
"promptTokens": 64,
"totalTokens": 76
},
"finish_reason": "stop",
"system_fingerprint": "fp_e375328146"
},
"tool_calls": [],
"invalid_tool_calls": [],
"usage_metadata": {
"input_tokens": 64,
"output_tokens": 12,
"total_tokens": 76
}
}
]
}
}
----

Further, if we input a query that does require a retrieval step, the agent generates the input to the tool:

for await (const s of await agentExecutorWithMemory.stream(
{ messages: [new HumanMessage(query)] },
config
)) {
console.log(s);
console.log("----");
}
{
agent: {
messages: [
AIMessage {
"id": "chatcmpl-ABACI6WN7hkfJjFhIUBGt3TswtPOv",
"content": "",
"additional_kwargs": {
"tool_calls": [
{
"id": "call_Lys2G4TbOMJ6RBuVvKnFSK4V",
"type": "function",
"function": "[Object]"
}
]
},
"response_metadata": {
"tokenUsage": {
"completionTokens": 19,
"promptTokens": 89,
"totalTokens": 108
},
"finish_reason": "tool_calls",
"system_fingerprint": "fp_f82f5b050c"
},
"tool_calls": [
{
"name": "blog_post_retriever",
"args": {
"query": "Task Decomposition"
},
"type": "tool_call",
"id": "call_Lys2G4TbOMJ6RBuVvKnFSK4V"
}
],
"invalid_tool_calls": [],
"usage_metadata": {
"input_tokens": 89,
"output_tokens": 19,
"total_tokens": 108
}
}
]
}
}
----
{
tools: {
messages: [
ToolMessage {
"content": "Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\n\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. 
Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\nSelf-Reflection#\n\n(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory\n\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\n\n\nTool use\n\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.",
"name": "blog_post_retriever",
"additional_kwargs": {},
"response_metadata": {},
"tool_call_id": "call_Lys2G4TbOMJ6RBuVvKnFSK4V"
}
]
}
}
----
{
agent: {
messages: [
AIMessage {
"id": "chatcmpl-ABACJu56eYSAyyMNaV9UEUwHS8vRu",
"content": "Task Decomposition is a method used to break down complicated tasks into smaller, more manageable steps. This approach leverages the \"Chain of Thought\" (CoT) technique, which prompts models to \"think step by step\" to enhance performance on complex tasks. Here’s a summary of the key concepts related to Task Decomposition:\n\n1. **Chain of Thought (CoT):**\n - A prompting technique that encourages models to decompose hard tasks into simpler steps, transforming big tasks into multiple manageable sub-tasks.\n - CoT helps to provide insights into the model’s thinking process.\n\n2. **Tree of Thoughts:**\n - An extension of CoT, this approach explores multiple reasoning paths at each step.\n - It creates a tree structure by generating multiple thoughts per step, and uses search methods like breadth-first search (BFS) or depth-first search (DFS) to explore these thoughts.\n - Each state is evaluated by a classifier or majority vote.\n\n3. **Methods for Task Decomposition:**\n - Simple prompting such as instructing with phrases like \"Steps for XYZ: 1., 2., 3.\" or \"What are the subgoals for achieving XYZ?\".\n - Using task-specific instructions like \"Write a story outline\" for specific tasks such as writing a novel.\n - Incorporating human inputs for better granularity.\n\n4. **LLM+P (Long-horizon Planning):**\n - A method that involves using an external classical planner for long-horizon planning.\n - The process involves translating the problem into a Planning Domain Definition Language (PDDL) problem, using a classical planner to generate a PDDL plan, and then translating it back into natural language.\n\nTask Decomposition is essential in planning complex tasks, allowing for efficient handling by breaking them into sub-tasks and sub-goals. This process is integral to the functioning of autonomous agent systems and enhances their capability to execute intricate tasks effectively.",
"additional_kwargs": {},
"response_metadata": {
"tokenUsage": {
"completionTokens": 396,
"promptTokens": 844,
"totalTokens": 1240
},
"finish_reason": "stop",
"system_fingerprint": "fp_9f2bfdaa89"
},
"tool_calls": [],
"invalid_tool_calls": [],
"usage_metadata": {
"input_tokens": 844,
"output_tokens": 396,
"total_tokens": 1240
}
}
]
}
}
----

Above, instead of inserting our query verbatim into the tool, the agent stripped unnecessary words like “what” and “is”.

This same principle allows the agent to use the context of the conversation when necessary:

const query3 =
"What according to the blog post are common ways of doing it? redo the search";

for await (const s of await agentExecutorWithMemory.stream(
{ messages: [new HumanMessage(query3)] },
config
)) {
console.log(s);
console.log("----");
}
{
agent: {
messages: [
AIMessage {
"id": "chatcmpl-ABACPZzSugzrREQRO4mVQfI3cQOeL",
"content": "",
"additional_kwargs": {
"tool_calls": [
{
"id": "call_5nSZb396Tcg73Pok6Bx1XV8b",
"type": "function",
"function": "[Object]"
}
]
},
"response_metadata": {
"tokenUsage": {
"completionTokens": 22,
"promptTokens": 1263,
"totalTokens": 1285
},
"finish_reason": "tool_calls",
"system_fingerprint": "fp_9f2bfdaa89"
},
"tool_calls": [
{
"name": "blog_post_retriever",
"args": {
"query": "common ways of doing task decomposition"
},
"type": "tool_call",
"id": "call_5nSZb396Tcg73Pok6Bx1XV8b"
}
],
"invalid_tool_calls": [],
"usage_metadata": {
"input_tokens": 1263,
"output_tokens": 22,
"total_tokens": 1285
}
}
]
}
}
----
{
tools: {
messages: [
ToolMessage {
"content": "Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\n\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. 
Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\nSelf-Reflection#\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory\n\nShort-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\nLong-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\n\n\nTool use\n\nThe agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\n\nResources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.",
"name": "blog_post_retriever",
"additional_kwargs": {},
"response_metadata": {},
"tool_call_id": "call_5nSZb396Tcg73Pok6Bx1XV8b"
}
]
}
}
----
{
agent: {
messages: [
AIMessage {
"id": "chatcmpl-ABACQt9pT5dKCTaGQpVawcmCCWdET",
"content": "According to the blog post, common ways of performing Task Decomposition include:\n\n1. **Using Large Language Models (LLMs) with Simple Prompting:**\n - Providing clear and structured prompts such as \"Steps for XYZ: 1., 2., 3.\" or asking \"What are the subgoals for achieving XYZ?\"\n - This allows the model to break down the tasks step-by-step.\n\n2. **Task-Specific Instructions:**\n - Employing specific instructions tailored to the task at hand, for example, \"Write a story outline\" for writing a novel.\n - These instructions guide the model in decomposing the task appropriately.\n\n3. **Involving Human Inputs:**\n - Integrating insights and directives from humans to aid in the decomposition process.\n - This can ensure that the breakdown is comprehensive and accurately reflects the nuances of the task.\n\n4. **LLM+P Approach for Long-Horizon Planning:**\n - Utilizing an external classical planner by translating the problem into Planning Domain Definition Language (PDDL).\n - The process involves:\n 1. Translating the problem into “Problem PDDL”.\n 2. Requesting a classical planner to generate a PDDL plan based on an existing “Domain PDDL”.\n 3. Translating the PDDL plan back into natural language.\n\nThese methods enable effective management and execution of complex tasks by transforming them into simpler, more manageable components.",
"additional_kwargs": {},
"response_metadata": {
"tokenUsage": {
"completionTokens": 292,
"promptTokens": 2010,
"totalTokens": 2302
},
"finish_reason": "stop",
"system_fingerprint": "fp_9f2bfdaa89"
},
"tool_calls": [],
"invalid_tool_calls": [],
"usage_metadata": {
"input_tokens": 2010,
"output_tokens": 292,
"total_tokens": 2302
}
}
]
}
}
----

Note that the agent was able to infer that “it” in our query refers to “task decomposition”, and generated a reasonable search query as a result: in this case, “common ways of doing task decomposition”.
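If we only want to see the search queries the agent generates, rather than the full log, we can pull them out of each streamed chunk. The sketch below assumes chunks have the shape visible in the output above (a node name mapping to `messages`, each with an optional `tool_calls` array). The types are simplified stand-ins rather than the library's own, and `extractToolQueries` is a hypothetical helper.

```typescript
// Simplified shapes covering only the fields visible in the logged output;
// these are assumptions for illustration, not the library's full types.
type LoggedToolCall = { name: string; args: Record<string, unknown> };
type LoggedMessage = { content: string; tool_calls?: LoggedToolCall[] };
type LoggedChunk = Record<string, { messages: LoggedMessage[] }>;

// Hypothetical helper: collect the `query` argument of every call to the
// named tool found in one streamed chunk.
function extractToolQueries(chunk: LoggedChunk, toolName: string): string[] {
  const queries: string[] = [];
  for (const update of Object.values(chunk)) {
    for (const message of update.messages) {
      for (const call of message.tool_calls ?? []) {
        const query = call.args["query"];
        if (call.name === toolName && typeof query === "string") {
          queries.push(query);
        }
      }
    }
  }
  return queries;
}

// A chunk mirroring the first "agent" update logged above.
const sampleChunk: LoggedChunk = {
  agent: {
    messages: [
      {
        content: "",
        tool_calls: [
          {
            name: "blog_post_retriever",
            args: { query: "common ways of doing task decomposition" },
          },
        ],
      },
    ],
  },
};

const extractedQueries = extractToolQueries(sampleChunk, "blog_post_retriever");
console.log(extractedQueries);
```

Chunks from the `tools` node, or final answers with an empty `tool_calls` array, simply yield an empty list.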

Tying it together

For convenience, we tie together all of the necessary steps in a single code cell:

import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { MemorySaver } from "@langchain/langgraph";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { createRetrieverTool } from "langchain/tools/retriever";

const memory3 = new MemorySaver();
const llm3 = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });

// Construct retriever
const loader3 = new CheerioWebBaseLoader(
"https://lilianweng.github.io/posts/2023-06-23-agent/",
{
selector: ".post-content, .post-title, .post-header",
}
);

const docs3 = await loader3.load();

const textSplitter3 = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
const splits3 = await textSplitter3.splitDocuments(docs3);
const vectorstore3 = await MemoryVectorStore.fromDocuments(
splits3,
new OpenAIEmbeddings()
);
const retriever3 = vectorstore3.asRetriever();

// Build retriever tool
const tool3 = createRetrieverTool(retriever3, {
name: "blog_post_retriever",
description:
"Searches and returns excerpts from the Autonomous Agents blog post.",
});
const tools3 = [tool3];

const agentExecutor3 = createReactAgent({
llm: llm3,
tools: tools3,
checkpointSaver: memory3,
});

Next steps

We’ve covered the steps to build a basic conversational Q&A application:

  • We used chains to build a predictable application that generates search queries for each user input;
  • We used agents to build an application that “decides” when and how to generate search queries.

To explore different types of retrievers and retrieval strategies, visit the retrievers section of the how-to guides.

For a detailed walkthrough of LangChain’s conversation memory abstractions, visit the How to add message history (memory) LCEL page.

