Migrating off ConversationTokenBufferMemory
Follow this guide if you’re trying to migrate off one of the old memory classes listed below:
| Memory Type | Description |
|---|---|
| ConversationTokenBufferMemory | Keeps only the most recent messages in the conversation, under the constraint that the total number of tokens in the conversation does not exceed a certain limit. |
ConversationTokenBufferMemory applies additional processing on top of the raw conversation history to trim it to a size that fits inside the context window of a chat model. This processing can be accomplished using LangChain's built-in trimMessages function.
We’ll begin by exploring a straightforward method that involves applying processing logic to the entire conversation history.
While this approach is easy to implement, it has a downside: as the conversation grows, so does the latency, since the logic is re-applied to all previous exchanges in the conversation at each turn.
More advanced strategies focus on incrementally updating the conversation history to avoid redundant processing.
For instance, the LangGraph how-to guide on summarization demonstrates how to maintain a running summary of the conversation while discarding older messages, ensuring they aren't re-processed during later turns.
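For a quick preview of the first approach, here is a minimal sketch that re-trims the entire history on each turn (the counter here simply counts messages; the sections below use a real token counter and LangGraph instead):
import { trimMessages, HumanMessage, AIMessage } from "@langchain/core/messages";
// Re-trim the entire history every turn.
const history = [
  new HumanMessage("hi! I'm bob"),
  new AIMessage("Hello Bob! How can I help you?"),
  new HumanMessage("what was my name?"),
];
const trimmed = await trimMessages(history, {
  tokenCounter: (msgs) => msgs.length, // count messages instead of tokens
  maxTokens: 2, // keep at most 2 "tokens" (here, messages)
  strategy: "last",
  startOn: "human",
});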
Set up
Dependencies
- npm: npm i @langchain/openai @langchain/core zod
- yarn: yarn add @langchain/openai @langchain/core zod
- pnpm: pnpm add @langchain/openai @langchain/core zod
Environment variables
process.env.OPENAI_API_KEY = "YOUR_OPENAI_API_KEY";
Reimplementing ConversationTokenBufferMemory logic
Here, we'll use trimMessages to keep the system message and the most recent messages in the conversation, under the constraint that the total number of tokens in the conversation does not exceed a certain limit.
import {
AIMessage,
HumanMessage,
SystemMessage,
} from "@langchain/core/messages";
const messages = [
new SystemMessage("you're a good assistant, you always respond with a joke."),
new HumanMessage("i wonder why it's called langchain"),
new AIMessage(
'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
),
new HumanMessage("and who is harrison chasing anyways"),
new AIMessage(
"Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
),
new HumanMessage("why is 42 always the answer?"),
new AIMessage(
"Because it's the only number that's constantly right, even when it doesn't add up!"
),
new HumanMessage("What did the cow say?"),
];
import { trimMessages } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
const selectedMessages = await trimMessages(messages, {
// Please see API reference for trimMessages for other ways to specify a token counter.
tokenCounter: new ChatOpenAI({ model: "gpt-4o" }),
maxTokens: 80, // <-- token limit
// The startOn is specified
// to make sure we do not generate a sequence where
// a ToolMessage that contains the result of a tool invocation
// appears before the AIMessage that requested a tool invocation
// as this will cause some chat models to raise an error.
startOn: "human",
strategy: "last",
includeSystem: true, // <-- Keep the system message
});
for (const msg of selectedMessages) {
console.log(msg);
}
SystemMessage {
"content": "you're a good assistant, you always respond with a joke.",
"additional_kwargs": {},
"response_metadata": {}
}
HumanMessage {
"content": "and who is harrison chasing anyways",
"additional_kwargs": {},
"response_metadata": {}
}
AIMessage {
"content": "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!",
"additional_kwargs": {},
"response_metadata": {},
"tool_calls": [],
"invalid_tool_calls": []
}
HumanMessage {
"content": "why is 42 always the answer?",
"additional_kwargs": {},
"response_metadata": {}
}
AIMessage {
"content": "Because it's the only number that's constantly right, even when it doesn't add up!",
"additional_kwargs": {},
"response_metadata": {},
"tool_calls": [],
"invalid_tool_calls": []
}
HumanMessage {
"content": "What did the cow say?",
"additional_kwargs": {},
"response_metadata": {}
}
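The trimmed history can then be passed straight to a chat model. As a quick check (re-using the selectedMessages from above; the model instance below is created just for this call):
const chatModel = new ChatOpenAI({ model: "gpt-4o" });
// Invoke the model on the trimmed history rather than the full conversation.
const response = await chatModel.invoke(selectedMessages);
console.log(response.content);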
Modern usage with LangGraph
The example below shows how to use LangGraph to add simple conversation pre-processing logic.
If you want to avoid running the computation on the entire conversation history each time, you can follow the how-to guide on summarization that demonstrates how to discard older messages, ensuring they aren't re-processed during later turns.
import { v4 as uuidv4 } from "uuid";
import { ChatOpenAI } from "@langchain/openai";
import {
StateGraph,
MessagesAnnotation,
END,
START,
MemorySaver,
} from "@langchain/langgraph";
import { trimMessages } from "@langchain/core/messages";
// Define a chat model
const model = new ChatOpenAI({ model: "gpt-4o" });
// Define the function that calls the model
const callModel = async (
state: typeof MessagesAnnotation.State
): Promise<Partial<typeof MessagesAnnotation.State>> => {
const selectedMessages = await trimMessages(state.messages, {
tokenCounter: (messages) => messages.length, // Simple message count instead of token count
maxTokens: 5, // Allow up to 5 messages
strategy: "last",
startOn: "human",
includeSystem: true,
allowPartial: false,
});
const response = await model.invoke(selectedMessages);
// With LangGraph, we're able to return a single message, and LangGraph will concatenate
// it to the existing list
return { messages: [response] };
};
// Define a new graph
const workflow = new StateGraph(MessagesAnnotation)
// Define the two nodes we will cycle between
.addNode("model", callModel)
.addEdge(START, "model")
.addEdge("model", END);
const app = workflow.compile({
// Adding memory is straightforward in LangGraph!
// Just pass a checkpointer to the compile method.
checkpointer: new MemorySaver(),
});
// The thread id is a unique key that identifies this particular conversation
// ---
// NOTE: this must be `thread_id` and not `threadId` as the LangGraph internals expect `thread_id`
// ---
const thread_id = uuidv4();
const config = { configurable: { thread_id }, streamMode: "values" as const };
const inputMessage = {
role: "user",
content: "hi! I'm bob",
};
for await (const event of await app.stream(
{ messages: [inputMessage] },
config
)) {
const lastMessage = event.messages[event.messages.length - 1];
console.log(lastMessage.content);
}
// Here, let's confirm that the AI remembers our name!
const followUpMessage = {
role: "user",
content: "what was my name?",
};
// ---
// NOTE: You must pass the same thread id to continue the conversation
// we do that here by passing the same `config` object to the `.stream` call.
// ---
for await (const event of await app.stream(
{ messages: [followUpMessage] },
config
)) {
const lastMessage = event.messages[event.messages.length - 1];
console.log(lastMessage.content);
}
hi! I'm bob
Hello, Bob! How can I assist you today?
what was my name?
You mentioned that your name is Bob. How can I help you today?
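Note that the checkpointer stores the full, un-trimmed history for the thread; trimming happens only inside callModel, right before the model is invoked. If you want to inspect what has been saved, a quick sketch using the compiled app and config from above:
// Read the saved state for this thread back from the checkpointer.
const state = await app.getState(config);
console.log(state.values.messages.map((m) => m.content));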
Usage with a pre-built LangGraph agent
This example shows how to apply the same trimming logic to a pre-built agent constructed with the createReactAgent function.
If you are using one of the old LangChain pre-built agents, you should be able to replace that code with the new LangGraph pre-built agent which leverages native tool calling capabilities of chat models and will likely work better out of the box.
import { z } from "zod";
import { v4 as uuidv4 } from "uuid";
import { BaseMessage, trimMessages } from "@langchain/core/messages";
import { tool } from "@langchain/core/tools";
import { ChatOpenAI } from "@langchain/openai";
import { MemorySaver } from "@langchain/langgraph";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
const getUserAge = tool(
(name: string): string => {
// This is a placeholder for the actual implementation
if (name.toLowerCase().includes("bob")) {
return "42 years old";
}
return "41 years old";
},
{
name: "get_user_age",
description: "Use this tool to find the user's age.",
schema: z.string().describe("the name of the user"),
}
);
const memory = new MemorySaver();
const model2 = new ChatOpenAI({ model: "gpt-4o" });
const stateModifier = async (
messages: BaseMessage[]
): Promise<BaseMessage[]> => {
// We apply the same trimming logic as in the sections above.
return trimMessages(messages, {
tokenCounter: (msgs) => msgs.length, // <-- .length will simply count the number of messages rather than tokens
maxTokens: 5, // <-- allow up to 5 messages.
strategy: "last",
// The startOn is specified
// to make sure we do not generate a sequence where
// a ToolMessage that contains the result of a tool invocation
// appears before the AIMessage that requested a tool invocation
// as this will cause some chat models to raise an error.
startOn: "human",
includeSystem: true, // <-- Keep the system message
allowPartial: false,
});
};
const app2 = createReactAgent({
llm: model2,
tools: [getUserAge],
checkpointSaver: memory,
messageModifier: stateModifier,
});
// The thread id is a unique key that identifies
// this particular conversation.
// We'll just generate a random uuid here.
const threadId2 = uuidv4();
const config2 = {
configurable: { thread_id: threadId2 },
streamMode: "values" as const,
};
// Tell the AI that our name is Bob, and ask it to use a tool to confirm
// that it's capable of working like an agent.
const inputMessage2 = {
role: "user",
content: "hi! I'm bob. What is my age?",
};
for await (const event of await app2.stream(
{ messages: [inputMessage2] },
config2
)) {
const lastMessage = event.messages[event.messages.length - 1];
console.log(lastMessage.content);
}
// Confirm that the chat bot has access to previous conversation
// and can respond to the user saying that the user's name is Bob.
const followUpMessage2 = {
role: "user",
content: "do you remember my name?",
};
for await (const event of await app2.stream(
{ messages: [followUpMessage2] },
config2
)) {
const lastMessage = event.messages[event.messages.length - 1];
console.log(lastMessage.content);
}
hi! I'm bob. What is my age?
42 years old
Hi Bob! You are 42 years old.
do you remember my name?
Yes, your name is Bob! If there's anything else you'd like to know or discuss, feel free to ask.
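Tools created with the tool() helper are runnables themselves, so if you want to sanity-check getUserAge outside of the agent you can invoke it directly:
// Invoke the tool directly with the user's name.
console.log(await getUserAge.invoke("bob")); // -> "42 years old"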
LCEL: Add a preprocessing step
The simplest way to add complex conversation management is to introduce a pre-processing step in front of the chat model and pass the full conversation history to that step.
This approach is conceptually simple and will work in many situations; for example, if you're using a RunnableWithMessageHistory, wrap the pre-processor together with the chat model rather than wrapping the chat model alone.
The obvious downside of this approach is that latency starts to increase as the conversation history grows, for two reasons:
- As the conversation gets longer, more data may need to be fetched from whatever store you're using for the conversation history (if it's not kept in memory).
- The pre-processing logic will end up doing a lot of redundant computation, repeating work from previous turns of the conversation.
If you want to use a chat model's tool calling capabilities, remember to bind the tools to the model before adding the history pre-processing step to it!
import { ChatOpenAI } from "@langchain/openai";
import {
AIMessage,
HumanMessage,
SystemMessage,
BaseMessage,
trimMessages,
} from "@langchain/core/messages";
import { tool } from "@langchain/core/tools";
import { z } from "zod";
const model3 = new ChatOpenAI({ model: "gpt-4o" });
const whatDidTheCowSay = tool(
(): string => {
return "foo";
},
{
name: "what_did_the_cow_say",
description: "Check to see what the cow said.",
schema: z.object({}),
}
);
const messageProcessor = trimMessages({
tokenCounter: (msgs) => msgs.length, // <-- .length will simply count the number of messages rather than tokens
maxTokens: 5, // <-- allow up to 5 messages.
strategy: "last",
// The startOn is specified
// to make sure we do not generate a sequence where
// a ToolMessage that contains the result of a tool invocation
// appears before the AIMessage that requested a tool invocation
// as this will cause some chat models to raise an error.
startOn: "human",
includeSystem: true, // <-- Keep the system message
allowPartial: false,
});
// Note that we bind tools to the model first!
const modelWithTools = model3.bindTools([whatDidTheCowSay]);
const modelWithPreprocessor = messageProcessor.pipe(modelWithTools);
const fullHistory = [
new SystemMessage("you're a good assistant, you always respond with a joke."),
new HumanMessage("i wonder why it's called langchain"),
new AIMessage(
'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
),
new HumanMessage("and who is harrison chasing anyways"),
new AIMessage(
"Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
),
new HumanMessage("why is 42 always the answer?"),
new AIMessage(
"Because it's the only number that's constantly right, even when it doesn't add up!"
),
new HumanMessage("What did the cow say?"),
];
// We pass it explicitly to the modelWithPreprocessor for illustrative purposes.
// If you're using `RunnableWithMessageHistory` the history will be automatically
// read from the source that you configure.
const result = await modelWithPreprocessor.invoke(fullHistory);
console.log(result);
AIMessage {
"id": "chatcmpl-AB6uzWscxviYlbADFeDlnwIH82Fzt",
"content": "",
"additional_kwargs": {
"tool_calls": [
{
"id": "call_TghBL9dzqXFMCt0zj0VYMjfp",
"type": "function",
"function": "[Object]"
}
]
},
"response_metadata": {
"tokenUsage": {
"completionTokens": 16,
"promptTokens": 95,
"totalTokens": 111
},
"finish_reason": "tool_calls",
"system_fingerprint": "fp_a5d11b2ef2"
},
"tool_calls": [
{
"name": "what_did_the_cow_say",
"args": {},
"type": "tool_call",
"id": "call_TghBL9dzqXFMCt0zj0VYMjfp"
}
],
"invalid_tool_calls": [],
"usage_metadata": {
"input_tokens": 95,
"output_tokens": 16,
"total_tokens": 111
}
}
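If you do use RunnableWithMessageHistory, a rough sketch of the wiring (assuming an in-memory history store keyed by session id, and re-using modelWithPreprocessor from above) might look like this:
import { RunnableWithMessageHistory } from "@langchain/core/runnables";
import { InMemoryChatMessageHistory } from "@langchain/core/chat_history";
// Keep one chat history per session id (here, just in memory).
const histories: Record<string, InMemoryChatMessageHistory> = {};
const withHistory = new RunnableWithMessageHistory({
  // The pre-processor trims the stored history plus the new messages before each model call.
  runnable: modelWithPreprocessor,
  getMessageHistory: (sessionId) =>
    (histories[sessionId] ??= new InMemoryChatMessageHistory()),
});
await withHistory.invoke([new HumanMessage("What did the cow say?")], {
  configurable: { sessionId: "1" },
});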
If you need to implement more efficient logic and want to use RunnableWithMessageHistory for now, the way to achieve this is to subclass BaseChatMessageHistory and define appropriate logic for addMessages (logic that doesn't simply append to the history, but instead re-writes it).
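For illustration only, here is a rough sketch of such a history class (the class name and trimming policy are made up for this example; extend whichever base class fits your storage backend):
import { BaseListChatMessageHistory } from "@langchain/core/chat_history";
import { BaseMessage, trimMessages } from "@langchain/core/messages";
// Hypothetical example: a history store that re-writes (trims) its contents
// every time new messages are added, instead of only appending.
class TrimmedChatMessageHistory extends BaseListChatMessageHistory {
  lc_namespace = ["langchain", "stores", "message", "in_memory"];
  private storedMessages: BaseMessage[] = [];
  async getMessages(): Promise<BaseMessage[]> {
    return this.storedMessages;
  }
  async addMessage(message: BaseMessage): Promise<void> {
    await this.addMessages([message]);
  }
  async addMessages(messages: BaseMessage[]): Promise<void> {
    // Append, then re-write the stored history so it never exceeds the limit.
    this.storedMessages = await trimMessages(
      [...this.storedMessages, ...messages],
      {
        tokenCounter: (msgs) => msgs.length,
        maxTokens: 5,
        strategy: "last",
        startOn: "human",
        includeSystem: true,
      }
    );
  }
  async clear(): Promise<void> {
    this.storedMessages = [];
  }
}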
Unless you have a good reason to implement this solution, you should instead use LangGraph.
Next steps
Explore persistence with LangGraph:
- LangGraph quickstart tutorial
- How to add persistence (“memory”) to your graph
- How to manage conversation history
- How to add summary of the conversation history
Add persistence with simple LCEL (favor LangGraph for more complex use cases):
Working with message history: