How to stream responses from an LLM
All LLMs implement the Runnable interface, which comes with default implementations of standard runnable methods (i.e. invoke, batch, stream, streamEvents).
The default streaming implementations provide an AsyncGenerator that yields a single value: the final output from the underlying model provider.
The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.
See which integrations support token-by-token streaming here.
:::{.callout-note}
The default implementation does not provide support for token-by-token streaming, but it ensures that the model can be swapped in for any other model as it supports the same standard interface.
:::
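As a minimal sketch of that interchangeability (the streamCompletion helper below is our own illustration, not a library function), any LLM can be passed to code that depends only on the standard Runnable methods:

import { OpenAI } from "@langchain/openai";
import type { BaseLLM } from "@langchain/core/language_models/llms";

// Hypothetical helper: it relies only on the standard `.stream()` method,
// so any other LLM implementation can be swapped in without changing it.
async function streamCompletion(model: BaseLLM, prompt: string) {
  for await (const chunk of await model.stream(prompt)) {
    process.stdout.write(chunk);
  }
}

await streamCompletion(new OpenAI({ maxTokens: 25 }), "Tell me a joke.");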
Using .stream()
The easiest way to stream is to use the .stream() method. This returns a readable stream that you can also iterate over:
Install the required packages with your preferred package manager:
- npm: npm install @langchain/openai @langchain/core
- Yarn: yarn add @langchain/openai @langchain/core
- pnpm: pnpm add @langchain/openai @langchain/core
import { OpenAI } from "@langchain/openai";

// `maxTokens` caps the length of the completion.
const model = new OpenAI({
  maxTokens: 25,
});

const stream = await model.stream("Tell me a joke.");

// Each chunk is a string token from the model.
for await (const chunk of stream) {
  console.log(chunk);
}
/*
Q
:
What
did
the
fish
say
when
it
hit
the
wall
?
A
:
Dam
!
*/
API Reference:
- OpenAI from @langchain/openai
For models that do not support streaming, the entire response will be returned as a single chunk.
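Because of this, the same consumption loop works in both cases. A minimal sketch that accumulates the string chunks back into the full completion:

import { OpenAI } from "@langchain/openai";

const model = new OpenAI({ maxTokens: 25 });

// Each chunk yielded by an LLM stream is a string, so concatenating
// the chunks reconstructs the complete response (whether it arrives
// token-by-token or as a single chunk).
let output = "";
for await (const chunk of await model.stream("Tell me a joke.")) {
  output += chunk;
}
console.log(output);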
Using a callback handler
You can also use a CallbackHandler like so:
import { OpenAI } from "@langchain/openai";

// To enable streaming, we pass in `streaming: true` to the LLM constructor.
// Additionally, we pass in a handler for the `handleLLMNewToken` event.
const model = new OpenAI({
  maxTokens: 25,
  streaming: true,
});

const response = await model.invoke("Tell me a joke.", {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});

console.log(response);
console.log(response);
/*
{ token: '\n' }
{ token: '\n' }
{ token: 'Q' }
{ token: ':' }
{ token: ' Why' }
{ token: ' did' }
{ token: ' the' }
{ token: ' chicken' }
{ token: ' cross' }
{ token: ' the' }
{ token: ' playground' }
{ token: '?' }
{ token: '\n' }
{ token: 'A' }
{ token: ':' }
{ token: ' To' }
{ token: ' get' }
{ token: ' to' }
{ token: ' the' }
{ token: ' other' }
{ token: ' slide' }
{ token: '.' }
Q: Why did the chicken cross the playground?
A: To get to the other slide.
*/
API Reference:
- OpenAI from @langchain/openai
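If you prefer a reusable handler over an inline object literal, here is a sketch using the BaseCallbackHandler class from @langchain/core (the TokenLogger class name is our own):

import { OpenAI } from "@langchain/openai";
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";

// A reusable callback handler; the `name` property is required
// by BaseCallbackHandler.
class TokenLogger extends BaseCallbackHandler {
  name = "token_logger";

  handleLLMNewToken(token: string) {
    process.stdout.write(token);
  }
}

// Constructor-level callbacks apply to every call made with this model.
const model = new OpenAI({
  maxTokens: 25,
  streaming: true,
  callbacks: [new TokenLogger()],
});

await model.invoke("Tell me a joke.");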
We still have access to the final LLMResult if we use generate. However, tokenUsage may not currently be supported for all model providers when streaming.
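As an illustrative sketch (whether llmOutput.tokenUsage is populated depends on the provider), you can stream tokens through a callback while still receiving the aggregated LLMResult from generate:

import { OpenAI } from "@langchain/openai";

const model = new OpenAI({ maxTokens: 25, streaming: true });

// `generate` takes an array of prompts and returns the final LLMResult,
// even while tokens are streamed through the callback below.
const result = await model.generate(["Tell me a joke."], {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        process.stdout.write(token);
      },
    },
  ],
});

console.log(result.generations[0][0].text);
// `tokenUsage` may be empty when streaming, depending on the provider.
console.log(result.llmOutput);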