
Streaming

Some LLMs provide a streaming response. Instead of waiting for the entire response to be returned, you can start processing it as soon as the first chunks are available. This is useful if you want to display the response to the user as it is being generated, or to process the response incrementally.

Using .stream()

The easiest way to stream is to use the .stream() method. This returns a readable stream that you can also iterate over:

npm install @langchain/openai
import { OpenAI } from "@langchain/openai";

const model = new OpenAI({
  maxTokens: 25,
});

const stream = await model.stream("Tell me a joke.");

for await (const chunk of stream) {
  console.log(chunk);
}

/*


Q
:
What
did
the
fish
say
when
it
hit
the
wall
?


A
:
Dam
!
*/


For models that do not support streaming, the entire response will be returned as a single chunk.
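Because each chunk yielded by .stream() on this completion-style LLM is a plain string (as in the output above), you can also concatenate the chunks if you need the full completion at the end. A minimal sketch, reusing the model from the example above:

import { OpenAI } from "@langchain/openai";

const model = new OpenAI({
  maxTokens: 25,
});

const stream = await model.stream("Tell me a joke.");

// Collect the streamed chunks into the full completion. For a model
// that does not support streaming, this loop runs once with the
// entire response as a single chunk.
let completion = "";
for await (const chunk of stream) {
  completion += chunk;
}

console.log(completion);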

Using a callback handler

You can also use a CallbackHandler like so:

import { OpenAI } from "@langchain/openai";

// To enable streaming, we pass in `streaming: true` to the LLM constructor.
// Additionally, we pass in a handler for the `handleLLMNewToken` event.
const model = new OpenAI({
  maxTokens: 25,
  streaming: true,
});

const response = await model.invoke("Tell me a joke.", {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});
console.log(response);
/*
{ token: '\n' }
{ token: '\n' }
{ token: 'Q' }
{ token: ':' }
{ token: ' Why' }
{ token: ' did' }
{ token: ' the' }
{ token: ' chicken' }
{ token: ' cross' }
{ token: ' the' }
{ token: ' playground' }
{ token: '?' }
{ token: '\n' }
{ token: 'A' }
{ token: ':' }
{ token: ' To' }
{ token: ' get' }
{ token: ' to' }
{ token: ' the' }
{ token: ' other' }
{ token: ' slide' }
{ token: '.' }


Q: Why did the chicken cross the playground?
A: To get to the other slide.
*/


We still have access to the final LLMResult if we use .generate(). However, tokenUsage is not currently supported when streaming.
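A minimal sketch of this, assuming the same model configuration as above: the LLMResult returned by .generate() still contains the full text of each generation once the stream has finished.

import { OpenAI } from "@langchain/openai";

const model = new OpenAI({
  maxTokens: 25,
  streaming: true,
});

// `.generate()` takes an array of prompts and resolves to an LLMResult
// once streaming has finished.
const result = await model.generate(["Tell me a joke."], {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});

// The complete text of each generation is available on the result...
console.log(result.generations[0][0].text);
// ...but token usage is not populated here when streaming.
console.log(result.llmOutput);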
