
How to stream responses from an LLM

All LLMs implement the Runnable interface, which comes with default implementations of the standard runnable methods (e.g. invoke, batch, stream, and streamEvents).

The default streaming implementation provides an AsyncGenerator that yields a single value: the final output from the underlying LLM provider.

The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.

See which integrations support token-by-token streaming here.

:::{.callout-note}

The default implementation does not support token-by-token streaming, but it ensures that the model can be swapped in for any other model, since every model supports the same standard interface; the sketch below illustrates this.

:::
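
For example, the following minimal sketch streams from two different LLM classes with identical code. It assumes you also have the @langchain/cohere package installed and a COHERE_API_KEY set in your environment; the Cohere class is used here purely as an illustrative second provider, and any other LLM integration would work the same way:

```typescript
import { OpenAI } from "@langchain/openai";
import { Cohere } from "@langchain/cohere";

// Both classes implement the same Runnable interface, so the streaming
// loop below works unchanged for either one. A provider without
// token-by-token streaming still works; it simply yields the final
// output as a single chunk.
// Assumes OPENAI_API_KEY and COHERE_API_KEY are set in the environment.
const models = [new OpenAI({ maxTokens: 25 }), new Cohere()];

for (const model of models) {
  const stream = await model.stream("Tell me a joke.");
  for await (const chunk of stream) {
    console.log(chunk);
  }
}
```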

Using .stream()

The easiest way to stream is to use the .stream() method. This returns a readable stream that you can also iterate over:

```bash
npm install @langchain/openai @langchain/core
```
```typescript
import { OpenAI } from "@langchain/openai";

const model = new OpenAI({
  maxTokens: 25,
});

const stream = await model.stream("Tell me a joke.");

for await (const chunk of stream) {
  console.log(chunk);
}

/*


Q
:
What
did
the
fish
say
when
it
hit
the
wall
?


A
:
Dam
!
*/
```

API Reference: OpenAI from `@langchain/openai`

For models that do not support streaming, the entire response will be returned as a single chunk.
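
Because of this, you can write the consuming code the same way whether or not the provider streams token-by-token. Here is a minimal sketch, reusing the same model as above, that concatenates the streamed chunks back into the full completion:

```typescript
import { OpenAI } from "@langchain/openai";

const model = new OpenAI({
  maxTokens: 25,
});

// Accumulate the streamed chunks into a single string. If the provider
// does not support token-by-token streaming, the loop simply runs once
// with the entire output.
let output = "";
for await (const chunk of await model.stream("Tell me a joke.")) {
  output += chunk;
}
console.log(output);
```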

Using a callback handler

You can also use a CallbackHandler like so:

```typescript
import { OpenAI } from "@langchain/openai";

// To enable streaming, we pass in `streaming: true` to the LLM constructor.
// Additionally, we pass in a handler for the `handleLLMNewToken` event.
const model = new OpenAI({
  maxTokens: 25,
  streaming: true,
});

const response = await model.invoke("Tell me a joke.", {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});
console.log(response);
/*
{ token: '\n' }
{ token: '\n' }
{ token: 'Q' }
{ token: ':' }
{ token: ' Why' }
{ token: ' did' }
{ token: ' the' }
{ token: ' chicken' }
{ token: ' cross' }
{ token: ' the' }
{ token: ' playground' }
{ token: '?' }
{ token: '\n' }
{ token: 'A' }
{ token: ':' }
{ token: ' To' }
{ token: ' get' }
{ token: ' to' }
{ token: ' the' }
{ token: ' other' }
{ token: ' slide' }
{ token: '.' }


Q: Why did the chicken cross the playground?
A: To get to the other slide.
*/
```

API Reference: OpenAI from `@langchain/openai`

We still have access to the final LLMResult if we use generate(). However, tokenUsage may not currently be supported for all model providers when streaming.
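
As a rough sketch of that, here is the callback example above rewritten with .generate(), which takes an array of prompts and resolves to an LLMResult once the stream has finished. Note that llmOutput.tokenUsage may be empty or undefined when streaming, depending on the provider:

```typescript
import { OpenAI } from "@langchain/openai";

const model = new OpenAI({
  maxTokens: 25,
  streaming: true,
});

// .generate() resolves to an LLMResult after streaming completes.
const llmResult = await model.generate(["Tell me a joke."], {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});

// The full text of the completion is available on the LLMResult.
console.log(llmResult.generations[0][0].text);

// Token usage may be missing when streaming, depending on the provider.
console.log(llmResult.llmOutput?.tokenUsage);
```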

