
Streaming

Some LLMs provide a streaming response. Instead of waiting for the entire response to be returned, you can start processing it as soon as the first chunks are available. This is useful if you want to display the response to the user as it is being generated, or to process the response incrementally.

Using .stream()

The easiest way to stream is to use the .stream() method. This returns a readable stream that you can also iterate over:

npm install @langchain/openai
import { OpenAI } from "@langchain/openai";

const model = new OpenAI({
  maxTokens: 25,
});

const stream = await model.stream("Tell me a joke.");

for await (const chunk of stream) {
  console.log(chunk);
}

/*


Q
:
What
did
the
fish
say
when
it
hit
the
wall
?


A
:
Dam
!
*/


For models that do not support streaming, the entire response will be returned as a single chunk.
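Because each chunk yielded by .stream() on this completion-style LLM is a plain string (as in the output above), you can also concatenate the chunks if you need the full completion at the end. A minimal sketch, reusing the model from the example above:

import { OpenAI } from "@langchain/openai";

const model = new OpenAI({
  maxTokens: 25,
});

const stream = await model.stream("Tell me a joke.");

// Collect the streamed chunks into the full completion. For a model
// that does not support streaming, this loop runs once with the
// entire response as a single chunk.
let completion = "";
for await (const chunk of stream) {
  completion += chunk;
}

console.log(completion);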

Using a callback handler

You can also use a CallbackHandler like so:

import { OpenAI } from "@langchain/openai";

// To enable streaming, we pass in `streaming: true` to the LLM constructor.
// Additionally, we pass in a handler for the `handleLLMNewToken` event.
const model = new OpenAI({
  maxTokens: 25,
  streaming: true,
});

const response = await model.invoke("Tell me a joke.", {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});
console.log(response);
/*
{ token: '\n' }
{ token: '\n' }
{ token: 'Q' }
{ token: ':' }
{ token: ' Why' }
{ token: ' did' }
{ token: ' the' }
{ token: ' chicken' }
{ token: ' cross' }
{ token: ' the' }
{ token: ' playground' }
{ token: '?' }
{ token: '\n' }
{ token: 'A' }
{ token: ':' }
{ token: ' To' }
{ token: ' get' }
{ token: ' to' }
{ token: ' the' }
{ token: ' other' }
{ token: ' slide' }
{ token: '.' }


Q: Why did the chicken cross the playground?
A: To get to the other slide.
*/


We still have access to the final LLMResult if we use .generate(). However, tokenUsage is not currently supported when streaming.
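A minimal sketch of this, assuming the same model configuration as above: the LLMResult returned by .generate() still contains the full text of each generation once the stream has finished.

import { OpenAI } from "@langchain/openai";

const model = new OpenAI({
  maxTokens: 25,
  streaming: true,
});

// `.generate()` takes an array of prompts and resolves to an LLMResult
// once streaming has finished.
const result = await model.generate(["Tell me a joke."], {
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        console.log({ token });
      },
    },
  ],
});

// The complete text of each generation is available on the result...
console.log(result.generations[0][0].text);
// ...but token usage is not populated here when streaming.
console.log(result.llmOutput);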
