How to cache model responses

LangChain provides an optional caching layer for LLMs. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you often request the same completion multiple times.

It can speed up your application, because a repeated request is answered from the cache instead of waiting on another API call to the LLM provider.

Tip: See this section for general instructions on installing integration packages.

npm install @langchain/openai @langchain/core

yarn add @langchain/openai @langchain/core

pnpm add @langchain/openai @langchain/core

import { OpenAI } from "@langchain/openai";

const model = new OpenAI({
  model: "gpt-3.5-turbo-instruct",
  cache: true,
});

In Memory Cache

The default cache is stored in-memory. This means that if you restart your application, the cache will be cleared.

console.time();

// The first time, it is not yet in cache, so it should take longer
const res = await model.invoke("Tell me a long joke");

console.log(res);

console.timeEnd();

/*
  A man walks into a bar and sees a jar filled with money on the counter. Curious, he asks the bartender about it.

  The bartender explains, "We have a challenge for our customers. If you can complete three tasks, you win all the money in the jar."

  Intrigued, the man asks what the tasks are.

  The bartender replies, "First, you have to drink a whole bottle of tequila without making a face. Second, there's a pitbull out back with a sore tooth. You have to pull it out. And third, there's an old lady upstairs who has never had an orgasm. You have to give her one."

  The man thinks for a moment and then confidently says, "I'll do it."

  He grabs the bottle of tequila and downs it in one gulp, without flinching. He then heads to the back and after a few minutes of struggling, emerges with the pitbull's tooth in hand.

  The bar erupts in cheers and the bartender leads the man upstairs to the old lady's room. After a few minutes, the man walks out with a big smile on his face and the old lady is giggling with delight.

  The bartender hands the man the jar of money and asks, "How

  default: 4.187s
*/

console.time();

// The second call uses the identical prompt, so it is served from the cache and returns faster
const res2 = await model.invoke("Tell me a long joke");

console.log(res2);

console.timeEnd();

/*
  A man walks into a bar and sees a jar filled with money on the counter. Curious, he asks the bartender about it.

  The bartender explains, "We have a challenge for our customers. If you can complete three tasks, you win all the money in the jar."

  Intrigued, the man asks what the tasks are.

  The bartender replies, "First, you have to drink a whole bottle of tequila without making a face. Second, there's a pitbull out back with a sore tooth. You have to pull it out. And third, there's an old lady upstairs who has never had an orgasm. You have to give her one."

  The man thinks for a moment and then confidently says, "I'll do it."

  He grabs the bottle of tequila and downs it in one gulp, without flinching. He then heads to the back and after a few minutes of struggling, emerges with the pitbull's tooth in hand.

  The bar erupts in cheers and the bartender leads the man upstairs to the old lady's room. After a few minutes, the man walks out with a big smile on his face and the old lady is giggling with delight.

  The bartender hands the man the jar of money and asks, "How

  default: 175.74ms
*/
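
To make the timing difference above concrete, here is a minimal sketch of exact-match response caching. It is not LangChain's implementation: `fakeModel`, `cachedInvoke`, and `memoryCache` are hypothetical names, and the sketch assumes the cache key is simply the prompt text. The point it illustrates is that only an identical prompt produces a cache hit.

```typescript
// Hypothetical stand-in for a real LLM call. It counts how often the
// "provider" is actually contacted.
let providerCalls = 0;
function fakeModel(prompt: string): string {
  providerCalls += 1;
  return `response to: ${prompt}`;
}

// Assumption: entries are keyed by the exact prompt text.
const memoryCache = new Map<string, string>();

function cachedInvoke(prompt: string): string {
  const hit = memoryCache.get(prompt);
  if (hit !== undefined) {
    return hit; // cache hit: no provider call
  }
  const result = fakeModel(prompt); // cache miss: call the provider
  memoryCache.set(prompt, result);
  return result;
}

const first = cachedInvoke("Tell me a long joke"); // miss
const second = cachedInvoke("Tell me a long joke"); // hit, same prompt
const third = cachedInvoke("Tell me a joke"); // miss, different prompt
```

Because the map lives in process memory, restarting the process empties it, which matches the in-memory behavior described above.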

Caching with Momento

LangChain also provides a Momento-based cache. Momento is a distributed, serverless cache that requires zero setup or infrastructure maintenance. Momento is compatible with Node.js, browser, and edge environments, so make sure you install the package that matches your runtime.

To install for Node.js:

npm install @gomomento/sdk

yarn add @gomomento/sdk

pnpm add @gomomento/sdk

To install for browser/edge workers:

npm install @gomomento/sdk-web

yarn add @gomomento/sdk-web

pnpm add @gomomento/sdk-web

Next, you'll need to sign up for Momento and create an API key. Once you've done that, pass a cache option when you instantiate the LLM like this:

import { OpenAI } from "@langchain/openai";
import {
  CacheClient,
  Configurations,
  CredentialProvider,
} from "@gomomento/sdk";
import { MomentoCache } from "@langchain/community/caches/momento";

// See https://github.com/momentohq/client-sdk-javascript for connection options
const client = new CacheClient({
  configuration: Configurations.Laptop.v1(),
  credentialProvider: CredentialProvider.fromEnvironmentVariable({
    environmentVariableName: "MOMENTO_API_KEY",
  }),
  defaultTtlSeconds: 60 * 60 * 24,
});
const cache = await MomentoCache.fromProps({
  client,
  cacheName: "langchain",
});

const model = new OpenAI({ cache });

API Reference:

OpenAI from @langchain/openai
MomentoCache from @langchain/community/caches/momento
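
The `defaultTtlSeconds` option in the example above gives cached entries a time to live. The following sketch illustrates the general idea of TTL-based expiry under stated assumptions. `TtlCache` and its injectable clock are hypothetical and are not part of the Momento SDK or LangChain; a real service enforces expiry on the server side.

```typescript
interface TtlEntry {
  value: string;
  expiresAt: number; // absolute time in milliseconds
}

// Hypothetical in-memory cache whose entries expire after a fixed TTL.
class TtlCache {
  private entries = new Map<string, TtlEntry>();
  private ttlSeconds: number;
  private now: () => number;

  // `now` is injectable so that expiry can be demonstrated without waiting.
  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlSeconds = ttlSeconds;
    this.now = now;
  }

  set(key: string, value: string): void {
    this.entries.set(key, {
      value,
      expiresAt: this.now() + this.ttlSeconds * 1000,
    });
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return entry.value;
  }
}

// Simulated clock, in milliseconds.
let fakeNow = 0;
const ttlCache = new TtlCache(60 * 60 * 24, () => fakeNow);
ttlCache.set("Tell me a long joke", "cached completion");

fakeNow = 23 * 60 * 60 * 1000; // 23 hours later
const beforeExpiry = ttlCache.get("Tell me a long joke");

fakeNow = 24 * 60 * 60 * 1000; // 24 hours later
const afterExpiry = ttlCache.get("Tell me a long joke");
```

A shorter TTL keeps responses fresher at the cost of more provider calls; a longer TTL does the opposite.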

Caching with Redis

LangChain also provides a Redis-based cache. This is useful if you want to share the cache across multiple processes or servers. To use it, you'll need to install the ioredis package:

npm install ioredis

yarn add ioredis

pnpm add ioredis

Then, you can pass a cache option when you instantiate the LLM. For example:

import { OpenAI } from "@langchain/openai";
import { RedisCache } from "@langchain/community/caches/ioredis";
import { Redis } from "ioredis";

// See https://github.com/redis/ioredis for connection options
const client = new Redis({});

const cache = new RedisCache(client);

const model = new OpenAI({ cache });
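
When several processes share one cache, an entry written by one process is only useful if the others derive the same key for the same request. The sketch below shows one way to build such a deterministic key. `makeCacheKey` and `ModelParams` are hypothetical, and this is not necessarily how LangChain builds its keys; the sketch assumes the key should combine the prompt with the model parameters so that differently configured models never return each other's completions.

```typescript
type ModelParams = Record<string, string | number | boolean>;

// Hypothetical key builder: same prompt and same parameters always yield the
// same string, regardless of the order in which parameters were written.
function makeCacheKey(prompt: string, params: ModelParams): string {
  const sortedParams = Object.keys(params)
    .sort()
    .map((name) => [name, params[name]]);
  return JSON.stringify({ prompt, params: sortedParams });
}

const keyFromServerA = makeCacheKey("Tell me a long joke", {
  model: "gpt-3.5-turbo-instruct",
  temperature: 0,
});

// Same request, parameters listed in a different order.
const keyFromServerB = makeCacheKey("Tell me a long joke", {
  temperature: 0,
  model: "gpt-3.5-turbo-instruct",
});

// Same prompt, different sampling temperature.
const keyOtherConfig = makeCacheKey("Tell me a long joke", {
  model: "gpt-3.5-turbo-instruct",
  temperature: 1,
});
```

A production key would typically be hashed to keep it short, but the determinism requirement is the same.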

Caching with Upstash Redis

LangChain provides an Upstash Redis-based cache. Like the Redis-based cache, this cache is useful if you want to share the cache across multiple processes or servers. The Upstash Redis client uses HTTP and supports edge environments. To use it, you'll need to install the @upstash/redis package:

npm install @upstash/redis

yarn add @upstash/redis

pnpm add @upstash/redis

You'll also need an Upstash account and a Redis database to connect to. Once you've done that, retrieve your REST URL and REST token.

Then, you can pass a cache option when you instantiate the LLM. For example:

import { OpenAI } from "@langchain/openai";
import { UpstashRedisCache } from "@langchain/community/caches/upstash_redis";

// See https://docs.upstash.com/redis/howto/connectwithupstashredis#quick-start for connection options
const cache = new UpstashRedisCache({
  config: {
    // Placeholders: replace with your database's REST URL and REST token
    url: "UPSTASH_REDIS_REST_URL",
    token: "UPSTASH_REDIS_REST_TOKEN",
  },
  ttl: 3600,
});

const model = new OpenAI({ cache });

API Reference:

OpenAI from @langchain/openai
UpstashRedisCache from @langchain/community/caches/upstash_redis
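
Rather than hard-coding the REST URL and token, you may prefer to read them from environment variables and fail fast when one is missing. The following is a minimal sketch of that idea. `EnvVars`, `requireEnv`, and `upstashConfigFromEnv` are hypothetical helpers, not part of `@upstash/redis` or LangChain, and the environment is passed in as a plain object so the helper works the same way in Node.js and edge runtimes.

```typescript
type EnvVars = Record<string, string | undefined>;

// Hypothetical helper: return the variable's value or throw a clear error.
function requireEnv(env: EnvVars, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Builds an object with the same `url` and `token` fields used in the
// `config` option shown above.
function upstashConfigFromEnv(env: EnvVars): { url: string; token: string } {
  return {
    url: requireEnv(env, "UPSTASH_REDIS_REST_URL"),
    token: requireEnv(env, "UPSTASH_REDIS_REST_TOKEN"),
  };
}

// Example values for illustration only.
const exampleConfig = upstashConfigFromEnv({
  UPSTASH_REDIS_REST_URL: "https://example.invalid",
  UPSTASH_REDIS_REST_TOKEN: "example-token",
});

// A missing token is reported immediately instead of at the first cache call.
let missingTokenMessage = "";
try {
  upstashConfigFromEnv({ UPSTASH_REDIS_REST_URL: "https://example.invalid" });
} catch (err) {
  missingTokenMessage = err instanceof Error ? err.message : String(err);
}
```

In Node.js you would pass `process.env` as the argument and hand the result to the cache as its `config` option.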

You can also directly pass in a previously created @upstash/redis client instance:

import { Redis } from "@upstash/redis";
import https from "https";

import { OpenAI } from "@langchain/openai";
import { UpstashRedisCache } from "@langchain/community/caches/upstash_redis";

// const client = new Redis({
//   url: process.env.UPSTASH_REDIS_REST_URL!,
//   token: process.env.UPSTASH_REDIS_REST_TOKEN!,
//   agent: new https.Agent({ keepAlive: true }),
// });

// Or simply call Redis.fromEnv() to automatically load the UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN environment variables.
const client = Redis.fromEnv({
  agent: new https.Agent({ keepAlive: true }),
});

const cache = new UpstashRedisCache({ client });
const model = new OpenAI({ cache });

API Reference:

OpenAI from @langchain/openai
UpstashRedisCache from @langchain/community/caches/upstash_redis

Caching with Vercel KV

LangChain provides a Vercel KV-based cache. Like the Redis-based cache, this cache is useful if you want to share the cache across multiple processes or servers. The Vercel KV client uses HTTP and supports edge environments. To use it, you'll need to install the @vercel/kv package:

npm install @vercel/kv

yarn add @vercel/kv

pnpm add @vercel/kv

You'll also need a Vercel account and a KV database to connect to. Once you've done that, retrieve your REST URL and REST token.

Then, you can pass a cache option when you instantiate the LLM. For example:

import { OpenAI } from "@langchain/openai";
import { VercelKVCache } from "@langchain/community/caches/vercel_kv";
import { createClient } from "@vercel/kv";

// See https://vercel.com/docs/storage/vercel-kv/kv-reference#createclient-example for connection options
const cache = new VercelKVCache({
  client: createClient({
    // Placeholders: replace with your KV database's REST URL and REST token
    url: "VERCEL_KV_API_URL",
    token: "VERCEL_KV_API_TOKEN",
  }),
  ttl: 3600,
});

const model = new OpenAI({ cache });

API Reference:

OpenAI from @langchain/openai
VercelKVCache from @langchain/community/caches/vercel_kv

Caching with Cloudflare KV

Info: This integration is only supported in Cloudflare Workers.

If you're deploying your project as a Cloudflare Worker, you can use LangChain's Cloudflare KV-powered LLM cache.

For information on how to set up KV in Cloudflare, see the official documentation.

Note: If you are using TypeScript, you may need to install types if they aren't already present:

npm install -S @cloudflare/workers-types

yarn add @cloudflare/workers-types

pnpm add @cloudflare/workers-types

import type { KVNamespace } from "@cloudflare/workers-types";

import { OpenAI } from "@langchain/openai";
import { CloudflareKVCache } from "@langchain/cloudflare";

export interface Env {
  KV_NAMESPACE: KVNamespace;
  OPENAI_API_KEY: string;
}

export default {
  async fetch(_request: Request, env: Env) {
    try {
      const cache = new CloudflareKVCache(env.KV_NAMESPACE);
      const model = new OpenAI({
        cache,
        model: "gpt-3.5-turbo-instruct",
        apiKey: env.OPENAI_API_KEY,
      });
      const response = await model.invoke("How are you today?");
      return new Response(JSON.stringify(response), {
        headers: { "content-type": "application/json" },
      });
    } catch (err: any) {
      console.log(err.message);
      return new Response(err.message, { status: 500 });
    }
  },
};

API Reference:

OpenAI from @langchain/openai
CloudflareKVCache from @langchain/cloudflare

Caching on the File System

Danger: This cache is not recommended for production use. It is only intended for local development.

LangChain provides a simple file system cache. By default, the cache is stored in a temporary directory, but you can specify a custom directory if you want.

import { LocalFileCache } from "langchain/cache/file_system";
const cache = await LocalFileCache.create();

Next steps

You've now learned how to cache model responses to save time and money.

Next, check out the other how-to guides on LLMs, like how to create your own custom LLM class.
