Zep Cloud

Zep is a long-term memory service for AI Assistant apps. With Zep, you can provide AI assistants with the ability to recall past conversations, no matter how distant, while also reducing hallucinations, latency, and cost.

Note: The ZepCloudVectorStore works with Documents and is intended to be used as a Retriever. Its functionality is separate from Zep's ZepCloudMemory class, which is designed for persisting, enriching, and searching your users' chat history.

Why Zep's VectorStore? 🤖🚀

Zep automatically embeds documents added to the Zep Vector Store using low-latency models local to the Zep server. The Zep TS/JS client can be used in non-Node edge environments. Together with Zep's chat memory functionality, this makes Zep ideal for building conversational LLM apps where latency and performance are important.

Supported Search Types

Zep supports both similarity search and Maximal Marginal Relevance (MMR) search. MMR search is particularly useful for Retrieval Augmented Generation applications as it re-ranks results to ensure diversity in the returned documents.
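To make the re-ranking concrete, here is a simplified sketch of the MMR idea in TypeScript. This is an illustration only, not Zep's actual implementation; the `mmr`, `dot`, and `cosineSim` names and the `lambda` trade-off parameter are our own:

```typescript
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

function cosineSim(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Greedily select k candidates, balancing relevance to the query against
// similarity to already-selected results. lambda = 1 is pure relevance;
// lower values favor diversity.
function mmr(
  query: number[],
  candidates: number[][],
  k: number,
  lambda = 0.5
): number[] {
  const selected: number[] = [];
  let remaining = candidates.map((_, i) => i);
  while (selected.length < k && remaining.length > 0) {
    let bestIdx = -1;
    let bestScore = -Infinity;
    for (const i of remaining) {
      const relevance = cosineSim(query, candidates[i]);
      const redundancy = selected.length
        ? Math.max(
            ...selected.map((j) => cosineSim(candidates[i], candidates[j]))
          )
        : 0;
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    selected.push(bestIdx);
    remaining = remaining.filter((i) => i !== bestIdx);
  }
  // Indices into candidates, in selection order
  return selected;
}
```

With a query vector `[1, 0]` and candidates `[[1, 0], [0.9, 0.1], [0, 1]]`, a diversity-leaning `lambda` of 0.4 selects the most relevant document first and then skips the near-duplicate in favor of the dissimilar one.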


Sign up for Zep Cloud and create a project.

Follow the Zep Cloud Typescript SDK Installation Guide to install and get started with Zep.


You'll need your Zep Cloud Project API Key to use the Zep VectorStore. See the Zep Cloud docs for more information.
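One common pattern (our convention here, not a requirement of the Zep SDK) is to read the key from an environment variable rather than hard-coding it; the variable name `ZEP_API_KEY` below is an assumption:

```typescript
// Read the project API key from the environment. ZEP_API_KEY is an
// illustrative name, not one mandated by the SDK.
const apiKey = process.env.ZEP_API_KEY ?? "";
if (apiKey === "") {
  console.warn(
    "ZEP_API_KEY is not set; Zep requests will fail to authenticate"
  );
}
```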

Zep auto-embeds all documents by default and does not expect to receive any embeddings from the user. Since LangChain requires passing in an Embeddings instance, we pass in FakeEmbeddings.

Example: Creating a ZepVectorStore from Documents & Querying

```bash
npm install @getzep/zep-cloud @langchain/openai @langchain/community
```
```typescript
import { ZepCloudVectorStore } from "@langchain/community/vectorstores/zep_cloud";
import { FakeEmbeddings } from "@langchain/core/utils/testing";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { randomUUID } from "crypto";

const loader = new TextLoader("src/document_loaders/example_data/example.txt");
const docs = await loader.load();
const collectionName = `collection${randomUUID().split("-")[0]}`;

const zepConfig = {
  // Your Zep Cloud Project API key
  apiKey: "<Zep Api Key>",
  // The Zep collection to store the documents in
  collectionName,
};

// We're using fake embeddings here, because Zep Cloud handles embedding for you
const embeddings = new FakeEmbeddings();

const vectorStore = await ZepCloudVectorStore.fromDocuments(
  docs,
  embeddings,
  zepConfig
);

// Wait for the documents to be embedded
// eslint-disable-next-line no-constant-condition
while (true) {
  const c = await vectorStore.client.document.getCollection(collectionName);
  console.log(
    `Embedding status: ${c.documentEmbeddedCount}/${c.documentCount} documents embedded`
  );
  // eslint-disable-next-line no-promise-executor-return
  await new Promise((resolve) => setTimeout(resolve, 1000));
  if (c.documentEmbeddedCount === c.documentCount) {
    break;
  }
}

const results = await vectorStore.similaritySearchWithScore("bar", 3);

console.log("Similarity Results:");
console.log(results.map(([doc]) => doc.pageContent));

const results2 = await vectorStore.maxMarginalRelevanceSearch("bar", {
  k: 3,
});

console.log("MMR Results:");
console.log(results2.map((doc) => doc.pageContent));
```


Example: Using ZepCloudVectorStore with Expression Language

```typescript
import { ZepClient } from "@getzep/zep-cloud";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ConsoleCallbackHandler } from "@langchain/core/tracers/console";
import { ChatOpenAI } from "@langchain/openai";
import { Document } from "@langchain/core/documents";
import {
  RunnableLambda,
  RunnableMap,
  RunnablePassthrough,
} from "@langchain/core/runnables";
import { ZepCloudVectorStore } from "@langchain/community/vectorstores/zep_cloud";
import { StringOutputParser } from "@langchain/core/output_parsers";

async function combineDocuments(docs: Document[], documentSeparator = "\n\n") {
  const docStrings: string[] = await Promise.all(
    docs.map((doc) => doc.pageContent)
  );
  return docStrings.join(documentSeparator);
}

// Your Zep Collection Name
const collectionName = "<Zep Collection Name>";

const zepClient = new ZepClient({
  // Your Zep Cloud Project API key
  apiKey: "<Zep Api Key>",
});

const vectorStore = await ZepCloudVectorStore.init({
  client: zepClient,
  collectionName,
});

const prompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    `Answer the question based only on the following context: {context}`,
  ],
  ["human", "{question}"],
]);

const model = new ChatOpenAI({
  temperature: 0.8,
  modelName: "gpt-3.5-turbo-1106",
});
const retriever = vectorStore.asRetriever();

const setupAndRetrieval = RunnableMap.from({
  context: new RunnableLambda({
    func: (input: string) => retriever.invoke(input).then(combineDocuments),
  }),
  question: new RunnablePassthrough(),
});
const outputParser = new StringOutputParser();

const chain = setupAndRetrieval
  .pipe(prompt)
  .pipe(model)
  .pipe(outputParser)
  .withConfig({ callbacks: [new ConsoleCallbackHandler()] });

const result = await chain.invoke("Project Gutenberg?");

console.log("result", result);
```

