Contextual compression

One challenge with retrieval is that usually you don't know the specific queries your document storage system will face when you ingest data into the system. This means that the information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing” here refers to both compressing the contents of an individual document and filtering out documents wholesale.

To use the Contextual Compression Retriever, you'll need:

  • a base retriever
  • a Document Compressor

The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents and passes them through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the contents of documents or dropping documents altogether.
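The flow described above can be sketched without any library code. This is a conceptual illustration only, not LangChain's implementation: the `Doc`, `Retriever`, and `Compressor` types, the `compressedRetrieve` helper, and the toy data are all hypothetical names invented for this sketch, and the keyword-matching compressor merely stands in for a real Document Compressor.

```typescript
// Conceptual sketch only: every name below is hypothetical, not a LangChain API.
type Doc = { pageContent: string };
type Retriever = (query: string) => Doc[];
type Compressor = (docs: Doc[], query: string) => Doc[];

// Stand-in base retriever: returns every stored document, relevant or not.
const storedDocs: Doc[] = [
  { pageContent: "Justice Breyer is retiring. The weather was mild." },
  { pageContent: "We are fixing the immigration system." },
];
const toyRetriever: Retriever = () => storedDocs;

// Stand-in compressor: keeps only sentences that mention a query keyword,
// then drops any document left with no content.
const keywordCompressor: Compressor = (docs, query) => {
  const keywords = query
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0);
  return docs
    .map((doc) => {
      const sentences = (doc.pageContent.match(/[^.]+\./g) ?? []).map((s) =>
        s.trim()
      );
      const relevant = sentences.filter((sentence) =>
        keywords.some((k) => sentence.toLowerCase().indexOf(k) !== -1)
      );
      return { pageContent: relevant.join(" ") };
    })
    .filter((doc) => doc.pageContent.length > 0);
};

// Query -> base retriever -> compressor -> shortened result list.
function compressedRetrieve(
  query: string,
  retriever: Retriever,
  compressor: Compressor
): Doc[] {
  const initialDocs = retriever(query);
  return compressor(initialDocs, query);
}

const compressedDocs = compressedRetrieve(
  "Justice Breyer",
  toyRetriever,
  keywordCompressor
);
console.log(compressedDocs);
```

In this toy run the first document is shortened to its one relevant sentence and the second is dropped entirely, which covers both senses of "compressing" used on this page.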

Using a vanilla vector store retriever

Let's start by initializing a simple vector store retriever and storing the 2022 State of the Union speech (in chunks). Given an example question, the plain retriever returns one or two relevant docs and a few irrelevant ones, and even the relevant docs contain a lot of irrelevant information. To keep only the relevant context, we wrap the retriever with an LLMChainExtractor, which iterates over the initially returned documents and extracts from each only the content that is relevant to the query.

npm install langchain @langchain/openai @langchain/community hnswlib-node

import * as fs from "fs";

import { OpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { ContextualCompressionRetriever } from "langchain/retrievers/contextual_compression";
import { LLMChainExtractor } from "langchain/retrievers/document_compressors/chain_extract";

const model = new OpenAI({
  model: "gpt-3.5-turbo-instruct",
});
const baseCompressor = LLMChainExtractor.fromLLM(model);

const text = fs.readFileSync("state_of_the_union.txt", "utf8");

const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
const docs = await textSplitter.createDocuments([text]);

// Create a vector store from the documents.
const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings());

const retriever = new ContextualCompressionRetriever({
  baseCompressor,
  baseRetriever: vectorStore.asRetriever(),
});

const retrievedDocs = await retriever.invoke(
  "What did the speaker say about Justice Breyer?"
);

console.log({ retrievedDocs });

/*
  {
    retrievedDocs: [
      Document {
        pageContent: 'One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.',
        metadata: [Object]
      },
      Document {
        pageContent: '"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service."',
        metadata: [Object]
      },
      Document {
        pageContent: 'The onslaught of state laws targeting transgender Americans and their families is wrong.',
        metadata: [Object]
      }
    ]
  }
*/

EmbeddingsFilter

Making an extra LLM call over each retrieved document is expensive and slow. The EmbeddingsFilter provides a cheaper and faster option by embedding the documents and query and only returning those documents which have sufficiently similar embeddings to the query.

This is most useful for non-vector store retrievers where we may not have control over the returned chunk size, or as part of a pipeline, outlined below.

Here's an example:

import * as fs from "fs";

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { OpenAIEmbeddings } from "@langchain/openai";
import { ContextualCompressionRetriever } from "langchain/retrievers/contextual_compression";
import { EmbeddingsFilter } from "langchain/retrievers/document_compressors/embeddings_filter";

const baseCompressor = new EmbeddingsFilter({
  embeddings: new OpenAIEmbeddings(),
  similarityThreshold: 0.8,
});

const text = fs.readFileSync("state_of_the_union.txt", "utf8");

const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
const docs = await textSplitter.createDocuments([text]);

// Create a vector store from the documents.
const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings());

const retriever = new ContextualCompressionRetriever({
  baseCompressor,
  baseRetriever: vectorStore.asRetriever(),
});

const retrievedDocs = await retriever.invoke(
  "What did the speaker say about Justice Breyer?"
);
console.log({ retrievedDocs });

/*
  {
    retrievedDocs: [
      Document {
        pageContent: 'And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. \n' +
          '\n' +
          'A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n' +
          '\n' +
          'And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n' +
          '\n' +
          'We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n' +
          '\n' +
          'We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n' +
          '\n' +
          'We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.',
        metadata: [Object]
      },
      Document {
        pageContent: 'In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n' +
          '\n' +
          'We cannot let this happen. \n' +
          '\n' +
          'Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n' +
          '\n' +
          'Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n' +
          '\n' +
          'One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n' +
          '\n' +
          'And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.',
        metadata: [Object]
      }
    ]
  }
*/
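
To make the filtering step concrete, here is a minimal, library-free sketch of thresholding by embedding similarity. It is not the EmbeddingsFilter source: the `EmbeddedDoc` type, the `cosineSimilarity` and `filterBySimilarity` helpers, and the hand-written 2-D vectors are hypothetical stand-ins for real embeddings, and treating the threshold as inclusive (`>=`) is a choice made for this sketch.

```typescript
// Conceptual sketch only: hypothetical names, toy 2-D vectors instead of real embeddings.
type EmbeddedDoc = { pageContent: string; embedding: number[] };

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const norm = (v: number[]): number =>
    Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Keep only documents whose similarity to the query meets the threshold.
function filterBySimilarity(
  docs: EmbeddedDoc[],
  queryEmbedding: number[],
  similarityThreshold: number
): EmbeddedDoc[] {
  return docs.filter(
    (doc) =>
      cosineSimilarity(doc.embedding, queryEmbedding) >= similarityThreshold
  );
}

// Pretend the query points along the first axis.
const toyQueryEmbedding = [1, 0];
const embeddedDocs: EmbeddedDoc[] = [
  {
    pageContent: "Justice Breyer, thank you for your service.",
    embedding: [0.9, 0.1], // nearly parallel to the query
  },
  {
    pageContent: "We are fixing the immigration system.",
    embedding: [0.1, 0.9], // nearly orthogonal to the query
  },
];

const keptDocs = filterBySimilarity(embeddedDocs, toyQueryEmbedding, 0.8);
console.log(keptDocs.map((doc) => doc.pageContent));
```

A filter of this kind drops whole documents and leaves the survivors unshortened, which is consistent with the full-length chunks in the output above.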

Stringing compressors and document transformers together

Using the DocumentCompressorPipeline, we can also combine multiple compressors in sequence. Along with compressors, we can add BaseDocumentTransformers to the pipeline; these don't perform any contextual compression but simply transform a set of documents. For example, TextSplitters can be used as document transformers to split documents into smaller pieces, and the EmbeddingsFilter can then filter those pieces based on the similarity of each chunk to the input query.

Below we create a compressor pipeline by first splitting raw webpage documents retrieved from the Tavily web search API retriever into smaller chunks, then filtering based on relevance to the query. The result is smaller chunks that are semantically similar to the input query. This skips the need to add documents to a vector store to perform similarity search, which can be useful for one-off use cases:

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "@langchain/openai";
import { ContextualCompressionRetriever } from "langchain/retrievers/contextual_compression";
import { EmbeddingsFilter } from "langchain/retrievers/document_compressors/embeddings_filter";
import { TavilySearchAPIRetriever } from "@langchain/community/retrievers/tavily_search_api";
import { DocumentCompressorPipeline } from "langchain/retrievers/document_compressors";

const embeddingsFilter = new EmbeddingsFilter({
  embeddings: new OpenAIEmbeddings(),
  similarityThreshold: 0.8,
  k: 5,
});

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 200,
  chunkOverlap: 0,
});

const compressorPipeline = new DocumentCompressorPipeline({
  transformers: [textSplitter, embeddingsFilter],
});

const baseRetriever = new TavilySearchAPIRetriever({
  includeRawContent: true,
});

const retriever = new ContextualCompressionRetriever({
  baseCompressor: compressorPipeline,
  baseRetriever,
});

const retrievedDocs = await retriever.invoke(
  "What did the speaker say about Justice Breyer in the 2022 State of the Union?"
);
console.log({ retrievedDocs });

/*
  {
    retrievedDocs: [
      Document {
        pageContent: 'Justice Stephen Breyer talks to President Joe Biden ahead of the State of the Union address on Tuesday. (jabin botsford/Agence France-Presse/Getty Images)',
        metadata: [Object]
      },
      Document {
        pageContent: 'President Biden recognized outgoing US Supreme Court Justice Stephen Breyer during his State of the Union on Tuesday.',
        metadata: [Object]
      },
      Document {
        pageContent: 'What we covered here\n' +
          'Biden recognized outgoing Supreme Court Justice Breyer during his speech',
        metadata: [Object]
      },
      Document {
        pageContent: 'States Supreme Court. Justice Breyer, thank you for your service,” the president said.',
        metadata: [Object]
      },
      Document {
        pageContent: 'Court," Biden said. "Justice Breyer, thank you for your service."',
        metadata: [Object]
      }
    ]
  }
*/
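
The idea of chaining a splitter and a filter can be sketched without any library code. The following is an illustration under stated assumptions rather than the DocumentCompressorPipeline implementation: `PageDoc`, `Step`, `splitIntoChunks`, `keepMentioning`, and `runPipeline` are hypothetical names, the splitter cuts at fixed character offsets, and a plain substring match stands in for the embeddings-based filter.

```typescript
// Conceptual sketch only: every name below is hypothetical, not a LangChain API.
type PageDoc = { pageContent: string };
type Step = (docs: PageDoc[], query: string) => PageDoc[];

// Transformer step: split each document into fixed-size character chunks.
const splitIntoChunks = (chunkSize: number): Step => (docs) => {
  const chunks: PageDoc[] = [];
  docs.forEach((doc) => {
    for (let i = 0; i < doc.pageContent.length; i += chunkSize) {
      chunks.push({ pageContent: doc.pageContent.slice(i, i + chunkSize) });
    }
  });
  return chunks;
};

// Compressor step: keep chunks that mention the query term
// (a stand-in for filtering by embedding similarity).
const keepMentioning: Step = (docs, query) =>
  docs.filter(
    (doc) => doc.pageContent.toLowerCase().indexOf(query.toLowerCase()) !== -1
  );

// Run the steps in order, feeding each step's output into the next.
const runPipeline = (steps: Step[]): Step => (docs, query) =>
  steps.reduce((current, step) => step(current, query), docs);

const rawPages: PageDoc[] = [
  { pageContent: "Breyer was honored. Taxes were discussed." },
];

// Split first, then filter: only the chunk mentioning the query survives.
const splitThenFilter = runPipeline([splitIntoChunks(20), keepMentioning]);
const pipelineResult = splitThenFilter(rawPages, "breyer");

// Filter first, then split: the whole page passes the filter, so every chunk is kept.
const filterThenSplit = runPipeline([keepMentioning, splitIntoChunks(20)]);
const reversedResult = filterThenSplit(rawPages, "breyer");

console.log(pipelineResult.length, reversedResult.length);
```

Order matters in a pipeline like this: splitting before filtering lets the filter judge each small chunk on its own, whereas filtering first can only keep or drop whole pages.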
