
SQL

This example demonstrates how to use Runnables to answer questions over a SQL database.

This example uses the Chinook database, which is a sample database available for SQL Server, Oracle, MySQL, etc.

info

Looking for the older, non-LCEL version? Click here.

Set up

First install typeorm:

npm install typeorm

Then, install the dependencies needed for your database. For example, for SQLite:

npm install sqlite3

LangChain offers default query prompts for a generic SQL dialect as well as Postgres, SQLite, Microsoft SQL Server, MySQL, and SAP HANA.
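For example, if you want the SQLite default instead of writing your own prompt, you can import it from "langchain/sql_db" (the same imports are shown as a comment in the example below). This is a minimal sketch; the default prompts may expect different template variables than the {schema}/{question} variables used by the custom prompt later in this example, so inspect them before swapping one in:

import { SQL_SQLITE_PROMPT } from "langchain/sql_db";

// List the template variables this default prompt expects before wiring it into a chain.
console.log(SQL_SQLITE_PROMPT.inputVariables);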

Finally, follow the instructions at https://database.guide/2-sample-databases-sqlite/ to get the sample database for this example.

This example also uses the OpenAI chat model, so install the integration package:

npm install @langchain/openai
import { DataSource } from "typeorm";
import { SqlDatabase } from "langchain/sql_db";
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

/**
 * This example uses the Chinook database, which is a sample database available for SQL Server, Oracle, MySQL, etc.
 * To set it up, follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the .db file
 * in the examples folder.
 */
const datasource = new DataSource({
  type: "sqlite",
  database: "Chinook.db",
});

const db = await SqlDatabase.fromDataSourceParams({
  appDataSource: datasource,
});

const llm = new ChatOpenAI();

/**
 * Create the first prompt template used for getting the SQL query.
 */
const prompt =
  PromptTemplate.fromTemplate(`Based on the provided SQL table schema below, write a SQL query that would answer the user's question.
------------
SCHEMA: {schema}
------------
QUESTION: {question}
------------
SQL QUERY:`);
/**
 * You can also load a default prompt by importing from "langchain/sql_db":
 *
 * import {
 *   DEFAULT_SQL_DATABASE_PROMPT,
 *   SQL_POSTGRES_PROMPT,
 *   SQL_SQLITE_PROMPT,
 *   SQL_MSSQL_PROMPT,
 *   SQL_MYSQL_PROMPT,
 *   SQL_SAP_HANA_PROMPT,
 * } from "langchain/sql_db";
 */

/**
 * Create a new RunnableSequence where we pipe the output from `db.getTableInfo()`
 * and the user's question into the prompt template, and then into the llm.
 * We're also applying a stop condition to the llm, so that it stops when it
 * sees the `\nSQLResult:` token.
 */
const sqlQueryChain = RunnableSequence.from([
  {
    schema: async () => db.getTableInfo(),
    question: (input: { question: string }) => input.question,
  },
  prompt,
  llm.bind({ stop: ["\nSQLResult:"] }),
  new StringOutputParser(),
]);

const res = await sqlQueryChain.invoke({
  question: "How many employees are there?",
});
console.log({ res });

/**
 * { res: 'SELECT COUNT(*) FROM Employee;' }
 */

/**
 * Create the final prompt template which is tasked with getting the natural language response.
 */
const finalResponsePrompt =
  PromptTemplate.fromTemplate(`Based on the table schema below, question, SQL query, and SQL response, write a natural language response:
------------
SCHEMA: {schema}
------------
QUESTION: {question}
------------
SQL QUERY: {query}
------------
SQL RESPONSE: {response}
------------
NATURAL LANGUAGE RESPONSE:`);

/**
 * Create a new RunnableSequence where we pipe the output from the previous chain, the user's question,
 * and the SQL query into the prompt template, and then into the llm.
 * Using the result from the `sqlQueryChain` we can run the SQL query via `db.run(input.query)`.
 */
const finalChain = RunnableSequence.from([
  {
    question: (input) => input.question,
    query: sqlQueryChain,
  },
  {
    schema: async () => db.getTableInfo(),
    question: (input) => input.question,
    query: (input) => input.query,
    response: (input) => db.run(input.query),
  },
  finalResponsePrompt,
  llm,
  new StringOutputParser(),
]);

const finalResponse = await finalChain.invoke({
  question: "How many employees are there?",
});

console.log({ finalResponse });

/**
 * { finalResponse: 'There are 8 employees.' }
 */


You can include or exclude tables when creating the SqlDatabase object to help the chain focus on the tables you want. Doing so can also reduce the number of tokens used in the chain.

const db = await SqlDatabase.fromDataSourceParams({
  appDataSource: datasource,
  includesTables: ["Track"],
});
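Conversely, you can exclude tables by passing an ignore list instead. A minimal sketch; the `ignoreTables` option name is an assumption here, so double-check it against the SqlDatabase API reference for your version:

const db = await SqlDatabase.fromDataSourceParams({
  appDataSource: datasource,
  // Assumed option name for excluding tables; verify against the API reference.
  ignoreTables: ["Playlist", "PlaylistTrack"],
});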

If desired, you can also return the SQL command that was used alongside the natural language answer when calling the chain.

import { DataSource } from "typeorm";
import { SqlDatabase } from "langchain/sql_db";
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

/**
 * This example uses the Chinook database, which is a sample database available for SQL Server, Oracle, MySQL, etc.
 * To set it up, follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the .db file
 * in the examples folder.
 */
const datasource = new DataSource({
  type: "sqlite",
  database: "Chinook.db",
});

const db = await SqlDatabase.fromDataSourceParams({
  appDataSource: datasource,
});

const llm = new ChatOpenAI();

/**
 * Create the first prompt template used for getting the SQL query.
 */
const prompt =
  PromptTemplate.fromTemplate(`Based on the provided SQL table schema below, write a SQL query that would answer the user's question.
------------
SCHEMA: {schema}
------------
QUESTION: {question}
------------
SQL QUERY:`);

/**
 * Create a new RunnableSequence where we pipe the output from `db.getTableInfo()`
 * and the user's question into the prompt template, and then into the llm.
 * We're also applying a stop condition to the llm, so that it stops when it
 * sees the `\nSQLResult:` token.
 */
const sqlQueryChain = RunnableSequence.from([
  {
    schema: async () => db.getTableInfo(),
    question: (input: { question: string }) => input.question,
  },
  prompt,
  llm.bind({ stop: ["\nSQLResult:"] }),
  new StringOutputParser(),
]);

/**
 * Create the final prompt template which is tasked with getting the natural
 * language response to the SQL query.
 */
const finalResponsePrompt =
  PromptTemplate.fromTemplate(`Based on the table schema below, question, SQL query, and SQL response, write a natural language response:
------------
SCHEMA: {schema}
------------
QUESTION: {question}
------------
SQL QUERY: {query}
------------
SQL RESPONSE: {response}
------------
NATURAL LANGUAGE RESPONSE:`);

/**
 * Create a new RunnableSequence where we pipe the output from the previous chain, the user's question,
 * and the SQL query into the prompt template, and then into the llm.
 * Using the result from the `sqlQueryChain` we can run the SQL query via `db.run(input.query)`.
 *
 * Lastly, we're piping the result of the first chain (the outputted SQL query) through so it is
 * logged along with the natural language response.
 */
const finalChain = RunnableSequence.from([
  {
    question: (input) => input.question,
    query: sqlQueryChain,
  },
  {
    schema: async () => db.getTableInfo(),
    question: (input) => input.question,
    query: (input) => input.query,
    response: (input) => db.run(input.query),
  },
  {
    result: finalResponsePrompt.pipe(llm).pipe(new StringOutputParser()),
    // Pipe the query through here unchanged so it gets logged alongside the result.
    sql: (previousStepResult) => previousStepResult.query,
  },
]);

const finalResponse = await finalChain.invoke({
  question: "How many employees are there?",
});

console.log({ finalResponse });

/**
 * {
 *   finalResponse: {
 *     result: 'There are 8 employees.',
 *     sql: 'SELECT COUNT(*) FROM Employee;'
 *   }
 * }
 */


Disclaimer ⚠️

The query chain may generate insert/update/delete queries. When this is not expected, use a custom prompt or create SQL users without write permissions.
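One way to nudge the chain toward read-only queries is to make that restriction explicit in the query prompt. This is a minimal sketch, not a security boundary; a prompt is only a hint to the model, so restricting the database user's permissions remains the reliable safeguard:

const readOnlyPrompt =
  PromptTemplate.fromTemplate(`Based on the provided SQL table schema below, write a SQL query that would answer the user's question.
Only write a single read-only SELECT statement. Never write INSERT, UPDATE, DELETE, DROP, or any other statement that modifies data or the schema.
------------
SCHEMA: {schema}
------------
QUESTION: {question}
------------
SQL QUERY:`);

You would then use readOnlyPrompt in place of prompt when building sqlQueryChain.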

An end user might overload your SQL database by asking a simple question such as "run the biggest query possible". The generated query might look like this:

SELECT * FROM "public"."users"
JOIN "public"."user_permissions" ON "public"."users".id = "public"."user_permissions".user_id
JOIN "public"."projects" ON "public"."users".id = "public"."projects".user_id
JOIN "public"."events" ON "public"."projects".id = "public"."events".project_id;

For a transactional SQL database, if one of the tables above contains millions of rows, the query might cause problems for other applications using the same database.

Most data warehouse-oriented databases support user-level quotas to limit resource usage.

