
Spider is the fastest crawler. It converts any website into pure HTML, markdown, metadata or text while enabling you to crawl with custom actions using AI.


Spider lets you use high-performance proxies to prevent detection, cache AI actions, receive webhooks for crawl status, schedule crawls, and more.

This guide shows how to crawl/scrape a website with Spider and load the resulting LLM-ready documents with SpiderLoader in LangChain.


Get your own Spider API key on spider.cloud.


Here's an example of how to use the SpiderLoader:

Spider offers two scraping modes: `scrape` and `crawl`. `scrape` fetches only the content of the URL provided, while `crawl` fetches that page and then follows its subpages for deeper scraping.

npm install @spider-cloud/spider-client
import { SpiderLoader } from "@langchain/community/document_loaders/web/spider";

const loader = new SpiderLoader({
  url: "", // The URL to scrape
  apiKey: process.env.SPIDER_API_KEY, // Optional, defaults to `SPIDER_API_KEY` in your env.
  mode: "scrape", // The mode to run the crawler in. Can be "scrape" for single URLs or "crawl" for deeper scraping following subpages
  // params: {
  //   // optional parameters based on Spider API docs
  //   // For API documentation, visit
  // },
});

const docs = await loader.load();

API Reference:

  • SpiderLoader from @langchain/community/document_loaders/web/spider

Additional Parameters

See the Spider documentation for all the available params.
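As a sketch of how a deeper crawl might look, the example below switches `mode` to `"crawl"` and passes a couple of parameters. The `limit` and `metadata` parameter names here are assumptions based on Spider's API docs, so verify them against the current reference before relying on them:

```typescript
import { SpiderLoader } from "@langchain/community/document_loaders/web/spider";

// Hedged sketch: "limit" and "metadata" are assumed Spider API params
// (a cap on pages crawled, and whether to include page metadata).
const crawlLoader = new SpiderLoader({
  url: "https://example.com", // placeholder target URL
  apiKey: process.env.SPIDER_API_KEY,
  mode: "crawl", // follow subpages instead of scraping a single URL
  params: {
    limit: 5, // assumed param: stop after 5 pages
    metadata: true, // assumed param: include page metadata in results
  },
});

const crawlDocs = await crawlLoader.load();
```

Each crawled page comes back as a separate LangChain `Document`, so `crawlDocs.length` reflects how many pages the crawl returned.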
