Skip to main content

LLMs

Features (natively supported)

All LLMs implement the Runnable interface, which comes with default implementations of all methods, ie. invoke, batch, stream, map. This gives all LLMs basic support for invoking, streaming, batching and mapping requests, which by default is implemented as below:

  • Streaming support defaults to returning an AsyncIterator of a single value, the final result returned by the underlying LLM provider. This obviously doesn't give you token-by-token streaming, which requires native support from the LLM provider, but ensures your code that expects an iterator of tokens can work for any of our LLM integrations.
  • Batch support defaults to calling the underlying LLM in parallel for each input. The concurrency can be controlled with the maxConcurrency key in RunnableConfig.
  • Map support defaults to calling .invoke across all instances of the array which it was called on.

Each LLM integration can optionally provide native implementations for invoke, streaming or batch, which, for providers that support it, can be more efficient. The table shows, for each integration, which features have been implemented with native support.

ModelInvokeStreamBatch
AI21
AlephAlpha
AzureOpenAI
CloudflareWorkersAI
Cohere
Fireworks
GooglePaLM
HuggingFaceInference
LlamaCpp
Ollama
OpenAI
OpenAIChat
PromptLayerOpenAI
PromptLayerOpenAIChat
Portkey
Replicate
SageMakerEndpoint
Writer
YandexGPT

Help us out by providing feedback on this documentation page: