Documentation Index
Fetch the complete documentation index at: https://daily-docs-cli-pr-89.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Overview
InceptionLLMService provides access to Inception’s Mercury-2 diffusion-based reasoning model through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management with advanced reasoning capabilities.
Inception LLM API Reference
Pipecat’s API methods for Inception integration
Example Implementation
Complete example with function calling
Inception Labs
Access models and manage API keys
Installation
To use Inception services, install the required dependency:Prerequisites
Inception Account Setup
Before using Inception LLM services, you need:- Inception Account: Sign up at Inception Labs
- API Key: Generate an API key from your account dashboard
- Model Selection: Access to Mercury-2, Inception’s diffusion-based reasoning model
Required Environment Variables
INCEPTION_API_KEY: Your Inception API key for authentication
Configuration
Inception API key for authentication.
Base URL for Inception API endpoint.
Runtime-configurable settings. See Settings below.
Settings
Runtime-configurable settings passed via thesettings constructor argument using InceptionLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
This service extends OpenAILLMService.Settings with Inception-specific parameters:
Model identifier to use. Defaults to “mercury-2”, Inception’s diffusion-based reasoning model.
Controls how much reasoning the model applies. Options are “instant”, “low”, “medium”, or “high”. When unset, the parameter is omitted and Inception’s server-side default applies.
When True, reduces time to first diffusion block (TTFT) for faster initial response times.
Usage
Basic Setup
With Custom Settings
With Function Calling
Notes
- Inception does not support the
"developer"message role. Use"system"instead. - The Mercury-2 model uses a diffusion-based reasoning approach, which can be controlled via the
reasoning_effortparameter. - Setting
realtime=Trueoptimizes for lower time-to-first-token at the potential cost of reasoning depth.