AI modules

Built-in AI capabilities exposed as tools: image generation, video generation, music, text-to-speech (TTS), transcription, sound effects, and the local agent-search engine. No external OAuth is required; availability depends on the corresponding API keys you set in .env and on your plan (see Tier gating).

Image generation

Tool	Backend	Notes
`image_gen`	Google Veo (default)	Photorealistic + illustration styles.

Set GOOGLE_VEO_API_KEY (or the key for your chosen provider). Generated images are stored short-term in Bee Flow's blob store and served via signed URLs that expire after 24 hours. To keep an image permanently, add it to a Knowledge Base or upload it to Files.
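The 24-hour expiry can be pictured with a generic HMAC-signed URL. This is a minimal sketch of the general technique, not Bee Flow's actual scheme: the secret, the `expires` and `sig` query parameters, and the `/blobs/...` path are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"       # hypothetical signing key
TTL_SECONDS = 24 * 60 * 60    # 24 hours, matching the expiry described above


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and the expiry timestamp."""
    return hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_url(path: str, now: Optional[float] = None) -> str:
    """Append an expiry timestamp and a signature to the path."""
    start = time.time() if now is None else now
    expires = int(start + TTL_SECONDS)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    return hmac.compare_digest(sig, _signature(parsed.path, expires)) and current < expires
```

Because the expiry is part of the signed payload, a client cannot extend the lifetime of a link by editing the `expires` value.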

Video generation

Tool	Backend	Notes
`video_gen`	Provider-agnostic	Configurable per org.

Sora-class generation requires the Pro tier or higher. Lower-tier plans get static "video stub" previews.

Music

Tool	Backend	Notes
`music_gen`	ElevenLabs Music	Royalty-free instrumental tracks up to 4 min.

Requires ELEVENLABS_API_KEY.
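A start-up check can catch unset keys early. The sketch below covers only the two variables named so far (GOOGLE_VEO_API_KEY and ELEVENLABS_API_KEY); the `missing_keys` helper is hypothetical, and it assumes the values from .env have already been loaded into the process environment.

```python
import os
from typing import Dict, List, Mapping

# Module -> environment variable, as named in the sections above.
# Other modules may need keys too; their variable names are not listed here.
REQUIRED_KEYS: Dict[str, str] = {
    "image_gen": "GOOGLE_VEO_API_KEY",
    "music_gen": "ELEVENLABS_API_KEY",
}


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the modules whose API key is unset or empty."""
    return [module for module, var in REQUIRED_KEYS.items() if not env.get(var)]
```

For example, `missing_keys({"ELEVENLABS_API_KEY": "x"})` returns `["image_gen"]`.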

Text-to-speech

Tool	Backend	Notes
`tts_speak`	ElevenLabs / Voxtral	Streamed via SSE for low first-byte latency.

Used by the Voice feature. Unlike voice_call, tts_speak is a one-shot call that synthesises a single message.

Transcription

Tool	Backend	Notes
`transcribe_audio`	Voxtral (Mistral) / Azure Speech / Azure Whisper / WhisperX self-hosted / local-whisper (whisper.cpp)	Pick under Admin → Integrations → Voice

The default is Voxtral (Mistral cloud) — fast, multilingual, cheapest per minute. Switch in Admin → Integrations → Voice → Active Provider.

For audio capture in the chat UI (voice messages, "press to talk"), the in-process whisper.cpp path runs entirely on CPU, so it needs no GPU and no API key. Enable or disable it with the local_whisper_enabled config setting.
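One way to picture how the two settings interact is a small routing function. This is a sketch under stated assumptions: the `VoiceConfig` class, the `source` label, and the fallback to the active provider when the toggle is off are illustrative guesses, not documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class VoiceConfig:
    active_provider: str = "voxtral"      # default provider per the text above
    local_whisper_enabled: bool = False   # the config toggle named above


def pick_transcriber(config: VoiceConfig, source: str) -> str:
    """Return the backend name for a transcription request.

    `source` is a hypothetical label: "chat_ui" for voice capture in the chat
    UI, anything else for ordinary transcribe_audio tool calls.
    """
    if source == "chat_ui" and config.local_whisper_enabled:
        return "local-whisper"
    # Assumed fallback: everything else goes to the admin-selected provider.
    return config.active_provider
```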

Sound effects

Tool	Backend	Notes
`sound_effect_gen`	ElevenLabs Sound Effects	10s clips.

Agent Search

Tool	Backend	Notes
`agent_search`	Self-hosted Agent Search service / Cloud-only / Azure Bing	Web search + page fetch + rerank + cleanup. Select under Admin → Integrations → Zoeken (Search).
`kb_search`	In-process pgvector / remote search-service	KB-only retrieval. Routed by `kb_provider` admin toggle.

The cloud-only path runs the entire SERP → fetch → embed → rerank → cleanup loop inside the Node server using your configured providers (Mistral, OpenAI, Azure, Cohere) plus the in-process CPU rerank/embed models — no GPU box required. See Web search for setup and provider comparison.
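The stage order of that loop can be sketched with toy stand-ins. Nothing here reflects the real service: the in-memory `FAKE_WEB` corpus replaces the SERP and fetch calls, a bag-of-words counter replaces the embedding model, and cosine similarity replaces the reranker. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Stand-in corpus replacing a real SERP provider and page fetcher.
FAKE_WEB = {
    "https://example.com/a": "<p>Whisper runs speech   transcription on CPU.</p>",
    "https://example.com/b": "<p>Recipes for sourdough bread.</p>",
}


def serp(query: str) -> List[str]:
    """Stage 1: return candidate URLs (a search provider would be called here)."""
    return list(FAKE_WEB)


def fetch(url: str) -> str:
    """Stage 2: return the raw page body."""
    return FAKE_WEB[url]


def embed(text: str) -> Counter:
    """Stage 3: toy bag-of-words vector standing in for an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cleanup(html: str) -> str:
    """Stage 5: strip tags and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", html)).strip()


def agent_search_sketch(query: str, top_k: int = 1) -> List[Tuple[str, str]]:
    q_vec = embed(query)
    pages = [(url, fetch(url)) for url in serp(query)]
    # Stage 4: rerank by similarity to the query, best first.
    ranked = sorted(pages, key=lambda p: cosine(q_vec, embed(p[1])), reverse=True)
    return [(url, cleanup(body)) for url, body in ranked[:top_k]]
```

Calling `agent_search_sketch("speech transcription on cpu")` returns the first page with its cleaned text, since the second page shares no terms with the query.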

Privacy

All AI module outputs pass through the Privacy Shield before they are returned. Inputs (the prompts you write) pass through it too, so an image generated from a prompt containing redacted text is produced from the placeholders, for example "a sketch of [person_1]'s office".
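The placeholder substitution can be illustrated with a small redaction helper. This is a sketch only: the real Privacy Shield presumably detects sensitive entities itself, whereas this hypothetical `redact` function takes the names to hide as an argument.

```python
import re
from typing import Dict, List, Tuple


def redact(prompt: str, names: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each listed name found in the prompt with a numbered placeholder."""
    mapping: Dict[str, str] = {}
    for name in names:
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        if pattern.search(prompt):
            placeholder = f"[person_{len(mapping) + 1}]"
            prompt = pattern.sub(placeholder, prompt)
            mapping[placeholder] = name
    return prompt, mapping
```

With this helper, `redact("a sketch of Alice Smith's office", ["Alice Smith"])` yields the prompt `a sketch of [person_1]'s office`, which is the form the image model would receive.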

Cost

Module	Approx. cost (USD)
`image_gen` (Veo)	$0.02–0.10 per call
`video_gen` (Sora-class)	$0.50–2.00 per 6-second clip
`music_gen`	$0.30 per minute generated
`tts_speak`	$0.05 per minute synthesised
`transcribe_audio`	$0.005–0.01 per minute
`sound_effect_gen`	$0.01 per clip

Pricing varies by provider and tier — check the provider's current rates.
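For budgeting, the table can be turned into a rough range estimate. The rates below are copied from the table above and are approximations only; the `estimate` helper and its usage units are illustrative.

```python
from typing import Dict, Tuple

# (low, high) USD per unit, copied from the table above.
RATES: Dict[str, Tuple[float, float]] = {
    "image_gen": (0.02, 0.10),          # per call
    "video_gen": (0.50, 2.00),          # per 6-second clip
    "music_gen": (0.30, 0.30),          # per minute generated
    "tts_speak": (0.05, 0.05),          # per minute synthesised
    "transcribe_audio": (0.005, 0.01),  # per minute
    "sound_effect_gen": (0.01, 0.01),   # per clip
}


def estimate(usage: Dict[str, float]) -> Tuple[float, float]:
    """Return a (low, high) cost range in USD for the given unit counts."""
    low = sum(RATES[module][0] * units for module, units in usage.items())
    high = sum(RATES[module][1] * units for module, units in usage.items())
    return round(low, 2), round(high, 2)
```

For 20 images, 3 minutes of music, and 60 minutes of transcription, `estimate({"image_gen": 20, "music_gen": 3, "transcribe_audio": 60})` returns `(1.6, 3.5)`: 0.40 + 0.90 + 0.30 at the low end and 2.00 + 0.90 + 0.60 at the high end.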

Tier gating

Module	Min tier
`agent_search`	Community
`image_gen`	Community
`tts_speak`	Pro (voice feature)
`transcribe_audio`	Pro
`music_gen`	Pro
`video_gen`	Pro
`sound_effect_gen`	Pro

(Community has a limited monthly quota for image_gen; Pro and higher tiers use your provider quota.)
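The table amounts to a minimum-tier lookup. A minimal sketch, assuming tiers are ordered and that only the two tiers named in the table exist (any higher tier would be appended to `TIER_ORDER`); the `is_allowed` helper is hypothetical.

```python
from typing import Dict, List

# Lowest to highest; only the tiers named in the table above.
TIER_ORDER: List[str] = ["Community", "Pro"]

# Minimum tier per module, copied from the table above.
MIN_TIER: Dict[str, str] = {
    "agent_search": "Community",
    "image_gen": "Community",
    "tts_speak": "Pro",
    "transcribe_audio": "Pro",
    "music_gen": "Pro",
    "video_gen": "Pro",
    "sound_effect_gen": "Pro",
}


def is_allowed(module: str, tier: str) -> bool:
    """True if the given tier meets the module's minimum tier."""
    return TIER_ORDER.index(tier) >= TIER_ORDER.index(MIN_TIER[module])
```

So `is_allowed("image_gen", "Community")` is true, while `is_allowed("video_gen", "Community")` is false.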
