A catalog of neural networks for text, image, and video generation

Mock Model

Mock

GPT-4.1

OpenAI

GPT-4.1-Mini

OpenAI

GPT-4.1-Nano

OpenAI

DeepSeek Chat

DeepSeek

DeepSeek Coder

DeepSeek

GPT-5

OpenAI

GPT-5-Mini

OpenAI

GPT-5-Nano

OpenAI

Claude Opus 4.5

Claude

Claude Sonnet 4.5

Claude

Claude Haiku 4.5

Claude

Grok 4 Fast

Grok

Grok 4.1 Fast (Reasoning)

Grok

Grok 4 Fast (Reasoning)

Grok

Grok 4

Grok

Grok 3 Mini

Grok

Grok 3

Grok

Gemini 3 Pro Preview

Gemini

Gemini 2.5 Pro

Gemini

Gemini 2.5 Flash

Gemini

Gemini 2.5 Flash-Lite

Gemini

AI Detector

AI Detector (Text) is an advanced AI service that analyzes a passage and returns a verdict on whether it was likely written by AI.

Claude Haiku 4.5

Claude Opus 4.5

Claude Sonnet 4.5

DeepSeek Chat

DeepSeek Coder

ElevenLabs Speech to Text

Generate text from speech using ElevenLabs' advanced speech-to-text model.

ElevenLabs Speech to Text - Scribe V2

Use Scribe-V2 from ElevenLabs for blazingly fast speech-to-text inference!

Fibo

Structured Prompt Generation endpoint for Fibo, Bria's SOTA open-source model.

Fibo Edit [Structured Instruction]

Structured Instructions Generation endpoint for Fibo Edit, Bria's newest editing model.

Fibo Lite

Structured Prompt Generation endpoint for Fibo-Lite, Bria's SOTA open-source model.

Gemini 2.5 Flash

Gemini 2.5 Flash Lite

Gemini 2.5 Pro

Gemini 3 Pro Preview

GPT-4.1

GPT-4.1 mini

GPT-4.1 nano

GPT-5

GPT-5 mini

GPT-5 nano

Grok 3

Grok 3 mini

Grok 4

Grok 4.1 Fast

Grok 4.1 Fast Reasoning

Grok 4 Fast Reasoning

Mock Chat

Local mock model for chat testing

Nemotron

Use the speed and pinpoint accuracy of Nemotron to transcribe your audio.

OpenRouter [Video]

Run any VLM (Video Language Model) with fal, powered by OpenRouter.

OpenRouter [Video][Enterprise]

Run any VLM (Video Language Model) with fal, powered by OpenRouter.

Pipecat's Smart Turn model

An open source, community-driven and native audio turn detection model by Pipecat AI.

Silero VAD

Detect speech presence and timestamps with accuracy and speed using the ultra-lightweight Silero VAD model

Speech-to-Text

Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription.

Whisper

Whisper is a model for speech transcription and translation.

Wizper (Whisper v3 -- fal.ai edition)

[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!

Bytedance

Image-to-3D endpoint for ByteDance's high-quality Seed3D model generator.

Hunyuan 3D

Create detailed, fully-textured 3D models with text

Hunyuan3D

Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.

Hunyuan 3D 2.1

Hunyuan3D-2.1 is a scalable 3D asset creation system that advances state-of-the-art 3D generation through Physically-Based Rendering (PBR).

Hunyuan 3D Part Splitter

Split 3D models into parts with Hunyuan 3D

Hunyuan 3D Pro Image to 3D

Generate 3D models from images with Hunyuan 3D Pro

Hunyuan 3D Pro Text to 3D

Generate 3D models from text prompts with Hunyuan 3D Pro

Hunyuan 3D Rapid Image to 3D

Rapidly generate 3D models from images using Hunyuan 3D.

Hunyuan 3D Smart Topology

Optimize 3D mesh topology with Hunyuan 3D Smart Topology.

Hunyuan3D V3

Create your imagined 3D models with just text. Production-ready, export-ready professional assets with realistic lighting and materials in minutes.

Hunyuan3D V3

Transform your photos into ultra-high-resolution 3D models in seconds. Film-quality geometry with PBR textures, ready for games, e-commerce, and 3D printing.

Hunyuan3D V3

Turn simple sketches into detailed, fully-textured 3D models. Instantly convert your concept designs into formats ready for Unity, Unreal, and Blender.

Hunyuan Motion [0.46B]

Generate 3D human motions via the text-to-motion interface of Hunyuan Motion!

Hunyuan Motion [1B]

Generate 3D human motions via the text-to-motion interface of Hunyuan Motion!

Hunyuan Part

Use the capabilities of Hunyuan Part to generate point clouds from your 3D files.

Hunyuan World

Hunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view it from different angles.

Hyper3D

Rodin by Hyper3D generates realistic and production-ready 3D models from text or images.

Hyper3D Rodin

Rodin by Hyper3D generates realistic and production-ready 3D models from text or images.

Meshy 5 Multi

Meshy-5 Multi-Image generates realistic, production-ready 3D models from multiple images.

Meshy 5 Remesh

Meshy-5 remesh allows you to remesh and export existing 3D models into various formats

Meshy 5 Retexture

Meshy-5 retexture applies new, high-quality textures to existing 3D models using either text prompts or reference images. It supports PBR material generation for realistic, production-ready results.

Meshy 6

Meshy-6 is the latest model from Meshy. It generates realistic, production-ready 3D models.

Meshy 6 Preview

Meshy-6-Preview is the latest model from Meshy. It generates realistic, production-ready 3D models.

OmniPart

Image-to-3D endpoint for OmniPart, a part-aware 3D generator with semantic decoupling and structural cohesion.

PSHuman

Use the 6D pose estimation capabilities of PSHuman to generate 3D files from a single image.

SAM 3D

SAM 3D enables full scene reconstructions, placing objects and humans in a shared context together.

SAM 3D

SAM 3D enables precise 3D reconstruction of objects from real images, while accurately reconstructing their geometry and texture.

SAM 3D

SAM 3D allows for accurate 3D reconstruction of human body shape and position from a single image.

Trellis

Generate 3D models from multiple images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation.

Trellis

Generate 3D models from your images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation.

Trellis 2

Generate 3D models from your images using Trellis 2. A native 3D generative model enabling versatile and high-quality 3D asset creation.

Tripo3D

State-of-the-art multiview-to-3D object generation. Generate 3D models from multiple images!

Tripo3D

State-of-the-art image-to-3D object generation. Generate a 3D model from a single image!

TripoSR

State-of-the-art image-to-3D object generation.

UltraShape

UltraShape-1.0 is a 3D diffusion framework that generates high-fidelity 3D geometry through coarse-to-fine geometric refinement.

ACE-Step

Extend the beginning or end of provided audio with lyrics and/or style using ACE-Step

ACE-Step

Modify a portion of provided audio with lyrics and/or style using ACE-Step

ACE-Step

Generate music with lyrics from text using ACE-Step

ACE-Step

Generate music from a simple prompt using ACE-Step

ACE-Step

Generate music from lyrics and example audio using ACE-Step

Audio Understanding

An audio understanding model that analyzes audio content and answers questions about what's happening in the audio, based on user prompts.

Chatterbox

Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS from Resemble AI.

Chatterbox HD

Generate expressive, natural speech with Resemble AI's Chatterbox. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.

Chatterbox HD

Transform voices using Resemble AI's Chatterbox. Convert audio to new voices or your own samples, with expressive results and built-in perceptual watermarking.

CSM-1B

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.

DeepFilterNet 3

Enhance speech audio by removing background noise and upsampling to 48 kHz

Demucs

SOTA stemming model for voice, drums, bass, guitar and more.

Dia

Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control. Produces natural nonverbals like laughter and throat clearing.

Dia TTS

Clone dialogue voices from sample audio and generate dialogues from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech.

DiffRhythm: Lyrics to Song

DiffRhythm is a blazing fast model for transforming lyrics into full songs. It boasts the capability to generate full songs in less than 30 seconds.

ElevenLabs

Generate sound effects using ElevenLabs' advanced sound effects model.

ElevenLabs

Generate realistic audio dialogues using Eleven-v3 from ElevenLabs.

ElevenLabs

Generate text-to-speech audio using Eleven-v3 from ElevenLabs.

ElevenLabs Audio Isolation

Isolate audio tracks using ElevenLabs' advanced audio isolation technology.

ElevenLabs Music

Generate high-quality, realistic music with fine controls using ElevenLabs Music!

ElevenLabs TTS Multilingual v2

Generate multilingual text-to-speech audio using ElevenLabs TTS Multilingual v2.

ElevenLabs TTS Turbo v2.5

Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.

ElevenLabs Voice Changer

Change the voices in your audio using ElevenLabs voices!

F5 TTS

F5 TTS

FFmpeg API [Merge Audios]

Merge multiple audio tracks into a single file using the FFmpeg API!

Index TTS 2.0

Generate natural, clear speech using Index TTS 2.0 from IndexTeam.

Kling TTS

Generate speech from text prompts and different voices using the Kling TTS model, which leverages advanced AI techniques to create high-quality text-to-speech.

Kling Video

Generate audio from input videos using Kling

Kling Video Create Voice

Create voices to be used with Kling models' voice control.

Kokoro TTS

Kokoro is a lightweight text-to-speech model that delivers comparable quality to larger models while being significantly faster and more cost-efficient.

Kokoro TTS (Brazilian Portuguese)

A natural and expressive Brazilian Portuguese text-to-speech model optimized for clarity and fluency.

Kokoro TTS (British English)

A high-quality British English text-to-speech model offering natural and expressive voice synthesis.

Kokoro TTS (French)

An expressive and natural French text-to-speech model for both European and Canadian French.

Kokoro TTS (Hindi)

A fast and expressive Hindi text-to-speech model with clear pronunciation and accurate intonation.

Kokoro TTS (Italian)

A high-quality Italian text-to-speech model delivering smooth and expressive speech synthesis.

Kokoro TTS (Japanese)

A fast and natural-sounding Japanese text-to-speech model optimized for smooth pronunciation.

Kokoro TTS (Mandarin Chinese)

A highly efficient Mandarin Chinese text-to-speech model that captures natural tones and prosody.

Kokoro TTS (Spanish)

A natural-sounding Spanish text-to-speech model optimized for Latin American and European Spanish.

Lava SR

Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz, with denoising for particularly bad inputs.

Lyria2

Lyria 2 is Google's latest music generation model; you can generate any type of music with it.

Maya

Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design.

Maya1

Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design.

MiniMax

Generate fast speech from text prompts and different voices using the MiniMax Speech-02 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

MiniMax

Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

MiniMax (Hailuo AI) Music

Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

MiniMax (Hailuo AI) Music v1.5

Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

MiniMax Music

Generate music from text prompts using the MiniMax Music 2.0 model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.

MiniMax Speech-02 HD

Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

MiniMax Speech-02 Turbo

Generate fast speech from text prompts and different voices using the MiniMax Speech-02 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

MiniMax Speech 2.6 [HD]

Generate speech from text prompts and different voices using the MiniMax Speech-2.6 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

MiniMax Speech 2.6 [Turbo]

Generate speech from text prompts and different voices using the MiniMax Speech-2.6 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

MiniMax Speech 2.8 [HD]

Generate speech from text prompts and different voices using the MiniMax Speech-2.8 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.

MiniMax Speech 2.8 [Turbo]

Generate speech from text prompts and different voices using the MiniMax Speech-2.8 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.

MiniMax Voice Cloning

Clone a voice from a sample audio and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech.

MiniMax Voice Design

Design a personalized voice from a text description, and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech.

Mirelo SFX

Generate synced sound for any video and return the new soundtrack (like MMAudio).

Mirelo SFX V1.5

Generate synced sound for any video and return the new soundtrack (like MMAudio).

MMAudio V2 Text to Audio

MMAudio generates synchronized audio given text inputs. It can generate sounds described by a prompt.

Music Generation

Generate royalty-free instrumental music from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more.

Music Generator

CassetteAI’s model generates a 30-second sample in under 2 seconds and a full 3-minute track in under 10 seconds. At 44.1 kHz stereo audio, expect a level of professional consistency with no breaks, no squeaks, and no random interruptions in your creations.

Nova SR

Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz

Orpheus TTS

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time performance.

Personaplex

PersonaPlex is a real-time, full-duplex speech-to-speech conversational model that enables persona control through text-based role prompts and audio-based voice conditioning.

Qwen 3 TTS - Clone Voice [0.6B]

Clone your voice with the Qwen3-TTS Clone-Voice model's zero-shot cloning capabilities, then use it with the text-to-speech models to generate speech in your own voice!

Qwen 3 TTS - Clone Voice [1.7B]

Clone your voice with the Qwen3-TTS Clone-Voice model's zero-shot cloning capabilities, then use it with the text-to-speech models to generate speech in your own voice!

Qwen 3 TTS - Text to Speech [0.6B]

Turn your text into speech using the Qwen3-TTS Custom-Voice model's pre-trained voices, or use your own voice via the Qwen3-TTS Clone-Voice model.

Qwen 3 TTS - Text to Speech [1.7B]

Turn your text into speech using the Qwen3-TTS Custom-Voice model's pre-trained voices, or use your own voice via the Qwen3-TTS Clone-Voice model.

Qwen 3 TTS - Voice Design [1.7B]

Create custom voices using the Qwen3-TTS Voice Design model, then use them with the Clone-Voice model!

Sam Audio

Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications.

Sonauto V2

Extend an existing song

Sonauto V2

Replace sections of an existing audio with newly generated content

Sonauto V2

Create full songs in any style

Sound Effect Generation

Create professional-grade sound effects, from animal and vehicle sounds to nature, sci-fi, and otherworldly audio. Perfect for films, games, and digital content.

Sound Effects Generator

Create stunningly realistic sound effects in seconds - CassetteAI's Sound Effects Model generates high-quality SFX up to 30 seconds long in just 1 second of processing time

Stable Audio 2.5

Generate high-quality music and sound effects using Stable Audio 2.5 from Stability AI.

Stable Audio Open

Open source text-to-audio model.

VibeVoice

Generate long speech snippets fast using Microsoft's powerful TTS.

VibeVoice 1.5B

Generate long, expressive multi-voice speech using Microsoft's powerful TTS

VibeVoice 7B

Generate long, expressive multi-voice speech using Microsoft's powerful TTS

Workflow Utilities

FFmpeg Utility for Impulse Response

Workflow Utilities

FFmpeg Utility for Audio Compression

YuE: Lyrics to Song

YuE is a groundbreaking series of open-source foundation models designed for music generation, specifically for transforming lyrics into full songs.

Zonos-Audio-Clone

Clone any person's voice and speak anything in it using Zonos voice cloning.
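
The many TTS endpoints above share a common core request: the text to speak, a voice identifier, and a speed control. A hedged sketch of assembling such a payload (the voice name, parameter names, and speed range are assumptions for illustration; each model documents its own voices and limits):

```python
# Sketch: build arguments for a generic text-to-speech endpoint.
# "text", "voice", and "speed" are assumed field names; the accepted
# speed range varies per model -- 0.5-2.0 here is an illustrative choice.

def build_tts_request(text: str, voice: str = "default",
                      speed: float = 1.0) -> dict:
    """Build the JSON arguments for a text-to-speech request."""
    if not text.strip():
        raise ValueError("text must be non-empty")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed outside the assumed 0.5-2.0 range")
    return {"text": text, "voice": voice, "speed": speed}

request = build_tts_request("Hello from the catalog!", voice="narrator")
```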

Age Modify

Modify a face to look younger or older while keeping identity realistic.

AI Baby and Aging Generator

AI Baby Generator is a service that instantly creates realistic predictions of a future child from parent photos.

AI Baby and Aging Generator

AI Aging Generator performs controllable age progression or regression from a single face photo, generating lifelike portraits across eight age groups from baby to senior.

AI Face Swap

AI-FaceSwap-Image is a service that can take one person's face and realistically blend it onto another's in a photo.

AI Home

AI Home Edit transforms your home interior and exterior photos with realistic, prompt-based edits

AI Home

AI Home Style reimagines your home interior and exterior design with bold, prompt-driven concepts

AuraFlow

AuraFlow v0.3 is an open-source flow-based text-to-image generation model that achieves state-of-the-art results on GenEval. The model is currently in beta.

AuraSR

Upscale your images with AuraSR.

Bagel

Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both images and text.

Bagel

Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both text and images.

ben-v2-image

A fast, high-quality model for image background removal.

Birefnet Background Removal

Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS).

BitDance

Image generation with BitDance: fast, high-resolution photorealistic images using an autoregressive LLM for efficient, high-quality results.

Bria

Structure Reference allows generating new images while preserving the structure of an input image, guided by text prompts. Perfect for transforming sketches, illustrations, or photos into new illustrations. Trained exclusively on licensed data for safe and risk-free commercial use.

Bria 3.2 Text-to-Image

Bria’s Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Excels in Text-Rendering and Aesthetics.

Bria Background Replace

Bria Background Replace allows for efficient swapping of backgrounds in images via text prompts or reference image, delivering realistic and polished results. Trained exclusively on licensed data for safe and risk-free commercial use

Bria Eraser

Bria Eraser enables precise removal of unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us

Bria Expand Image

Bria Expand expands images beyond their borders in high quality. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us

Bria GenFill

Bria GenFill enables high-quality object addition or visual transformation. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us

Bria Product Shot

Place any product in any scenery with just a prompt or reference image while maintaining high integrity of the product. Trained exclusively on licensed data for safe and risk-free commercial use and optimized for eCommerce.

Bria RMBG 2.0

Bria RMBG 2.0 enables seamless removal of backgrounds from images, ideal for professional editing tasks. Trained exclusively on licensed data for safe and risk-free commercial use. Model weights for commercial use are available here: https://share-eu1.hsforms.com/2GLpEVQqJTI2Lj7AMYwgfIwf4e04

Bria Text-to-Image Base

Bria's Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Available also as source code and weights. For access to weights: https://bria.ai/contact-us

Bria Text-to-Image Fast

Bria's Text-to-Image model with perfect harmony of latency and quality. Trained exclusively on licensed data for safe and risk-free commercial use. Available also as source code and weights. For access to weights: https://bria.ai/contact-us

Bria Text-to-Image HD

Bria's Text-to-Image model for HD images. Trained exclusively on licensed data for safe and risk-free commercial use. Available also as source code and weights. For access to weights: https://bria.ai/contact-us

Bytedance

Image editing endpoint for the fast Lite version of Seedream 5.0, supporting high-quality intelligent image editing with multiple inputs.

Bytedance

Seedream 3.0 is a bilingual (Chinese and English) text-to-image model that excels at text-to-image generation.

Bytedance

A new-generation image creation model from ByteDance, Seedream 4.5 integrates image generation and image editing capabilities into a single, unified architecture.

Bytedance

Dreamina showcases superior picture effects, with significant improvements in picture aesthetics, precise and diverse styles, and rich details.

Bytedance

Text to Image endpoint for the fast Lite version of Seedream 5.0, supporting high-quality intelligent text-to-image generation.

Bytedance Seedream v4

A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.

Bytedance Seedream v4 Edit

A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.

Calligrapher

Use the text- and font-retaining capabilities of Calligrapher to modify text on books, clothing, and more.

Cartoonify

Transform images into 3D cartoon artwork using an AI model that applies cartoon stylization while preserving the original image's composition and details.

CCSR Upscaler

SOTA Image Upscaler

Chain Of Zoom

Extreme Super-Resolution via Scale Autoregression and Preference Alignment

Chrono Edit

NVIDIA's Logically Consistent and Physics-Aware Image Editing Model

Chrono Edit LoRA

LoRA endpoint for the Chrono Edit model.

Chrono Edit LoRA Gallery

Upscales and cleans up the image.

Chrono Edit LoRA Gallery

You can make edits simply by drawing a quick sketch on the input image.

City Teleport

Place a person’s photo into iconic cities worldwide.

Clarity Upscaler

Clarity upscaler for upscaling images with very high fidelity.

CodeFormer

Fix distorted or blurred photos of people with CodeFormer.

CogView

Generate high quality images from text prompts using CogView4. Longer text prompts will result in better quality images.

ControlNet SDXL

Generate Images with ControlNet.

Creative Upscaler

Creatively upscale your images.

Crystal Upscaler

An advanced image enhancement tool designed specifically for facial details and portrait photography, utilizing Clarity AI's upscaling technology.

DDColor

Bring color to old or new black-and-white photos with DDColor.

DeepSeek Janus-Pro

DeepSeek Janus-Pro is a novel text-to-image model that unifies multimodal understanding and generation through an autoregressive framework

DiffusionEdge

Diffusion-based, high-quality edge detection.

DocRes

Enhance low-resolution, blurred, or shadowed documents with the superior quality of DocRes for sharper, clearer results.

DocRes-dewarp

Enhance warped or folded documents with the superior quality of DocRes for sharper, clearer results.

DRCT-Super-Resolution

Upscale your images with DRCT-Super-Resolution.

DreamO

DreamO is an image customization framework designed to support a wide range of tasks while facilitating seamless integration of multiple conditions.

DreamOmni2

DreamOmni2 is a unified multimodal model for text and image guided image editing.

Dreamshaper

Dreamshaper model.

DWPose Pose Prediction

Predict poses from images.

Embed Product

Seamlessly integrate one or more products into a predefined scene with pixel-perfect control.

Emu 3.5 Image

Edit images with a text prompt using Emu 3.5 Image

Emu 3.5 Image

Generate images from text using Emu 3.5 Image

Era 3D

A powerful image to novel multiview model with normals.

EVF-SAM2 Segmentation

EVF-SAM2 combines natural language understanding with advanced segmentation capabilities, allowing you to precisely mask image regions using intuitive positive and negative text prompts.

Expression Change

Change facial expressions in photos with realistic results.

Face Retoucher

Automatically retouches faces to smooth skin and remove blemishes.

Face to Sticker

Create stickers from faces.

FASHN Virtual Try-On V1.5

FASHN v1.5 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 576x864 resolution from both on-model and flat-lay photo references.

FASHN Virtual Try-On V1.6

FASHN v1.6 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 864x1296 resolution from both on-model and flat-lay photo references.

FFmpeg API

FFmpeg endpoint for first, middle, and last frame extraction from videos.

Fibo

SOTA open-source model trained on licensed data, transforming intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.

Fibo Bbq Preview

A preview to the next level of control of Text-to-Image models.

Fibo Edit

A high-quality editing model that achieves maximum controllability and transparency by combining JSON + Mask + Image.

Fibo Edit [Add Object by Text]

Precise, context-aware insertion of new objects into an existing image using simple, structured spatial commands.

Fibo Edit [Blend]

Complex, multi-step visual composition through natural language.

Fibo Edit [Colorize]

Transforms the color treatment of images using predefined, style-based commands

Fibo Edit [Erase by Text]

Fast, reliable removal of unwanted elements from images. Designed for predictability, scale, and production use.

Fibo Edit [Relight]

Precise, controllable lighting changes using simple, structured text inputs.

Fibo Edit [Replace Object by Text]

Natural, expressive object swapping within images using plain language

Fibo Edit [Reseason]

Transforms the seasonal or weather atmosphere of an image.

Fibo Edit [Restore]

Automatically restores and cleans noisy or degraded images.

Fibo Edit [Restyle]

Transforms images into distinct artistic styles using curated, production-grade style mappings

Fibo Edit [Rewrite Text]

Precise, reliable modification of existing text inside images.

Fibo Edit [Sketch to Image]

Converts line drawings and sketches into photorealistic, fully colored images

Fibo Lite

Fibo Lite, the new addition to the Fibo model family, allows generating high-quality images with the same JSON structured-prompt controllability at significantly improved latency.

FILM

Interpolate images with FILM - Frame Interpolation for Large Motion

Finegrain Eraser

Finegrain Eraser removes objects—along with their shadows, reflections, and lighting artifacts—using only natural language, seamlessly filling the scene with contextually accurate content.

Finegrain Eraser

Finegrain Eraser removes any object selected with a bounding box—along with its shadows, reflections, and lighting artifacts—seamlessly reconstructing the scene with contextually accurate content.

finegrain eraser

Finegrain Eraser removes any object selected with a mask—along with its shadows, reflections, and lighting artifacts—seamlessly reconstructing the scene with contextually accurate content.

Firered Image Edit

FireRed Image Edit is FireRed's state-of-the-art open-source editing model, re-trained from Qwen Image Edit 2509.

Firered Image Edit V1.1

FireRed Image Edit v1.1 is an updated version of FireRed Image Edit, with improved image editing capabilities.

F Lite

F Lite is a 10B parameter diffusion model created by Fal and Freepik, trained exclusively on copyright-safe and SFW content.

F Lite (texture mode)

F Lite is a 10B parameter diffusion model created by Fal and Freepik, trained exclusively on copyright-safe and SFW content. This is a high texture density variant of the model.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Flow-Edit

The model provides high-quality image editing capabilities.

FLUX1.1 [pro]

FLUX1.1 [pro] is an enhanced version of FLUX.1 [pro] with improved image generation capabilities, delivering superior composition, detail, and artistic fidelity compared to its predecessor.

FLUX1.1 [pro] Redux

FLUX1.1 [pro] Redux is a high-performance endpoint for the FLUX1.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX1.1 [pro] ultra

FLUX1.1 [pro] ultra is the newest version of FLUX1.1 [pro], maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.

FLUX1.1 [pro] ultra Fine-tuned

FLUX1.1 [pro] ultra fine-tuned is the newest version of FLUX1.1 [pro] with a fine-tuned LoRA, maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.

FLUX1.1 [pro] ultra Redux

FLUX1.1 [pro] ultra Redux is a high-performance endpoint for the FLUX1.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 [dev]

FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.
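Endpoints like this are typically called with a small JSON payload of generation arguments. The sketch below assembles such a payload; the field names (`prompt`, `image_size`, `num_inference_steps`, `seed`), the `fal_client` package, and the endpoint id `"fal-ai/flux/dev"` in the commented call are assumptions modeled on common hosting conventions — check your provider's documentation before relying on them.

```python
# Minimal sketch of preparing a request for a hosted FLUX.1 [dev]
# text-to-image endpoint. Field names are assumed, not guaranteed.

def build_flux_payload(prompt, image_size="landscape_4_3",
                       num_inference_steps=28, seed=None):
    """Assemble the argument dict for a text-to-image call."""
    payload = {
        "prompt": prompt,
        "image_size": image_size,
        "num_inference_steps": num_inference_steps,
    }
    if seed is not None:
        payload["seed"] = seed  # fix the seed for reproducible outputs
    return payload

# Hypothetical call (requires an API key and the fal_client package):
# import fal_client
# result = fal_client.subscribe("fal-ai/flux/dev",
#                               arguments=build_flux_payload("a misty forest at dawn"))
```

Keeping payload construction in a helper like this makes it easy to reuse the same arguments across the plain, LoRA, and image-to-image variants of the endpoint.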

FLUX.1 [dev]

FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.

FLUX.1 [dev]

FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.

FLUX.1 [dev]

FLUX.1 Image-to-Image is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 [dev] Canny with LoRAs

Utilize Flux.1 [dev] Controlnet to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms.

FLUX.1 [dev] Control LoRA Canny

FLUX Control LoRA Canny is a high-performance endpoint that uses a Canny edge-map control image to transfer structure to the generated image, and a second initial image to guide color.

FLUX.1 [dev] Control LoRA Canny

FLUX Control LoRA Canny is a high-performance endpoint that uses a control image to transfer structure to the generated image, using a Canny edge map.

FLUX.1 [dev] Control LoRA Depth

FLUX Control LoRA Depth is a high-performance endpoint that uses a depth-map control image to transfer structure to the generated image, and a second initial image to guide color.

FLUX.1 [dev] Control LoRA Depth

FLUX Control LoRA Depth is a high-performance endpoint that uses a control image to transfer structure to the generated image, using a depth map.

FLUX.1 [dev] Depth with LoRAs

Generate high-quality images from depth maps using Flux.1 [dev] depth estimation model. The model produces accurate depth representations for scene understanding and 3D visualization.

FLUX.1 [dev] Differential Diffusion

FLUX.1 Differential Diffusion is a rapid endpoint that enables swift, granular control over image transformations through change maps, delivering fast and precise region-specific modifications while maintaining FLUX.1 [dev]'s high-quality output.

FLUX.1 [dev] Fill with LoRAs

FLUX.1 [dev] Fill is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 [dev] Inpainting with LoRAs

Super fast endpoint for the FLUX.1 [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.

FLUX.1 [dev] Redux

FLUX.1 [dev] Redux is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 [dev] Redux

FLUX.1 [dev] Redux is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 [dev] with Controlnets and Loras

A specialized FLUX endpoint combining differential diffusion control with LoRA, ControlNet, and IP-Adapter support, enabling precise, region-specific image transformations through customizable change maps.

FLUX.1 [dev] with Controlnets and Loras

FLUX General Image-to-Image is a versatile endpoint that transforms existing images with support for LoRA, ControlNet, and IP-Adapter extensions, enabling precise control over style transfer, modifications, and artistic variations through multiple guidance methods.

FLUX.1 [dev] with Controlnets and Loras

A general purpose endpoint for the FLUX.1 [dev] model, implementing the RF-Inversion pipeline. This can be used to edit a reference image based on a prompt.

FLUX.1 [dev] with Controlnets and Loras

A versatile endpoint for the FLUX.1 [dev] model that supports multiple AI extensions including LoRA, ControlNet conditioning, and IP-Adapter integration, enabling comprehensive control over image generation through various guidance methods.

FLUX.1 [dev] with Controlnets and Loras

FLUX General Inpainting is a versatile endpoint that enables precise image editing and completion, supporting multiple AI extensions including LoRA, ControlNet, and IP-Adapter for enhanced control over inpainting results and sophisticated image modifications.

FLUX.1 [dev] with LoRAs

Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
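Attaching pre-trained LoRA adapters to such a request usually means adding a list of adapter references with per-adapter weights. The sketch below shows one plausible shape; the `loras` field and its `{"path", "scale"}` entries are assumptions modeled on common LoRA-serving APIs, so verify them against your provider's schema.

```python
# Sketch of attaching LoRA adapters to a text-to-image payload.
# The "loras" field shape is assumed, not guaranteed.

def with_loras(payload, lora_specs):
    """Return a copy of a payload with LoRA adapters attached.

    lora_specs: iterable of (url_or_path, scale) pairs; scale weights
    the adapter's influence (1.0 = full strength).
    """
    out = dict(payload)  # leave the original payload untouched
    out["loras"] = [{"path": p, "scale": s} for p, s in lora_specs]
    return out

base = {"prompt": "product shot of a ceramic mug, studio lighting"}
req = with_loras(base, [("https://example.com/brand-style.safetensors", 0.8)])
```

Scales below 1.0 blend the adapter's style with the base model; multiple adapters can be stacked by passing several pairs.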

FLUX.1 [dev] with LoRAs

FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.

FLUX.1 Kontext [dev]

Frontier image editing model.

FLUX.1 Kontext [max]

Experimental version of FLUX.1 Kontext [max] with multi-image handling capabilities.

FLUX.1 Kontext [max]

FLUX.1 Kontext [max] text-to-image is a new premium model that brings maximum performance across all aspects, with greatly improved prompt adherence.

FLUX.1 Kontext [max]

FLUX.1 Kontext [max] is a model in which greatly improved prompt adherence and typography generation meet premium consistency for editing, without compromising on speed.

FLUX.1 Kontext [pro]

FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.

FLUX.1 Kontext [pro]

The FLUX.1 Kontext [pro] text-to-image delivers state-of-the-art image generation results with unprecedented prompt following, photorealistic rendering, and flawless typography.

FLUX.1 Kontext [pro]

Experimental version of FLUX.1 Kontext [pro] with multi-image handling capabilities.

FLUX.1 Krea [dev]

FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.

FLUX.1 Krea [dev]

FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.

FLUX.1 Krea [dev]

FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.

FLUX.1 Krea [dev]

FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.

FLUX.1 Krea [dev] Inpainting with LoRAs

Super fast endpoint for the FLUX.1 Krea [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.

FLUX.1 Krea [dev] Redux

FLUX.1 Krea [dev] Redux is a high-performance endpoint for the FLUX.1 Krea [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 Krea [dev] Redux

FLUX.1 Krea [dev] Redux is a high-performance endpoint for the FLUX.1 Krea [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 Krea [dev] with LoRAs

Super fast endpoint for the FLUX.1 Krea [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.

FLUX.1 Krea [dev] with LoRAs

FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.

FLUX.1 [pro] Fill

FLUX.1 [pro] Fill is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 [pro] Fill Fine-tuned

FLUX.1 [pro] Fill Fine-tuned is a high-performance endpoint for the FLUX.1 [pro] model with a fine-tuned LoRA that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 [schnell]

Fastest inference in the world for the 12 billion parameter FLUX.1 [schnell] text-to-image model.

FLUX.1 [schnell]

FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.

FLUX.1 [schnell] Redux

FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 [schnell] Redux

FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

FLUX.1 SRPO [dev]

FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.

FLUX.1 SRPO [dev]

FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.

FLUX.1 SRPO [dev]

FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.

FLUX.1 SRPO [dev]

FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.

FLUX.1 Subject

Super fast endpoint for the FLUX.1 [schnell] model with subject input capabilities, enabling rapid and high-quality image generation for personalization, specific styles, brand identities, and product-specific outputs.

Flux 2

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

Flux 2

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities—all at turbo speed.

Flux 2

Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control—in a flash.

Flux 2

Image-to-image editing with LoRA support for FLUX.2 [dev] from Black Forest Labs. Specialized style transfer and domain-specific modifications.

Flux 2

Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.

Flux 2

Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control—all at turbo speed.

Flux 2

Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities— in a flash.

Flux 2

Text-to-image generation with LoRA support for FLUX.2 [dev] from Black Forest Labs. Custom style adaptation and fine-tuned model variations.

Flux 2 Flex

Image editing with FLUX.2 [flex] from Black Forest Labs. Supports multi-reference editing with customizable inference steps and enhanced text rendering.

Flux 2 Flex

Text-to-image generation with FLUX.2 [flex] from Black Forest Labs. Features adjustable inference steps and guidance scale for fine-tuned control. Enhanced typography and text rendering capabilities.

Flux 2 [klein] 4B

Text-to-image generation with Flux 2 [klein] 4B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

Flux 2 [klein] 4B

Image-to-image editing with Flux 2 [klein] 4B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.

Flux 2 [klein] 4B Base

Text-to-image generation with Flux 2 [klein] 4B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

Flux 2 [klein] 4B Base

Image-to-image editing with Flux 2 [klein] 4B Base from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.

Flux 2 [klein] 4B Base Lora

Text-to-image generation with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations.

Flux 2 [klein] 4B Base Lora

Image-to-image editing with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications.

Flux 2 [klein] 9B

Image-to-image editing with Flux 2 [klein] 9B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.

FLUX.2 [klein] 9B

Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

Flux 2 [klein] 9B Base

Image-to-image editing with Flux 2 [klein] 9B Base from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.

FLUX.2 [klein] 9B Base

Text-to-image generation with FLUX.2 [klein] 9B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.

Flux 2 [klein] 9B Base Lora

Image-to-image editing with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications.

Flux 2 [klein] 9B Base Lora

Text-to-image generation with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations.

Flux 2 [klein] Realtime

Realtime generation with FLUX.2 [klein] from Black Forest Labs.

Flux 2 Lora Gallery

Virtually furnishes an empty apartment

Flux 2 Lora Gallery

Applies sepia vintage effect to images

Flux 2 Lora Gallery

Virtual clothing try-on (2 images: person + garment)

Flux 2 Lora Gallery

Generates satellite/aerial view style images

Flux 2 Lora Gallery

Makes images more photorealistic and natural

Flux 2 Lora Gallery

Generates same object from different angles (azimuth/elevation)

Flux 2 Lora Gallery

HDR surrealistic effect with intense colors

Flux 2 Lora Gallery

Extends a face into a full body portrait

Flux 2 Lora Gallery

Transforms images into comic book style

Flux 2 Lora Gallery

Ballpoint pen sketch drawing style

Flux 2 Lora Gallery

Add a background to images with white/clean background

Flux 2 Max

FLUX.2 [max] delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency.

Flux 2 Max

FLUX.2 [max] delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency.

Flux 2 Pro

Text-to-image generation with FLUX.2 [pro] from Black Forest Labs. Optimized for maximum quality, exceptional photorealism and artistic images.

Flux 2 Pro

Image editing with FLUX.2 [pro] from Black Forest Labs. Ideal for high-quality image manipulation, style transfer, and sequential editing workflows

Flux Kontext Lora

Fast inpainting endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image inpainting with reference images, while using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.

Flux Kontext Lora

Super fast text-to-image endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.

Flux Kontext Lora

Fast endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image editing using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.

Flux Krea Lora

Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.

Flux Lora

Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.

Flux Vision Upscaler

Flux Vision Upscaler for magnifying and upscaling images with high fidelity and creativity.

Fooocus

Default parameters with automated optimizations and quality improvements.

Fooocus

Fooocus extreme speed mode as a standalone app.

Fooocus

Fooocus extreme speed mode as a standalone app.

Fooocus Image Prompt

Default parameters with automated optimizations and quality improvements.

Fooocus Inpainting

Default parameters with automated optimizations and quality improvements.

Fooocus Upscale or Vary

Default parameters with automated optimizations and quality improvements.

Gemini 2.5 Flash Image

Google's famous original image generation and editing model, a.k.a. Nano Banana

Gemini 2.5 Flash Image

Google's famous original image generation and editing model, a.k.a. Nano Banana

Gemini 3.1 Flash Image Preview

Gemini 3.1 Flash Image (a.k.a. Nano Banana 2) is Google's new state-of-the-art fast image generation and editing model

Gemini 3.1 Flash Image Preview

Gemini 3.1 Flash Image (a.k.a. Nano Banana 2) is Google's new state-of-the-art fast image generation and editing model

Gemini 3 Pro Image Preview

Gemini 3 Pro Image (a.k.a. Nano Banana Pro) is Google's state-of-the-art high-fidelity image generation and editing model

Gemini 3 Pro Image Preview

Gemini 3 Pro Image (a.k.a. Nano Banana Pro) is Google's state-of-the-art high-fidelity image generation and editing model

Gemini Flash Edit Multi Image

Gemini Flash Edit Multi Image is a model that can edit multiple images using a text prompt and a reference image.

Gemini Flash Edit Multi Image

Gemini Flash Edit is a model that can edit a single image using a text prompt and a reference image.

Genfocus

GenFocus Model to Refocus Images

Genfocus

GenFocus Model to Refocus Images

Ghiblify Images

Reimagine and transform your ordinary photos into enchanting Studio Ghibli style artwork

Glm Image

Create high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images.

Glm Image

Create high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images.

gpt-image-1

OpenAI's latest image generation and editing model: gpt-image-1.

gpt-image-1

OpenAI's latest image generation and editing model: gpt-image-1.

GPT-Image 1.5

GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail.

GPT-Image 1.5

GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail.

GPT Image 1 Mini

GPT Image 1 Mini pairs OpenAI's advanced language capabilities, powered by GPT-5, with an efficient image generation model.

GPT Image 1 Mini

GPT Image 1 Mini pairs OpenAI's advanced language capabilities, powered by GPT-5, with an efficient image generation model.

Grok Imagine Image

Generate highly aesthetic images with xAI's Grok Imagine Image generation model.

Grok Imagine Image

Edit images precisely with xAI's Grok Imagine model

Hair Change

Change hairstyles and hair colors in photos realistically.

Headshot Generator

Generate professional headshot photos with customizable backgrounds.

Hidream E1 1

Edit images with natural language

Hidream I1 Dev

HiDream-I1 dev is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.

Hidream I1 Fast

HiDream-I1 fast is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within 16 steps.

Hidream I1 Full

HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.

Hidream I1 Full

HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.

Hunyuan Image

Use the capabilities of Hunyuan Image 2.1 to generate images that capture the feeling of your text.

Hunyuan Image

Leverage the state-of-the-art capabilities of Hunyuan Image 3.0 to generate visual content that effectively conveys the messaging of your written material.

Hunyuan Image

Image editing endpoint for Hunyuan Image 3.0 Instruct.

Hunyuan Image 3.0 Instruct

Instruct version of Hunyuan-Image 3.0, with internal reasoning capabilities.

Hunyuan World

Hunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view it from different angles.

IC-Light-v2 for Image Relighting

An endpoint for re-lighting photos and changing their backgrounds according to a given description

Ideogram

Extend existing images with Ideogram V3's reframe feature. Create expanded versions and adaptations while preserving the main image and adding new creative directions through prompt guidance.

Ideogram

Reimagine existing images with Ideogram V3's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.

Ideogram Replace Background

Replace the backgrounds of existing images with Ideogram V3's replace background feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.

Ideogram Text to Image

Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.

Ideogram Upscale

Ideogram Upscale enhances the resolution of the reference image by up to 2X and may also refine its details. Optionally provide a prompt for guided improvements.

Ideogram V2

Generate high-quality images, posters, and logos with Ideogram V2. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.

Ideogram V2A

Generate high-quality images, posters, and logos with Ideogram V2A. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.

Ideogram V2A Remix

Create variations of existing images with Ideogram V2A Remix while maintaining creative control through prompt guidance.

Ideogram V2A Turbo

Accelerated image generation with Ideogram V2A Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality.

Ideogram V2A Turbo Remix

Rapidly create image variations with Ideogram V2A Turbo Remix. Fast and efficient reimagining of existing images while maintaining creative control through prompt guidance.

Ideogram V2 Edit

Transform existing images with Ideogram V2's editing capabilities. Modify, adjust, and refine images while maintaining high fidelity and realistic outputs with precise prompt control.

Ideogram V2 Remix

Reimagine existing images with Ideogram V2's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.

Ideogram V2 Turbo

Accelerated image generation with Ideogram V2 Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality.

Ideogram V2 Turbo Edit

Edit images faster with Ideogram V2 Turbo. Quick modifications and adjustments while preserving the high-quality standards and realistic outputs of Ideogram.

Ideogram V2 Turbo Remix

Rapidly create image variations with Ideogram V2 Turbo Remix. Fast and efficient reimagining of existing images while maintaining creative control through prompt guidance.

Ideogram V3 Character

Generate consistent character appearances across multiple images. Maintain facial features, proportions, and distinctive traits for cohesive storytelling and branding

Ideogram V3 Character Edit

Modify consistent characters while preserving their core identity. Edit poses, expressions, or clothing without losing recognizable character features

Ideogram V3 Character Remix

Transform your consistent character into different art styles, settings, or scenarios while maintaining their distinctive appearance and identity

Ideogram V3 Edit

Transform existing images with Ideogram V3's editing capabilities. Modify, adjust, and refine images while maintaining high fidelity and realistic outputs with precise prompt control.

Illusion Diffusion

Create illusions conditioned on image.

Image2Pixel

Turn images into pixel-perfect retro art

Image2svg

Image2SVG transforms raster images into clean vector graphics, preserving visual quality while enabling scalable, customizable SVG outputs with precise control over detail levels.

Image Editing

The reframe endpoint intelligently adjusts an image's aspect ratio while preserving the main subject's position, composition, pose, and perspective

Image Editing

Transform any person into their baby version, while preserving the original pose and expression with childlike features.

Image Editing

Add realistic weather effects like snowfall, rain, or fog to your photos while maintaining the scene's mood.

Image Editing

Transform your photos to any time of day, from golden hour to midnight, with appropriate lighting and atmosphere.

Image Editing

Remove unwanted objects or people from your photos while seamlessly blending the background.

Image Editing

Turn your casual photos into stunning professional studio portraits with perfect lighting and high-end photography style.

Image Editing

Place your subject in any scene you imagine, from enchanted forests to urban settings, with professional composition and lighting

Image Editing

Restore and enhance old or damaged photos by removing imperfections and adding color, while preserving the original character and details of the image.

Image Editing

Retouch photos of faces. Remove blemishes and improve the skin.

Image Editing

Perfect your photos with professional color grading, balanced tones, and vibrant yet natural colors

Image Editing

Change facial expressions in photos to any emotion you desire, from smiles to serious looks.

Image Editing

Transform your photos into vibrant cool cartoons with bold outlines and rich colors.

Image Editing

Enhance facial features with professional retouching while maintaining a natural, realistic look

Image Editing

Replace your photo's background with any scene you desire, from beach sunsets to urban landscapes, with perfect lighting and shadows

Image Editing

Experiment with different hairstyles, from bald to any style you can imagine, while maintaining natural lighting and realistic results.

Image Editing

See how you or others might look at different ages, from younger to older, while preserving core facial features.

Image Editing

Transform your photos into cool plushies while keeping the original character's likeness

Image Editing

Transform your photos into wojak style while keeping the original character's likeness

Image Editing

Transform your character's hair into broccoli style while keeping the original character's likeness

Image Editing

Generate YouTube thumbnails with custom text

Image Editing

Add details to faces, enhance face features, remove blur.

Image Editing

Remove all text and writing from images while preserving the background and natural appearance.

Image Editing

Transform your photos into artistic masterpieces inspired by famous styles like Van Gogh's Starry Night or any artistic style you choose.

Imagen3

Imagen3 is a high-quality text-to-image model that generates realistic images from text prompts.

Imagen3 Fast

Imagen3 Fast is a high-quality text-to-image model that generates realistic images from text prompts.

Imagen 4

Google’s highest quality image generation model

Imagen 4

Google’s highest quality image generation model

Imagen 4 Ultra

Google’s highest quality image generation model

Image Outpaint

Directional outpainting: choose which edges to expand (left, right, top, or center for uniform expansion on all sides). Only the expanded areas are generated; an optional zoom-out pulls the frame back by the chosen amount.

Image Preprocessors

Holistically-Nested Edge Detection (HED) preprocessor.

Image Preprocessors

Scribble preprocessor.

Image Preprocessors

M-LSD line segment detection preprocessor.

Image Preprocessors

Segment Anything Model (SAM) preprocessor.

Image Preprocessors

MiDaS depth estimation preprocessor.

Image Preprocessors

TEED (Temporal Edge Enhancement Detection) preprocessor.

Image Preprocessors

Line art preprocessor.

Image Preprocessors

ZoeDepth preprocessor.

Image Preprocessors

PIDI (Pidinet) preprocessor.

Image Preprocessors

Depth Anything v2 preprocessor.

Imagineart 1.5 Preview

ImagineArt 1.5 text-to-image model generates high-fidelity professional-grade visuals with lifelike realism, strong aesthetics, and text that actually reads correctly.

ImagineArt 1.5 Pro Preview

ImagineArt 1.5 Pro is an advanced text-to-image model that creates ultra-high-fidelity 4K visuals with lifelike realism, refined aesthetics, and powerful creative output suited for professional use.

Inpainting sdxl and sd

Inpaint images with SD and SDXL

Instant Character

InstantCharacter creates high-quality, consistent characters from text prompts, supporting diverse poses, styles, and appearances with strong identity control.

Invisible Watermark

Invisible Watermark is a model that can add an invisible watermark to an image.

IP Adapter Face ID

High quality zero-shot personalization

Juggernaut Flux Base

Juggernaut Base Flux by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility.

Juggernaut Flux Base

Juggernaut Base Flux by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility.

Juggernaut Flux Base LoRA

Juggernaut Base Flux LoRA by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism to all your LoRAs and LyCORIS with full compatibility.

Juggernaut Flux Lightning

Juggernaut Lightning Flux by RunDiffusion provides blazing-fast, high-quality images rendered at five times the speed of Flux. Perfect for mood boards and mass ideation, this model excels in both realism and prompt adherence.

Juggernaut Flux Lora

Juggernaut Base Flux LoRA Inpainting by RunDiffusion is a drop-in replacement for Flux [Dev] inpainting that delivers sharper details, richer colors, and enhanced realism to all your LoRAs and LyCORIS with full compatibility.

Juggernaut Flux Pro

Juggernaut Pro Flux by RunDiffusion is the flagship Juggernaut model rivaling some of the most advanced image models available, often surpassing them in realism. It combines Juggernaut Base with RunDiffusion Photo and features enhancements like reduced background blurriness.

Juggernaut Flux Pro

Juggernaut Pro Flux by RunDiffusion is the flagship Juggernaut model rivaling some of the most advanced image models available, often surpassing them in realism. It combines Juggernaut Base with RunDiffusion Photo and features enhancements like reduced background blurriness.

Kling Image

Kling Omni 3: Top-tier image-to-image with flawless consistency.

Kling Image

Kling Image V3: the latest Kling image model

Kling Image

Kling V3: Latest Kling Image model

Kling Image

Kling Omni 3: Top-tier text-to-image with flawless consistency.

Kling Kolors Virtual TryOn v1.5

Kling Kolors Virtual TryOn v1.5 is a high quality image based Try-On endpoint which can be used for commercial try on.

Kling O1 Image

Perform precise image edits using strong reference control, transforming subjects, styles, and local details while preserving visual consistency.

Kolors

Photorealistic Text-to-Image

Kolors Image to Image

Photorealistic Image-to-Image

Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

Latent Consistency Models (v1.5/XL)

Run SDXL at the speed of light

Latent Consistency (SDXL & SDv1.5)

Produce high-quality images with minimal inference steps.

Layer Diffusion XL

SDXL with an alpha channel.

Leffa Pose Transfer

Leffa Pose Transfer is an endpoint for changing the pose of an image using a reference image.

Leffa Virtual TryOn

Leffa Virtual TryOn is a high quality image based Try-On endpoint which can be used for commercial try on.

Lightning Models

Collection of SDXL Lightning models.

Live Portrait

Transfer expression from a video to a portrait.

Longcat Image

LongCat Image Edit is a 6B-parameter image editing model excelling at multilingual text rendering, photorealism, and deployment efficiency.

Longcat Image

LongCat Image is a 6B-parameter model excelling at multilingual text rendering, photorealism, and deployment efficiency.

Lucidflux

LucidFlux for upscaling images with very high fidelity

Luma Photon

Edit images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.

Luma Photon

Generate images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.

Luma Photon

Edit images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.

Luma Photon Flash

Generate images from your prompts using Luma Photon Flash. Photon Flash is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.

Luma Photon Flash Reframe

This advanced tool intelligently expands your visuals, seamlessly blending new content to enhance creativity and adaptability, offering unmatched speed and quality for creators at a fraction of the cost.

Luma Photon Reframe

Extend and reframe images with Luma Photon Reframe. This advanced tool intelligently expands your visuals, seamlessly blending new content to enhance creativity and adaptability, offering unmatched personalization and quality for creators at a fraction of the cost.

Lumina Image 2

Lumina-Image-2.0 is a 2-billion-parameter flow-based diffusion transformer featuring improved image quality, typography, complex prompt understanding, and resource efficiency.

Makeup Changer

Apply realistic makeup styles with adjustable intensity.

Marigold Depth Estimation

Create depth maps using Marigold depth estimation.

Midas Depth Estimation

Create depth maps using Midas depth estimation.

MiniMax (Hailuo AI) Text to Image

Generate high quality images from text prompts using MiniMax Image-01. Longer text prompts will result in better quality images.

Minimax Image Subject Reference

Generate images from text and a reference image using MiniMax Image-01 for consistent character appearance.

MixDehazer

An advanced dehaze model to remove atmospheric haze, restoring clarity and detail in images through intelligent neural network processing.

Moondream3 Preview [Segment]

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

MoonDreamNext Detection

MoonDreamNext Detection is a multimodal vision-language model for gaze detection, bbox detection, point detection, and more.

NAFNet-deblur

Use NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography.

NAFNet-denoise

Use NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography.

Nano Banana

Google's famous original image generation and editing model

Nano Banana

Google's famous original image generation and editing model

Nano Banana 2

Nano Banana 2 is Google's new state-of-the-art fast image generation and editing model

Nano Banana 2

Nano Banana 2 is Google's new state-of-the-art image generation and editing model

Nano Banana Pro

Nano Banana Pro is Google's new state-of-the-art image generation and editing model

Nano Banana Pro

Nano Banana Pro is Google's new state-of-the-art image generation and editing model

Nextstep 1

Endpoint for NextStep-1 Autoregressive Image Editing model.

Object Removal

Removes box-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content.

Object Removal

Removes mask-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content.

Object Removal

Remove unwanted objects seamlessly from any image.

Object Removal

Removes objects and their visual effects using natural language, replacing them with contextually appropriate content.

OmniGen v1

OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!

Omnigen V2

OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!

Omni Zero

Any pose, any style, any identity

Onereward

OneReward is a finetuned version of Flux 1.0 Fill with intelligent editing capabilities.

Optimized Latent Consistency (SDv1.5)

Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.

Ovis Image

Ovis-Image is a 7B text-to-image model specifically optimized for quick, high quality text rendering.

PASD

Pixel-Aware Diffusion Model for Realistic Image Super-Resolution and Personalized Stylization

Perspective Change

Easily adjust the perspective of any image to different angles.

Photography Effects

Apply diverse photography styles and effects to transform your images.

PhotoMaker

Customizing Realistic Human Photos via Stacked ID Embedding

Photo Restoration

Restore old or damaged photos by fixing colors, scratches, and resolution.

Piflow

Use PiFlow's faster speed to generate images with the same quality as slower models.

PixArt-Σ

Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Playground v2.5

State-of-the-art open-source model in aesthetic quality

Playground v2.5

State-of-the-art open-source model in aesthetic quality

Playground v2.5

State-of-the-art open-source model in aesthetic quality

Plushify

Turn any image into a cute plushie!

Pony V7

Pony V7 is a fine-tuned text-to-image model with superior aesthetics and prompt following.

Portrait Enhance

Enhance and refine portrait photos with improved clarity and detail.

Post Processing

Adjust color temperature, brightness, contrast, saturation, and gamma values for color correction.

Post Processing

Apply Gaussian or Kuwahara blur effects with adjustable radius and sigma parameters.

Post Processing

Create chromatic aberration by shifting red, green, and blue channels horizontally or vertically with customizable shift amounts.
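The effect described here is a simple per-row channel shift. A minimal pure-Python sketch of the idea, assuming pixels are rows of `(r, g, b)` tuples (the helper name and the wrap-around edge handling are illustrative, not the endpoint's implementation):

```python
def chromatic_aberration(pixels, shift=1):
    """Offset the red and blue channels horizontally in opposite
    directions while leaving green in place, faking lens fringing."""
    out = []
    for row in pixels:
        width = len(row)
        shifted = []
        for x, (_, g, _) in enumerate(row):
            r = row[(x - shift) % width][0]  # red sampled from the left
            b = row[(x + shift) % width][2]  # blue sampled from the right
            shifted.append((r, g, b))
        out.append(shifted)
    return out
```

A real implementation would typically clamp or pad at the borders instead of wrapping, and expose separate horizontal and vertical shift amounts as the description suggests.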

Post Processing

Apply various color tints (sepia, red, green, blue, cyan, magenta, yellow, purple, orange, warm, cool, lime, navy, vintage, rose, teal, maroon, peach, lavender, olive) with adjustable strength.

Post Processing

Reduce color saturation using different methods (luminance Rec.709, luminance Rec.601, average, lightness) with adjustable factor.
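Luminance-based desaturation blends each pixel toward its luma, computed with the standard Rec.709 weights (0.2126, 0.7152, 0.0722) or Rec.601 weights (0.299, 0.587, 0.114). A small sketch of the idea (hypothetical helper, not the service's code):

```python
def desaturate(pixel, factor, weights=(0.2126, 0.7152, 0.0722)):
    """Blend an (r, g, b) pixel toward its weighted luma.
    factor=0 keeps the original color; factor=1 is full grayscale."""
    r, g, b = pixel
    luma = weights[0] * r + weights[1] * g + weights[2] * b
    return tuple(round(c + (luma - c) * factor) for c in (r, g, b))
```

The "average" and "lightness" methods mentioned above just swap in a different gray value: `(r + g + b) / 3` and `(max(r, g, b) + min(r, g, b)) / 2`, respectively.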

Post Processing

Blend two images together using smooth linear interpolation with a configurable blend factor.
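The blend factor here is plain linear interpolation, `out = (1 - t) * a + t * b` applied per channel. A sketch under that assumption (hypothetical helper, shown per pixel rather than per image):

```python
def blend(pixel_a, pixel_b, t):
    """Per-channel linear interpolation between two (r, g, b) pixels;
    t=0 returns pixel_a, t=1 returns pixel_b."""
    return tuple(round((1 - t) * a + t * b) for a, b in zip(pixel_a, pixel_b))
```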

Post Processing

Apply dodge and burn effects with multiple modes and adjustable intensity.

Post Processing

Apply film grain effect with different styles (modern, analog, kodak, fuji, cinematic, newspaper) and customizable intensity and scale.

Post Processing

Apply a parabolic distortion effect with configurable coefficient and vertex position.

Post Processing

Apply sharpening effects with three modes: basic unsharp mask, smart sharpening with edge preservation, and Contrast Adaptive Sharpening (CAS).

Post Processing

Apply solarization effect by inverting pixel values above a threshold.
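Solarization is exactly the thresholded inversion the description states, applied to each 8-bit channel value. A minimal sketch (hypothetical helper; whether the threshold value itself gets inverted is a convention choice):

```python
def solarize(value, threshold=128):
    """Invert an 8-bit channel value only when it reaches the threshold."""
    return 255 - value if value >= threshold else value
```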

Post Processing

Add a darkening vignette effect around the edges of the image with adjustable strength.
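A vignette multiplies each pixel by a gain that falls off with distance from the image centre; the strength parameter controls how dark the corners get. A sketch assuming a simple linear falloff (real implementations often use a smoother curve; the helper is hypothetical):

```python
import math

def vignette_gain(x, y, width, height, strength=0.5):
    """Darkening multiplier for pixel (x, y): 1.0 at the centre,
    dropping to (1 - strength) at the corners."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    max_dist = math.hypot(cx, cy)          # centre-to-corner distance
    dist = math.hypot(x - cx, y - cy)
    return 1.0 - strength * (dist / max_dist if max_dist else 0.0)
```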

Post Processing

Post Processing is an endpoint that can enhance images using a variety of techniques including grain, blur, sharpen, and more.

Product Holding

Place products naturally in a person’s hands for realistic marketing visuals.

Product Photography

Generate professional product photography with realistic lighting and backgrounds.

PuLID

Tuning-free ID customization.

PuLID Flux

An endpoint for personalized image generation using Flux as per given description.

Qwen Image

Qwen-Image (Image-to-Image) transforms and edits input images with high fidelity, enabling precise style transfer, enhancement, and creative modification.

Qwen Image

Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing.

Qwen Image 2

Qwen-Image-2.0 is a next-generation foundational unified generation-and-editing model

Qwen Image 2

Qwen-Image-2.0 is a next-generation foundational unified generation-and-editing model

Qwen Image 2

Qwen-Image-2.0 is a next-generation foundational unified generation-and-editing model

Qwen Image 2

Qwen-Image-2.0 is a next-generation foundational unified generation-and-editing model

Qwen Image 2512

Qwen Image 2512 is an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation.

Qwen Image 2512

LoRA inference endpoint for Qwen Image 2512, an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation.

Qwen Image Edit

Endpoint for Qwen's Image Editing model. Has superior text editing capabilities.

Qwen Image Edit

Image to Image Endpoint for Qwen's Image Editing model. Has superior text editing capabilities.

Qwen Image Edit

Inpainting Endpoint for the Qwen Edit Image editing model.

Qwen Image Edit 2509

Endpoint for Qwen's Image Editing Plus model also known as Qwen-Image-Edit-2509. Has superior text editing capabilities and multi-image support.

Qwen Image Edit 2509 Lora

LoRA endpoint for the Qwen Image Edit 2509 model.

Qwen Image Edit 2509 Lora Gallery

Generate full portrait from a cropped face photo

Qwen Image Edit 2509 Lora Gallery

Add a realistic scene behind the object with white background

Qwen Image Edit 2509 Lora Gallery

Remove unwanted elements (objects, people, text) while maintaining image consistency

Qwen Image Edit 2509 Lora Gallery

Blend products into backgrounds with automatic perspective and lighting correction

Qwen Image Edit 2509 Lora Gallery

Remove existing lighting and apply soft, even illumination

Qwen Image Edit 2509 Lora Gallery

Create group photos

Qwen Image Edit 2509 Lora Gallery

Apply designs/graphics onto people's shirts

Qwen Image Edit 2509 Lora Gallery

Create cinematic transitions and scene progressions (camera movements, framing changes)

Qwen Image Edit 2509 Lora Gallery

Precise camera position and angle control (rotation, zoom, vertical movement)

Qwen Image Edit 2509 Lora Gallery

Removes harsh shadows and light spots from images, replacing them with soft, even, natural-looking illumination.

Qwen Image Edit 2511

Endpoint for Qwen's Image Editing 2511 model with LoRA support.

Qwen Image Edit 2511

Endpoint for Qwen's Image Editing 2511 model.

Qwen Image Edit 2511 Multiple Angles

Generates the same scene from different angles (azimuth/elevation) using Qwen Image Edit 2511 with the Multiple Angles LoRA.

Qwen Image Edit Lora

LoRA inference endpoint for the Qwen Image Editing model.

Qwen Image Edit Plus

Endpoint for Qwen's Image Editing Plus model also known as Qwen-Image-Edit-2509. Has superior text editing capabilities and multi-image support.

Qwen Image Edit Plus Lora

LoRA endpoint for the Qwen Image Edit Plus model.

Qwen Image Edit Plus Lora Gallery

Add a realistic scene behind the object with white background

Qwen Image Edit Plus Lora Gallery

Generate full portrait from a cropped face photo

Qwen Image Edit Plus Lora Gallery

Create group photos

Qwen Image Edit Plus Lora Gallery

Blend products into backgrounds with automatic perspective and lighting correction

Qwen Image Edit Plus Lora Gallery

Create cinematic transitions and scene progressions (camera movements, framing changes)

Qwen Image Edit Plus Lora Gallery

Remove unwanted elements (objects, people, text) while maintaining image consistency

Qwen Image Edit Plus Lora Gallery

Remove existing lighting and apply soft, even illumination

Qwen Image Edit Plus Lora Gallery

Apply designs/graphics onto people's shirts

Qwen Image Edit Plus Lora Gallery

Precise camera position and angle control (rotation, zoom, vertical movement)

Qwen Image Edit Plus Lora Gallery

Removes harsh shadows and light spots from images, replacing them with soft, even, natural-looking illumination.

Qwen Image Layered

Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers.

Qwen Image Layered

Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. Use LoRAs to get your custom outputs.

Qwen Image Max

Text-to-Image endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images.

Qwen Image Max

Image editing endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images.

Realistic Vision

Generate realistic images.

Recraft

Converts a given raster image to SVG format using Recraft model.

Recraft 20b

Recraft 20b is a new and affordable text-to-image model.

Recraft Creative Upscale

Enhances a given raster image using the 'creative upscale' tool, increasing image resolution, making the image sharper and cleaner.

Recraft Crisp Upscale

Enhances a given raster image using 'crisp upscale' tool, boosting resolution with a focus on refining small details and faces.

Recraft V3

Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.

Recraft V3

Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.

Recraft V4

Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy, delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing.

Recraft V4 Pro

Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing.

Recraft V4 Pro (Vector)

Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing.

Recraft V4 (Vector)

Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing.

Reimagine

Reimagine uses a structure reference for generating new images while preserving the structure of an input image, guided by text prompts. Perfect for transforming sketches, illustrations, or photos into new illustrations. Trained exclusively on licensed data.

Relighting

Adjust and enhance images with different lighting styles.

Rembg Enhance (Remove Background Enhance)

Rembg-enhance is optimized for 2D vector images, 3D graphics, and photos by leveraging matting technology.

Remove Background

Remove the background from an image.

Replace Background

Creates enriched product shots by placing them in various environments using textual descriptions.

Reve

Reve’s edit model lets you upload an existing image and then transform it via a text prompt.

Reve

Reve’s text-to-image model generates detailed visual output that closely follows your instructions, with strong aesthetic quality and accurate text rendering.

Reve

Reve’s remix model lets you upload reference images and then combine or transform them via a text prompt.

Reve

Reve’s fast remix model lets you upload reference images and then combine or transform them via a text prompt at lightning speed!

Reve

Reve’s fast edit model lets you upload an existing image and then transform it via a text prompt at lightning speed!

RIFE

Interpolate images with RIFE - Real-Time Intermediate Flow Estimation

Rundiffusion Photo Flux

RunDiffusion Photo Flux provides insane realism. With this enhancer, textures and skin details burst to life, turning your favorite prompts into vivid, lifelike creations. Recommended to keep it at 0.65 to 0.80 weight. Supports resolutions up to 1536x1536.

Sam 3

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.

Sana

Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, with the ability to generate 4K images in less than a second.

Sana Sprint

Sana Sprint is a text-to-image model capable of generating 4K images with exceptional speed.

Sana v1.5 1.6B

Sana v1.5 1.6B is a lightweight text-to-image model that delivers 4K image generation with impressive efficiency.

Sana v1.5 4.8B

Sana v1.5 4.8B is a powerful text-to-image model that generates ultra-high quality 4K images with remarkable detail.

SD 1.5 Depth ControlNet

SD 1.5 ControlNet

SDXL ControlNet Union

An efficient SDXL multi-ControlNet text-to-image model.

SDXL ControlNet Union

An efficient SDXL multi-ControlNet image-to-image model.

SDXL ControlNet Union

An efficient SDXL multi-ControlNet inpainting model.

SeedVR2

Use SeedVR2 to upscale your images

Segment Anything Model 2

SAM 2 is a model for segmenting images and videos in real-time.

Segment Anything Model 2

SAM 2 is a model for segmenting images automatically. It can return individual masks or a single mask for the entire image.

Segment Anything Model 3

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.

Sky Raccoon

Generate images from a text prompt.

SoteDiffusion

Anime finetune of Würstchen V3.

Stable Cascade

Stable Cascade: Image generation on a smaller & cheaper latent space.

Stable Diffusion 3.5 Large

Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

Stable Diffusion 3.5 Medium

Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

Stable Diffusion v1.5

Stable Diffusion v1.5

Stable Diffusion V3

Stable Diffusion 3 Medium (Image to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.

Stable Diffusion V3

Stable Diffusion 3 Medium (Text to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.

Stable Diffusion with LoRAs

Run Any Stable Diffusion model with customizable LoRA weights.

Stable Diffusion with LoRAs

Run Any Stable Diffusion model with customizable LoRA weights.

Stable Diffusion with LoRAs

Run Any Stable Diffusion model with customizable LoRA weights.

Stable Diffusion XL

Run SDXL at the speed of light

Stable Diffusion XL

Run SDXL at the speed of light

Stable Diffusion XL

Run SDXL at the speed of light

Stable Diffusion XL Lightning

Run SDXL at the speed of light

Stable Diffusion XL Lightning

Run SDXL at the speed of light

Stable Diffusion XL Lightning

Run SDXL at the speed of light

StarVector

AI vectorization model that transforms raster images into scalable SVG graphics, preserving visual details while enabling infinite scaling and easy editing capabilities.

Step1X Edit

Step1X-Edit transforms your photos with simple instructions into stunning, professional-quality edits—rivaling top proprietary tools.

Stepx Edit2

Image-to-image editing with Step1X-Edit v2 from StepFun. Reasoning-enhanced modifications through a thinking–editing–reflection loop with MLLM world knowledge for abstract instruction comprehension.

Style Transfer

Apply artistic styles like impressionism, cubism, or surrealism to your images.

SWIN2SR

Enhance low-resolution images with the superior quality of Swin2SR for sharper, clearer results.

Switti 1024

Switti is a scale-wise transformer for fast text-to-image generation that outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being faster than distilled diffusion models.

Switti 512

Switti is a scale-wise transformer for fast text-to-image generation that outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being faster than distilled diffusion models.

Texture Transform

Transform objects with different surface textures like marble, wood, or fabric.

Thera

Fix low-resolution images with the speed and quality of Thera.

Topaz

Use the powerful and accurate Topaz image enhancer to enhance your images.

try-on

Image based high quality Virtual Try-On

Uno

An AI model that transforms input images into new ones based on text prompts, blending reference visuals with your creative directions.

Upscale

Regenerate the image with sharper textures and richer details while upscaling resolution to 4 megapixels.

Upscale Images

Upscale images by a given factor.

Uso

Use USO to perform subject-driven generation from a reference image.

Vidu

Vidu Reference-to-Image creates images by combining reference images with a prompt.

Vidu

Vidu Reference-to-Image creates images by combining reference images with a prompt.

Vidu

Use Vidu Text-to-Image to turn your prompts into reality.

Virtual Try-on

Try on clothes virtually by combining person and clothing images.

Wan

Wan 2.2's 5B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail

Wan

Wan 2.2's 14B model edits high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail

Wan

Wan 2.2's 14B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail

Wan 2.5 Image to Image

Wan 2.5 image-to-image model.

Wan 2.5 Text to Image

Wan 2.5 text-to-image model.

Wan v2.2 A14B Text-to-Image A14B with LoRAs

Wan 2.2's 14B model with LoRA support generates high-fidelity images with enhanced prompt alignment and style adaptability.

Wan v2.6 Image to Image

Wan 2.6 image-to-image model.

Wan v2.6 Text to Image

Wan 2.6 text-to-image model.

Workflow Utilities

FFmpeg utility for extracting the nth frame.

Z Image Base

Z-Image is the foundation model of the Z-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence.

Z Image Base (LoRA)

LoRA endpoint for Z-Image, the foundation model of the Z-Image family.

Z-Image Turbo

Z-Image Turbo is a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.

Z-Image Turbo

Generate images from text, an image and a mask using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.

Z-Image Turbo

Text-to-Image endpoint with LoRA support for Z-Image Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.

Z-Image Turbo

Generate images from text and edge, depth or pose images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model.

Z-Image Turbo

Generate images from text and edge, depth or pose images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.

Z-Image Turbo

Generate images from text and images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model.

Z-Image Turbo

Generate images from text and images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.

Z-Image Turbo

Generate images from text, an image, a mask and custom LoRA using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.

Ai Detector

AI Detector (Image) is an advanced service that analyzes a single picture and returns a verdict on whether it was likely created by AI.

Arbiter

Image reference comparison measurements

Arbiter

Semantic image alignment measurements

Arbiter

Reference-free image measurements

Bagel

Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both text and images.

FFmpeg API

Get EBU R128 loudness normalization from audio files using FFmpeg API.

FFmpeg API Metadata

Get encoding metadata from video and audio files using FFmpeg API.

FFmpeg API Waveform

Get waveform data from audio files using FFmpeg API.

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

Florence-2 Large

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks

GOT OCR 2.0

GOT-OCR2 works on a wide range of tasks, including plain document OCR, scene text OCR, formatted document OCR, and even OCR for tables, charts, mathematical formulas, geometric shapes, molecular formulas and sheet music.

Isaac 0.1

Isaac-01 is a multimodal vision-language model from Perceptron for various vision language tasks.

LLaVA v1.6 34B

Vision

Moondream

Answer questions from the images.

Moondream2

Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.

Moondream2

Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.

Moondream2

Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.

Moondream2

Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.

Moondream3 Preview [Caption]

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

Moondream3 Preview [Detect]

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

Moondream3 Preview [Point]

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

Moondream 3 Preview [Query]

Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.

MoonDreamNext

MoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bbox detection, point detection, and more.

MoonDreamNext Batch

MoonDreamNext Batch is a multimodal vision-language model for batch captioning.

NSFW Checker

Predict whether an image is NSFW or SFW.

NSFW Filter

Predict the probability of an image being NSFW.
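The checker returns a verdict while the filter returns a probability; turning the latter into the former is a simple threshold. A sketch under the assumption of a 0.5 cutoff (the services' actual decision thresholds are not documented here):

```python
def classify_nsfw(probability, threshold=0.5):
    """Map an NSFW probability in [0, 1] to a label.

    The 0.5 threshold is an illustrative assumption; tune it to your
    own tolerance for false positives vs. false negatives.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return "nsfw" if probability >= threshold else "sfw"
```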

OpenRouter

Run any LLM (Large Language Model) with fal, powered by OpenRouter.

OpenRouter [Audio]

Run any ALM (Audio Language Model) with fal, powered by OpenRouter.

OpenRouter [Vision]

Run any VLM (Vision Language Model) with fal, powered by OpenRouter.

Qwen 3 Guard [8B]

Use Qwen 3 Guard [8B] to detect and classify text as safe or harmful, delivering precise and reliable safety categorization.

Sa2VA 4B Image

Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels

Sa2VA 4B Video

Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels

Sa2VA 8B Image

Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels

Sa2VA 8B Video

Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels

Sam 3

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
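Since SAM 3 accepts text prompts alongside visual prompts (points, boxes, masks), a request combines whichever prompt types are supplied. A sketch of assembling such a prompt — every key name here is an illustrative assumption, not SAM 3's actual schema:

```python
def build_segmentation_prompt(text=None, points=None, box=None):
    """Combine text and visual prompts into one request dict.

    points: iterable of (x, y, label) with label 1 = foreground, 0 = background.
    box:    (x1, y1, x2, y2) bounding box.
    Key names are hypothetical placeholders.
    """
    prompt = {}
    if text:
        prompt["text"] = text
    if points:
        prompt["points"] = [{"x": x, "y": y, "label": lbl} for x, y, lbl in points]
    if box:
        x1, y1, x2, y2 = box
        prompt["box"] = {"x1": x1, "y1": y1, "x2": x2, "y2": y2}
    if not prompt:
        raise ValueError("at least one prompt type is required")
    return prompt
```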

Video Prompt Generator

Generate video prompts using a variety of techniques including camera direction, style, pacing, special effects and more.

Video Understanding

A video understanding model to analyze video content and answer questions about what's happening in the video based on user prompts.

Workflow Utilities

ffmpeg utility to interleave videos

Ai Avatar

MultiTalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.

Ai Avatar

MultiTalk model generates a talking avatar video from an image and text. Converts text to speech automatically, then generates the avatar speaking with lip-sync.

Ai Avatar

MultiTalk model generates a multi-person conversation video from an image and text inputs. Converts text to speech for each person, generating a realistic conversation scene.

Ai Avatar

MultiTalk model generates a multi-person conversation video from an image and audio files. Creates a realistic scene where multiple people speak in sequence.

Ai Face Swap

AI-FaceSwap-Video is a service that can replace a person's face throughout a video clip while keeping their movements natural.

AMT Frame Interpolation

Interpolate between image frames

AMT Interpolation

Interpolate between video frames
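Frame interpolation inserts evenly spaced intermediate frames between two source frames. A minimal sketch of the timestamp math only (the endpoint's actual parameters are not documented in this catalog):

```python
def interpolated_times(t0, t1, n):
    """Timestamps of n evenly spaced intermediate frames strictly
    between source frames at times t0 and t1."""
    if n < 1:
        return []
    step = (t1 - t0) / (n + 1)
    return [t0 + step * i for i in range(1, n + 1)]
```

With three intermediate frames between t=0 s and t=1 s this yields frames at 0.25 s, 0.5 s, and 0.75 s.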

AnimateDiff

Re-animate your videos!

AnimateDiff

Animate your ideas!

Animatediff SparseCtrl LCM

Animate Your Drawings with Latent Consistency Models!

AnimateDiff Turbo

Animate your ideas in lightning speed!

AnimateDiff Turbo

Re-animate your videos in lightning speed!

Auto-Captioner

Automatically generates text captions for your videos from the audio, following your text colour and font specifications.

Avatars

Generate high-quality videos with UGC-like avatars from text

Avatars

Generate high-quality videos with UGC-like avatars from audio

Avatars Audio to Video

High-quality avatar videos that feel real, generated from your audio

Avatars Text to Video

High-quality avatar videos that feel real, generated from your text

Ben-Video-Bg-Rm

A model for high quality and smooth background removal for videos.

BiRefNet

Video background-removal version of the bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS).

Bria Video Eraser

A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency

Bria Video Eraser

A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.

Bria Video Eraser

A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.

Bytedance

Image to Video endpoint for Seedance 1.0 Pro Fast, a next-generation video model designed to deliver maximum performance at minimal cost

Bytedance

Transform your images into stylized videos using this workflow.

Bytedance

Generate videos with audio with Seedance 1.5 (supports start & end frame)

Bytedance

Generate videos with audio with Seedance 1.5

Bytedance

Seedance lite reference-to-video allows the use of 1 to 4 images as reference to create a high-quality video.

Bytedance

Transfer motion from a video to characters in an image using Dreamactor v2. Great performance for non-human and multiple characters

Bytedance

Text to Video endpoint for Seedance 1.0 Pro Fast, a next-generation video model designed to deliver maximum performance at minimal cost

Bytedance OmniHuman v1.5

Omnihuman v1.5 is a new and improved version of Omnihuman. It generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.

Bytedance Upscaler

Upscale videos with Bytedance's video upscaler.

CogVideoX-5B

Generate videos from images and prompts using CogVideoX-5B

CogVideoX-5B

Generate videos from videos and prompts using CogVideoX-5B

CogVideoX-5B

Generate videos from prompts using CogVideoX-5B

ControlNeXt SVD

Animate a reference image with a driving video using ControlNeXt.

Cosmos Predict 2.5 2B

Generate video from text and videos using NVIDIA's 2B Cosmos Post-Trained Model

Cosmos Predict 2.5 2B

Generate video from text and images using NVIDIA's 2B Cosmos Post-Trained Model

Cosmos Predict 2.5 2B

Generate video from text using NVIDIA's 2B Cosmos Post-Trained Model

Cosmos Predict 2.5 2B Distilled

Generate video from text and videos using NVIDIA's 2B Cosmos Distilled Model

Creatify Aurora

Generate high-fidelity, studio-quality videos of your avatar speaking or singing using Aurora from the Creatify team!

Crystal Upscaler [Video]

Do high precision video upscaling that respects the original video perfectly using Crystal Upscaler's new video upscaling method!

Decart

Lucy-5B is a model that can create 5-second I2V videos in under 5 seconds, achieving >1x RTF end-to-end

Decart Lucy 14b

Lucy-14B delivers lightning fast performance that redefines what's possible with image-to-video AI

Depth Anything Video

Generates depth maps from video using Video Depth Anything (CVPR 2025). Produces per-frame depth estimation with temporal consistency across frames. Supports 3 model sizes (Small, Base, Large), 5 colormaps including grayscale, side-by-side comparison with the original video, and raw depth export as .npz. Useful for 3D reconstruction, video effects, compositing, and scene understanding.
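Given the options the blurb lists (three model sizes, five colormaps including grayscale, side-by-side output, raw `.npz` export), a request could be validated and assembled like this — a sketch where only "grayscale" and the three sizes come from the description above; the other colormap names and all key names are placeholder assumptions:

```python
MODEL_SIZES = {"small", "base", "large"}  # the 3 sizes named in the description
# Only "grayscale" is named in the description; the other four are guesses.
COLORMAPS = {"grayscale", "inferno", "viridis", "magma", "spectral"}

def build_depth_request(video_url, model_size="large", colormap="grayscale",
                        side_by_side=False, export_npz=False):
    """Validate options and assemble a hypothetical request body."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"model_size must be one of {sorted(MODEL_SIZES)}")
    if colormap not in COLORMAPS:
        raise ValueError(f"colormap must be one of {sorted(COLORMAPS)}")
    return {
        "video_url": video_url,
        "model_size": model_size,
        "colormap": colormap,
        "side_by_side": side_by_side,  # compare against the original video
        "export_npz": export_npz,      # raw per-frame depth as .npz
    }
```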

Dubbing

This endpoint delivers seamlessly localized videos by generating lip-synced dubs in multiple languages, ensuring natural and immersive multilingual experiences

DWPose Pose Prediction

Predict poses from videos.

EchoMimic V3

EchoMimic V3 generates a talking avatar video from a picture, an audio file and a text prompt.

Editto

Edit videos with instruction-based prompting using the Editto model!

ElevenLabs Dubbing

Generate dubbed videos or audios using ElevenLabs Dubbing feature!

Fabric 1.0

VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video

Fabric 1.0

VEED Fabric 1.0 text-to-video API

Fabric 1.0 Fast

VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video

FFmpeg API

Use FFmpeg capabilities to merge 2 or more videos.

FFmpeg API Compose

Compose videos from multiple media sources using FFmpeg API.

FFmpeg API Merge Audio-Video

Merge videos with standalone audio files or audio from video files.

FILM

Interpolate videos with FILM - Frame Interpolation for Large Motion

FlashVSR

Upscale your videos using FlashVSR with the fastest speeds!

Framepack

Framepack is an efficient Image-to-video model that autoregressively generates videos.

Framepack

Framepack is an efficient Image-to-video model that autoregressively generates videos.

Framepack F1

Framepack is an efficient Image-to-video model that autoregressively generates videos.

Grok Imagine Video

Generate videos from images with audio using xAI's Grok Imagine Video model.

Grok Imagine Video

Generate videos with audio from text using Grok Imagine Video.

Grok Imagine Video

Edit videos using xAI's Grok Imagine

Heygen

Heygen Avatar V3 Model for Digital Twin

Heygen

Heygen Translate Model with Extreme Speed

Heygen

Heygen Avatar 4 Digital Twin Model

Heygen

Heygen Translate Model with Extreme Precision

Heygen

Heygen Text to Video Generation Model

Heygen

Heygen Photo Avatar 4 Model

High Quality Stable Video Diffusion

Generate short video clips from your images using SVD v1.1

Hunyuan Avatar

HunyuanAvatar is a high-fidelity audio-driven human animation model for multiple characters.

Hunyuan Custom

HunyuanCustom revolutionizes video generation with unmatched identity consistency across multiple input types. Its innovative fusion modules and alignment networks outperform competitors, maintaining subject integrity while responding flexibly to text, image, audio, and video conditions.

Hunyuan Portrait

HunyuanPortrait is a diffusion-based framework for generating lifelike, temporally consistent portrait animations.

Hunyuan Video

Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. This endpoint generates videos from text descriptions.

Hunyuan Video Foley

Use the capabilities of the Hunyuan Foley model to bring your videos to life by adding sound effects to them.

Hunyuan Video Image-to-Video Inference

Image to Video for the high-quality Hunyuan Video I2V model.

Hunyuan Video Image-to-Video LoRA Inference

Image to Video for the Hunyuan Video model using a custom trained LoRA.

Hunyuan Video LoRA Inference

Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability

Hunyuan Video LoRA Inference (Video-to-Video)

Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. Use this endpoint to generate videos from videos.

Hunyuan Video V1.5

Hunyuan Video 1.5 is Tencent's latest and best video model

Hunyuan Video V1.5

Hunyuan Video 1.5 is Tencent's latest and best video model

Hunyuan Video (Video-to-Video)

Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. Use this endpoint to generate videos from videos.

Infinitalk

Infinitalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.

Infinitalk

Infinitalk model generates a talking avatar video from a text and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.

Infinitalk

Infinitalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.

Infinity Star

InfinityStar's unified 8B spacetime autoregressive engine turns any text prompt into crisp 720p video, 10× faster than diffusion models.

Kandinsky5

Kandinsky 5.0 Distilled is a lightweight diffusion model for fast, high-quality text-to-video generation.

Kandinsky5

Kandinsky 5.0 is a diffusion model for fast, high-quality text-to-video generation.

Kandinsky5 Pro

Kandinsky 5.0 Pro is a diffusion model for fast, high-quality text-to-video generation.

Kandinsky5 Pro

Kandinsky 5.0 Pro is a diffusion model for fast, high-quality image-to-video generation.

Kling 1.0

Generate video clips from your images using Kling 1.0

Kling 1.0

Generate video clips from your prompts using Kling 1.0

Kling 1.0

Generate video clips from your prompts using Kling 1.0

Kling 1.5

Generate video clips from your prompts using Kling 1.5 (pro)

Kling 1.5

Generate video clips from your images using Kling 1.5 (pro)

Kling 1.5

Generate video clips from your prompts using Kling 1.5 (pro)

Kling 1.6

Generate video clips from your prompts using Kling 1.6 (pro)

Kling 1.6

Generate video clips from your prompts using Kling 1.6 (pro)

Kling 1.6

Generate video clips from your prompts using Kling 1.6 (std)

Kling 1.6

Generate video clips from your prompts using Kling 1.6 (std)

Kling 1.6

Generate video clips from your images using Kling 1.6 (pro)

Kling 1.6

Generate video clips from your images using Kling 1.6 (std)

Kling 1.6 Elements

Generate video clips from your multiple image references using Kling 1.6 (standard)

Kling 1.6 Elements

Generate video clips from your multiple image references using Kling 1.6 (pro)

Kling 2.0 Master

Generate video clips from your prompts using Kling 2.0 Master

Kling 2.0 Master

Generate video clips from your images using Kling 2.0 Master

Kling 2.1 Master

Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

Kling 2.1 Master

Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

Kling 2.1 (pro)

Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling.

Kling 2.1 (standard)

Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation

Kling AI Avatar

Kling AI Avatar Standard: Endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters

Kling AI Avatar Pro

Kling AI Avatar Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters

Kling AI Avatar v2 Pro

Kling AI Avatar v2 Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters

Kling AI Avatar v2 Standard

Kling AI Avatar v2 Standard: Endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters

Kling LipSync Audio-to-Video

Kling LipSync is an audio-to-video model that generates realistic lip movements from audio input.

Kling LipSync Text-to-Video

Kling LipSync is a text-to-video model that generates realistic lip movements from text input.

Kling O1 Edit Video [Pro]

Edit an existing video using natural-language instructions, transforming subjects, settings, and style while retaining the original motion structure.

Kling O1 Edit Video [Standard]

Edit an existing video using natural-language instructions, transforming subjects, settings, and style while retaining the original motion structure.

Kling O1 First Frame Last Frame to Video [Pro]

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

Kling O1 First Frame Last Frame to Video [Standard]

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

Kling O1 Reference Image to Video [Pro]

Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.

Kling O1 Reference Image to Video [Standard]

Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.

Kling O1 Reference Video to Video [Pro]

Kling O1 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.

Kling O1 Reference Video to Video [Standard]

Kling O1 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.

Kling O3 Edit Video [Pro]

Edit videos using Kling O3 from Kling Team!

Kling O3 Edit Video [Standard]

Edit videos using Kling O3 from Kling Team!

Kling O3 Image to Video [Pro]

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

Kling O3 Image to Video [Pro]

Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.

Kling O3 Reference to Video [Pro]

Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.

Kling O3 Reference to Video [Standard]

Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.

Kling O3 Reference Video to Video [Pro]

Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.

Kling O3 Reference Video to Video [Standard]

Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.

Kling O3 Text to Video [Pro]

Generate realistic videos using Kling O3 from Kling Team!

Kling O3 Text to Video [Standard]

Generate realistic videos using Kling O3 from Kling Team!

Kling v2.5 Text to Video

Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

Kling Video

Kling 2.5 Turbo Standard: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

Kling Video

Transfer movements from a reference video to any character image. Cost-effective mode for motion transfer, perfect for portraits and simple animations.

Kling Video

Transfer movements from a reference video to any character image. Cost-effective mode for motion transfer, perfect for portraits and simple animations.

Kling Video

Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

Kling Video v2.6 Image to Video

Kling 2.6 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation.

Kling Video v2.6 Motion Control [Pro]

Transfer movements from a reference video to any character image. Pro mode delivers higher quality output, ideal for complex dance moves and gestures.

Kling Video v2.6 Motion Control [Standard]

Transfer movements from a reference video to any character image. Cost-effective mode for motion transfer, perfect for portraits and simple animations.

Kling Video v2.6 Text to Video

Kling 2.6 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation.

Kling Video v3 Image to Video [Pro]

Kling 3.0 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.

Kling Video v3 Image to Video [Standard]

Kling 3.0 Standard: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.

Kling Video v3 Text to Video [Pro]

Kling 3.0 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.

Kling Video v3 Text to Video [Standard]

Kling 3.0 Standard: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.

Krea Wan 14B

Superfast video model based on Wan 2.1 14B by Krea, excelling at real-time video editing.

Krea Wan 14B - Text to Video

Fast Text-to-Video endpoint for Krea's Wan 14b model.

LatentSync

LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.

Lightx

Use the capabilities of lightx to relight and recamera your videos.

Lightx

Use the capabilities of lightx to relight and recamera your videos.

Lipsync

Generate realistic lipsync from any audio using VEED's latest model

Live Avatar

Real-time avatar generation with Live Avatar. Have natural face-to-face conversations with AI avatars that respond instantly, streaming infinite-length video with immediate visual feedback.

Live Portrait

Transfer expression from a video to a portrait.

Longcat Multi Avatar

LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

Longcat Single Avatar

LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

Longcat Single Avatar

LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.

LongCat Video

Generate long videos in 720p/30fps from text using LongCat Video

LongCat Video

Generate long videos in 720p/30fps from images using LongCat Video

LongCat Video

Generate long videos from images using LongCat Video

LongCat Video

Generate long videos from text using LongCat Video

LongCat Video Distilled

Generate long videos from text using LongCat Video Distilled

LongCat Video Distilled

Generate long videos from images using LongCat Video Distilled

LongCat Video Distilled

Generate long videos in 720p/30fps from text using LongCat Video Distilled

LongCat Video Distilled

Generate long videos in 720p/30fps from images using LongCat Video Distilled

LTX 2.0 Video Pro

Generate video from audio using LTX-2

LTX-2 19B

Generate video with audio from text using LTX-2 and custom LoRA

LTX-2 19B

Generate video with audio from images using LTX-2

LTX-2 19B

Generate video with audio from text using LTX-2

LTX-2 19B

Extend video with audio using LTX-2

LTX-2 19B

Generate video with audio from images using LTX-2 and custom LoRA

LTX-2 19B

Extend video with audio using LTX-2 and custom LoRA

LTX-2 19B

Generate video with audio from videos using LTX-2

LTX-2 19B

Generate video with audio from videos using LTX-2 and custom LoRA

LTX-2 19B

Generate video with audio from audio, text and images using LTX-2

LTX-2 19B

Generate video with audio from audio, text and images using LTX-2 and custom LoRA

LTX-2 19B Distilled

Generate video with audio from images using LTX-2 Distilled and custom LoRA

LTX-2 19B Distilled

Generate video with audio from audio, text and images using LTX-2 Distilled and custom LoRA

LTX-2 19B Distilled

Generate video with audio from audio, text and images using LTX-2 Distilled

LTX-2 19B Distilled

Generate video with audio from videos using LTX-2 Distilled and custom LoRA

LTX-2 19B Distilled

Generate video with audio from videos using LTX-2 Distilled

LTX-2 19B Distilled

Extend videos with audio using LTX-2 Distilled and custom LoRA

LTX-2 19B Distilled

Generate video with audio from text using LTX-2 Distilled

LTX-2 19B Distilled

Generate video with audio from text using LTX-2 Distilled and custom LoRA

LTX-2 19B Distilled

Generate video with audio from images using LTX-2 Distilled

LTX-2 19B Distilled

Extend videos with audio using LTX-2 Distilled

LTX 2.3 Video Fast

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

LTX 2.3 Video Fast

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

LTX 2.3 Video Pro

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

LTX 2.3 Video Pro

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

LTX Video-0.9.5

Generate videos from prompts using LTX Video-0.9.5

LTX Video-0.9.5

Generate videos from prompts, images, and videos using LTX Video-0.9.5

LTX Video-0.9.5

Generate videos from prompts and videos using LTX Video-0.9.5

LTX Video-0.9.7 13B

Generate videos from prompts using LTX Video-0.9.7 13B and custom LoRA

LTX Video-0.9.7 13B

Generate videos from prompts, images, and videos using LTX Video-0.9.7 13B and custom LoRA

LTX Video-0.9.7 13B

Extend videos using LTX Video-0.9.7 13B and custom LoRA

LTX Video-0.9.7 13B

Generate videos from prompts and images using LTX Video-0.9.7 13B and custom LoRA

LTX Video-0.9.7 13B Distilled

Generate videos from prompts, images, and videos using LTX Video-0.9.7 13B Distilled and custom LoRA

LTX Video-0.9.7 13B Distilled

Generate videos from prompts and images using LTX Video-0.9.7 13B Distilled and custom LoRA

LTX Video-0.9.7 13B Distilled

Generate videos from prompts using LTX Video-0.9.7 13B Distilled and custom LoRA

LTX Video-0.9.7 13B Distilled

Extend videos using LTX Video-0.9.7 13B Distilled and custom LoRA

LTX Video-0.9.7 LoRA

Generate videos from prompts and images using LTX Video-0.9.7 and custom LoRA

LTX Video-0.9.7 LoRA

Generate videos from prompts, images, and videos using LTX Video-0.9.7 and custom LoRA

LTX-Video 13B 0.9.8 Distilled

Generate long videos from prompts using LTX Video-0.9.8 13B Distilled and custom LoRA

LTX-Video 13B 0.9.8 Distilled

Generate long videos from prompts, images, and videos using LTX Video-0.9.8 13B Distilled and custom LoRA

LTX-Video 13B 0.9.8 Distilled

Generate long videos from prompts and images using LTX Video-0.9.8 13B Distilled and custom LoRA

LTX-Video 13B 0.9.8 Distilled

Extend videos using LTX Video-0.9.8 13B Distilled and custom LoRA

LTX Video 2.0 Fast

Create high-fidelity video with audio from text with LTX-2 Fast

LTX Video 2.0 Fast

Create high-fidelity video with audio from images with LTX-2 Fast

LTX Video 2.0 Pro

Create high-fidelity video with audio from text with LTX-2 Pro.

LTX Video 2.0 Pro

Create high-fidelity video with audio from images with LTX-2 Pro

LTX Video 2.0 Retake

Change sections of a video using LTX-2

LTX Video 2.3 Pro

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

LTX Video 2.3 Pro

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

LTX Video 2.3 Pro

LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.

LTX Video (preview)

Generate videos from prompts using LTX Video

LTX Video (preview)

Generate videos from images using LTX Video

Lucy Edit [Dev]

Edit outfits, objects, faces, or restyle your video - all with maximum detail retention.

Lucy Edit [Fast]

Lucy Edit Fast is a rapid, localized video editing model that lets you modify specific elements like objects or backgrounds in just 10 seconds.

Lucy Edit [Pro]

Edit outfits, objects, faces, or restyle your video - all with maximum detail retention.

Lucy Image to Video

Lucy delivers lightning fast performance that redefines what's possible with image to video AI

Lucy Restyle

Restyle videos up to 30 min long - maintaining maximum detail quality.

Luma Ray 2

Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion.

Luma Ray 2 Flash

Ray2 Flash is a fast video generative model capable of creating realistic visuals with natural, coherent motion.

Luma Ray 2 Flash (Image to Video)

Ray2 Flash is a fast video generative model capable of creating realistic visuals with natural, coherent motion.

Luma Ray 2 Flash Modify

Ray2 Flash Modify is a video generative model capable of restyling or retexturing the entire shot: turning live-action into CG or stylized animation, changing wardrobe, props, or the overall aesthetic, and swapping environments or time periods, giving you control over background, location, or even weather.

Luma Ray 2 Flash Reframe

Adjust and enhance videos with Ray-2 Reframe. This advanced tool seamlessly reframes videos to your desired aspect ratio, intelligently inpainting missing regions to ensure realistic visuals and coherent motion, delivering exceptional quality and creative flexibility.

Luma Ray 2 (Image to Video)

Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion.

Luma Ray 2 Modify

Ray2 Modify is a video generative model capable of restyling or retexturing an entire shot: turning live-action into CG or stylized animation, changing wardrobe, props, or the overall aesthetic, and swapping environments or time periods, giving you control over background, location, or even weather.

Luma Ray 2 Reframe

Adjust and enhance videos with Ray-2 Reframe. This advanced tool seamlessly reframes videos to your desired aspect ratio, intelligently inpainting missing regions to ensure realistic visuals and coherent motion, delivering exceptional quality and creative flexibility.

Lynx

Generate subject consistent videos using Lynx from ByteDance!

MAGI-1

MAGI-1 extends videos with an exceptional understanding of physical interactions and prompts

MAGI-1

MAGI-1 is a video generation model with exceptional understanding of physical interactions and cinematic prompts

MAGI-1

MAGI-1 generates videos from images with exceptional understanding of physical interactions and prompting

MAGI-1 (Distilled)

MAGI-1 distilled generates videos faster from images with exceptional understanding of physical interactions and prompting

MAGI-1 (Distilled)

MAGI-1 distilled extends videos faster with an exceptional understanding of physical interactions and prompts

MAGI-1 (Distilled)

MAGI-1 distilled is a faster video generation model with exceptional understanding of physical interactions and cinematic prompts

Marey Realism V1.5

Pull motion from a reference video and apply it to new subjects or scenes.

Marey Realism V1.5

Generate a video from a text prompt with Marey, a generative video model trained exclusively on fully licensed data.

Marey Realism V1.5

Generate a video starting from an image as the first frame with Marey, a generative video model trained exclusively on fully licensed data.

Marey Realism V1.5

Ideal for matching human movement. Your input video determines human poses, gestures, and body movements that will appear in the generated video.

Minimax

Create blazing fast and economical videos with MiniMax Hailuo-02 Image To Video API at 512p resolution

MiniMax Hailuo 02 [Pro] (Image to Video)

MiniMax Hailuo-02 Image To Video API (Pro, 1080p): Advanced image-to-video generation model with 1080p resolution

MiniMax Hailuo 02 [Pro] (Text to Video)

MiniMax Hailuo-02 Text To Video API (Pro, 1080p): Advanced video generation model with 1080p resolution

MiniMax Hailuo 02 [Standard] (Image to Video)

MiniMax Hailuo-02 Image To Video API (Standard, 768p, 512p): Advanced image-to-video generation model with 768p and 512p resolutions

MiniMax Hailuo 02 [Standard] (Text to Video)

MiniMax Hailuo-02 Text To Video API (Standard, 768p): Advanced video generation model with 768p resolution

MiniMax Hailuo 2.3 Fast [Pro] (Image to Video)

MiniMax Hailuo-2.3-Fast Image To Video API (Pro, 1080p): Advanced fast image-to-video generation model with 1080p resolution

MiniMax Hailuo 2.3 Fast [Standard] (Image to Video)

MiniMax Hailuo-2.3-Fast Image To Video API (Standard, 768p): Advanced fast image-to-video generation model with 768p resolution

MiniMax Hailuo 2.3 [Pro] (Image to Video)

MiniMax Hailuo-2.3 Image To Video API (Pro, 1080p): Advanced image-to-video generation model with 1080p resolution

MiniMax Hailuo 2.3 [Pro] (Text to Video)

MiniMax Hailuo-2.3 Text To Video API (Pro, 1080p): Advanced text-to-video generation model with 1080p resolution

MiniMax Hailuo 2.3 [Standard] (Image to Video)

MiniMax Hailuo-2.3 Image To Video API (Standard, 768p): Advanced image-to-video generation model with 768p resolution

MiniMax Hailuo 2.3 [Standard] (Text to Video)

MiniMax Hailuo-2.3 Text To Video API (Standard, 768p): Advanced text-to-video generation model with 768p resolution

MiniMax (Hailuo AI) Video 01

Generate video clips from your images using MiniMax Video model

MiniMax (Hailuo AI) Video 01

Generate video clips from your prompts using MiniMax model

MiniMax (Hailuo AI) Video 01 Director

Generate video clips more accurately with respect to natural language descriptions and using camera movement instructions for shot control.

MiniMax (Hailuo AI) Video 01 Director - Image to Video

Generate video clips more accurately with respect to initial image, natural language descriptions, and using camera movement instructions for shot control.

MiniMax (Hailuo AI) Video 01 Live

Generate video clips from your images using MiniMax Video model

MiniMax (Hailuo AI) Video 01 Live

Generate video clips from your prompts using MiniMax model

MiniMax (Hailuo AI) Video 01 Subject Reference

Generate video clips maintaining consistent, realistic facial features and identity across dynamic video content

Mirelo SFX

Generate synced sounds for any video and return it with its new soundtrack (like MMAudio).

Mirelo SFX V1.5

Generate synced sounds for any video and return it with its new soundtrack (like MMAudio).

MMAudio V2

MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio.

Mochi 1

Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation.

Multishot Master

MultiShotMaster is a controllable multi-shot narrative video generation framework that supports text-driven inter-shot consistency, variable shot counts and durations, customized subjects with motion control, and background-driven customized scenes.

MuseTalk

MuseTalk is a real-time high quality audio-driven lip-syncing model. Use MuseTalk to animate a face with your own audio.

OmniHuman

OmniHuman generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.

One To All Animation

One-to-All Animation is a pose-driven video model that animates characters from a single reference image, enabling flexible, alignment-free motion transfer across diverse styles and scenes.

One To All Animation

One-to-All Animation is a pose-driven video model that animates characters from a single reference image, enabling flexible, alignment-free motion transfer across diverse styles and scenes.

Ovi

Ovi can generate videos with audio from image and text inputs.

Ovi Text to Video

A unified paradigm for audio-video generation

Pika

Discover ultimate control with Pikaframes key frame interpolation, a stunning image-to-video feature that allows you to upload up to 5 keyframes, customize their transition length and prompt, and see their images come to life as seamless videos.

Pikadditions (v2)

Pikadditions is a powerful video-to-video AI model that allows you to add anyone or anything to any video with seamless integration.

Pika Effects (v1.5)

Pika Effects are AI-powered video effects designed to modify objects, characters, and environments in a fun, engaging, and visually compelling manner.

Pika Image to Video Turbo (v2)

Turbo is the model to use when you feel the need for speed. Turn your image into stunning video up to 3x faster – all with high-quality outputs.

Pika Image to Video (v2.1)

Turn photos into mind-blowing, dynamic videos. Your images can come to life with sharp details, impressive character control, and cinematic camera moves.

Pika Image to Video (v2.2)

Turn photos into mind-blowing, dynamic videos in up to 1080p. Experience better image clarity and crisper, sharper visuals.

Pika Scenes (v2.2)

Pika Scenes v2.2 creates videos from images with high-quality output.

Pika Text to Video Turbo (v2)

Pika v2 Turbo creates videos from a text prompt with high quality output.

Pika Text to Video (v2.1)

Start with a simple text input to create dynamic generations that defy expectations. Anything you dream can come to life with sharp details, impressive character control and cinematic camera moves.

Pika Text to Video (v2.2)

Start with a simple text input to create dynamic generations that defy expectations in up to 1080p. Experience better image clarity and crisper, sharper visuals.

Pixverse

Use the latest PixVerse v5.6 model to turn your text prompts into amazing videos.

Pixverse

Generate high quality video clips with different effects using PixVerse v4.5

Pixverse

Generate high quality video clips from text and image prompts using PixVerse v4.5

Pixverse

Generate high quality video clips from text and image prompts using PixVerse v4.5

Pixverse

Quickly generate high-quality video clips from text and image prompts using PixVerse v4.5 Fast.

Pixverse

Generate fast, high-quality video clips from text and image prompts using PixVerse v4.5.

Pixverse

Pixverse Effects

Pixverse

Add immersive sound effects and background music to your videos using PixVerse sound effects generation

Pixverse

Pixverse Transition

Pixverse

Generate high quality video clips with different effects using PixVerse v5

Pixverse

Generate high quality video clips from text and image prompts using PixVerse v5.5

Pixverse

Generate high quality video clips from text and image prompts using PixVerse v5.5

Pixverse

Generate high quality video clips by swapping people, objects, and backgrounds using PixVerse Swap.

Pixverse

PixVerse Extend is a tool for extending your videos with high-quality video-extension techniques.

Pixverse

Use the latest PixVerse v5.6 model to turn your text and images into amazing videos.

Pixverse

Use the latest PixVerse v5.6 model to turn your text and images into amazing videos.

Pixverse

Create seamless transitions between images using PixVerse v4.5.

Pixverse

PixVerse Extend is a tool for extending your videos with high-quality video-extension techniques.

Pixverse

Create seamless transitions between images using PixVerse v5.

Pixverse

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with PixVerse Lipsync model

Pixverse

Generate high quality video clips with different effects using PixVerse v4

Pixverse

Generate high quality video clips from text and image prompts using PixVerse v5

PixVerse v3.5

Generate high quality video clips from text prompts using PixVerse v3.5

PixVerse v3.5: Effects

Generate high quality video clips with different effects using PixVerse v3.5

PixVerse v3.5 Fast

Generate high quality video clips quickly from text prompts using PixVerse v3.5 Fast

PixVerse v3.5: Image to Video

Generate high quality video clips from text and image prompts using PixVerse v3.5

PixVerse v3.5: Image to Video Fast

Generate high quality video clips from text and image prompts quickly using PixVerse v3.5 Fast

PixVerse v3.5: Transition

Create seamless transitions between images using PixVerse v3.5.

PixVerse v4: Image to Video

Generate high quality video clips from text and image prompts using PixVerse v4

PixVerse v4: Image to Video Fast

Generate fast, high-quality video clips from text and image prompts using PixVerse v4.

PixVerse v4: Text to Video

Generate high quality video clips from text and image prompts using PixVerse v4

PixVerse v4: Text to Video Fast

Quickly generate high-quality video clips from text and image prompts using PixVerse v4 Fast.

Pixverse v5 Image to Video

Generate high quality video clips from text and image prompts using PixVerse v5

RIFE

Interpolate videos with RIFE - Real-Time Intermediate Flow Estimation

Sad Talker

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Sad Talker

Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Sam 3

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.

Sam 3

SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.

Sana Video

Leverage Sana's ultra-fast processing speed to generate high-quality assets that transform your text prompts into production-ready videos

Scail

SCAIL is a character animation model that uses 3D consistent pose representations to animate reference images with coherent motion, supporting complex movements.

Seedance 1.0 Lite

Seedance 1.0 Lite

Seedance 1.0 Lite

Seedance 1.0 Lite

Seedance 1.0 Pro

Seedance 1.0 Pro is a high-quality video generation model developed by ByteDance.

Seedance 1.0 Pro

Seedance 1.0 Pro is a high-quality video generation model developed by ByteDance.

SeedVR2

Upscale your videos using SeedVR2 with temporal consistency!

Segment Anything Model 2

SAM 2 is a model for segmenting images and videos in real-time.

Skyreels V1 (Image-to-Video)

SkyReels V1 is the first and most advanced open-source human-centric video foundation model, created by fine-tuning HunyuanVideo on O(10M) high-quality film and television clips.

Sora 2

Image-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Sora 2

Image-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Sora 2

Text-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Sora 2

Text-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.

Sora 2

Video-to-video remix endpoint for Sora 2, OpenAI's advanced model that transforms existing videos based on new text or image prompts, allowing rich edits, style changes, and creative reinterpretations while preserving motion and structure.

Stable Avatar

Stable Avatar generates audio-driven video avatars up to five minutes long

Stable Video Diffusion

Generate short video clips from your prompts using SVD v1.1

Stable Video Diffusion Turbo

Generate short video clips from your images using SVD v1.1 at Lightning Speed

Stable Video Diffusion Turbo

Generate short video clips from your images using SVD v1.1 at Lightning Speed

Steady Dancer

Create smooth, realistic videos from a single photo while keeping the original appearance intact—precise motion control without losing identity or visual quality.

Sync Lipsync

Generate high-quality realistic lipsync animations from audio while preserving unique details like natural teeth and unique facial features using the state-of-the-art Sync Lipsync 2 Pro model.

Sync Lipsync 2.0

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with Sync Lipsync 2.0 model

Sync React-1

Use React-1 from SyncLabs to refine human emotions and do realistic lip-sync without losing details!

sync.so -- lipsync 1.9.0-beta

Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.

T2V Turbo - Video Crafter

Generate short video clips from your prompts

ThinkSound

Generate realistic audio from a video with an optional text prompt

ThinkSound

Generate realistic audio for a video with an optional text prompt and combine

Topaz Video Upscale

Professional-grade video upscaling using Topaz technology. Enhance your videos with high-quality upscaling.

TransPixar V1

Transform text into stunning videos with TransPixar - an AI model that generates both RGB footage and alpha channels, enabling seamless compositing and creative video effects.

V2.6

Wan 2.6 reference-to-video flash model.

V2.6

Wan 2.6 image-to-video flash model.

Vace

VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.

Veo 2

Veo 2 creates videos with realistic motion and high quality output. Explore different styles and find your own with extensive camera controls.

Veo 2 (Image to Video)

Veo 2 creates videos from images with realistic motion and very high quality output.

Veo 3

Veo 3 by Google, the most advanced AI video generation model in the world. With sound on!

Veo3

Veo 3 is the latest state-of-the-art video generation model from Google DeepMind.

Veo 3.1

Extend Veo-created videos up to 30 seconds.

Veo 3.1

Generate videos from images using Google's Veo 3.1.

Veo 3.1

Generate videos from a first and last frame using Google's Veo 3.1.

Veo 3.1

Veo 3.1 by Google, the most advanced AI video generation model in the world. With sound on!

Veo 3.1

Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind.

Veo 3.1 Fast

Faster and more cost effective version of Google's Veo 3.1!

Veo 3.1 Fast

Generate videos from a first/last frame using Google's Veo 3.1 Fast

Veo 3.1 Fast

Extend Veo-created videos up to 30 seconds.

Veo 3.1 Fast

Generate videos from your image prompts using Veo 3.1 fast.

Veo 3 Fast

Faster and more cost effective version of Google's Veo 3!

Veo 3 Fast [Image to Video]

Now with a 50% price drop. Generate videos from your image prompts using Veo 3 fast.

Video

Upscale videos up to 8K output resolution. Trained on fully licensed and commercially safe data.

Video

Automatically remove backgrounds from videos, perfect for creating clean, professional content without a green screen.

Video

A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.

Video

A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency

Video

A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.

Video As Prompt

A model for unified semantic control in video generation. It animates a static reference image using the motion and semantics from a reference video as a prompt.

Video Background Removal

Remove background from any video with people and objects. No green screen needed.

Video Background Removal

Remove background from videos filmed using chromakey, with automatic green spill suppression for clean, professional edges.

Video Background Removal

Remove background from any video with people and objects. No green screen needed.

Video Sound Effects Generator

Add sound effects to your videos

Video Upscaler

The video upscaler endpoint uses RealESRGAN on each frame of the input video to upscale the video to a higher resolution.

Vidu

Use the latest Vidu Q2 Pro models, which offer much better quality and control over your videos.

Vidu

Vidu's latest Q3 pro models

Vidu

Vidu's latest Q3 pro models.

Vidu

Generate video clips from your multiple image references using Vidu Q1

Vidu

Use the latest Vidu Q2 models, which offer much better quality and control over your videos.

Vidu

Vidu's Q3 Turbo Model

Vidu

Use the latest Vidu Q2 models, which offer much better quality and control over your videos.

Vidu

Use the latest Vidu Q2 models, which offer much better quality and control over your videos.

Vidu

Use the latest Vidu Q2 models, which offer much better quality and control over your videos.

Vidu

Vidu's Q3 Turbo Model.

Vidu Image to Video

Vidu Q1 Image to Video generates high-quality 1080p videos with exceptional visual quality and motion diversity from a single image

Vidu Image to Video

Vidu Image to Video generates high-quality videos with exceptional visual quality and motion diversity from a single image

Vidu Reference to Video

Vidu Reference to Video creates videos by combining reference images with a prompt.

Vidu Start End to Video

Vidu Q1 Start-End to Video generates smooth transition 1080p videos between specified start and end images.

Vidu Start-End to Video

Vidu Start-End to Video generates smooth transition videos between specified start and end images.

Vidu Template to Video

Vidu Template to Video lets you create different effects by applying motion templates to your images.

Vidu Text to Video

Vidu Q1 Text to Video generates high-quality 1080p videos with exceptional visual quality and motion diversity

Wan

Wan 2.2's 5B distill model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding.

Wan

Wan-2.2 turbo text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts.

Wan

Wan-2.2 video-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts and source videos.

Wan

Wan 2.2's 5B FastVideo model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding.

Wan

Wan-2.2 Turbo image-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts.

Wan-2.1 First-Last-Frame-to-Video

Wan-2.1 flf2v generates dynamic videos by intelligently bridging a given first frame to a desired end frame through smooth, coherent motion sequences.

Wan-2.1 Image-to-Video

Wan-2.1 is an image-to-video model that generates high-quality videos with high visual quality and motion diversity from images.

Wan-2.1 Image-to-Video with LoRAs

Add custom LoRAs to Wan-2.1, an image-to-video model that generates high-quality videos with high visual quality and motion diversity from images.

Wan-2.1 Pro Image-to-Video

Wan-2.1 Pro is a premium image-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from images

Wan-2.1 Pro Text-to-Video

Wan-2.1 Pro is a premium text-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from text prompts

Wan-2.1 Text-to-Video

Wan-2.1 is a text-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts

Wan-2.1 Text-to-Video with LoRAs

Add custom LoRAs to Wan-2.1, a text-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts.

Wan 2.1 VACE Long Reframe

Reframe entire videos scene-by-scene using Wan VACE 2.1

Wan-2.2 Animate Move

Wan-Animate is a video model that generates high-fidelity character videos by replicating the expressions and movements of characters from reference videos.

Wan-2.2 Animate Replace

Wan-Animate Replace is a model that can integrate animated characters into reference videos, replacing the original character while preserving the scene’s lighting and color tone for seamless environmental integration.

Wan 2.2 Fun Control

Generate pose or depth controlled video using Alibaba-PAI's Wan 2.2 Fun

Wan-2.2 Speech-to-Video 14B

Wan-S2V is a video model that generates high-quality videos from static images and audio, with realistic facial expressions, body movements, and professional camera work for film and television applications

Wan-2.2 Text-to-Video A14B

Wan-2.2 text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts.

Wan-2.2 Text-to-Video A14B with LoRAs

Wan-2.2 text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts. This endpoint supports LoRAs made for Wan 2.2.

Wan 2.2 VACE Fun A14B

VACE Fun for Wan 2.2 A14B from Alibaba-PAI

Wan 2.2 VACE Fun A14B

VACE Fun for Wan 2.2 A14B from Alibaba-PAI

Wan 2.2 VACE Fun A14B

VACE Fun for Wan 2.2 A14B from Alibaba-PAI

Wan 2.2 VACE Fun A14B

VACE Fun for Wan 2.2 A14B from Alibaba-PAI

Wan 2.2 VACE Fun A14B

VACE Fun for Wan 2.2 A14B from Alibaba-PAI

Wan 2.5 Image to Video

Wan 2.5 image-to-video model.

Wan 2.5 Text to Video

Wan 2.5 text-to-video model.

Wan Alpha

Generate videos with transparent backgrounds

Wan Ati

WAN-ATI is a controllable video generation model that uses trajectory instructions to guide object, local, and camera motion, enabling precise and flexible image-to-video creation.

Wan Effects

Wan Effects generates high-quality videos with popular effects from images

Wan Motion

Wan Motion is a streamlined character animation model that transfers motion from a driving video onto a reference character image. Based on Wan-Animate, which preserves the original character's proportions, it uses pose retargeting to adapt the driving video's skeleton to the reference character's body shape, producing more natural results when the two have different builds. It outputs at 720p with optimized defaults for fast, high-quality generation: just provide a video, an image, and an optional prompt.

Wan Move [480p]

Use Wan-Move to generate videos with controlled motion using trajectories.

Wan v2.2 5B

Wan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding.

Wan v2.2 5B

Wan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding.

Wan v2.2 A14B

fal-ai/wan/v2.2-A14B/image-to-video

Wan v2.2 A14B Image-to-Video with LoRAs

Wan-2.2 image-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts and images. This endpoint supports LoRAs made for Wan 2.2

Wan v2.6 Image to Video

Wan 2.6 image-to-video model.

Wan v2.6 Reference to Video

Wan 2.6 reference-to-video model.

Wan v2.6 Text to Video

Wan 2.6 text-to-video model.

Wan Vace 1 3b

VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.

Wan VACE 14B

VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.

Wan VACE 14B

VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.

Wan VACE 14B

VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.

Wan VACE 14B

VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.

Wan VACE 14B

VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.

Wan VACE 14B

VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.

Wan VACE Video Edit

Edit videos using plain language and Wan VACE

Wan Vision Enhancer

Wan Vision Enhancer magnifies and enhances videos with high fidelity and creativity.

Workflow Utilities

FFMPEG Utility for Trim Video

Workflow Utilities

Add automatic subtitles to videos

Workflow Utilities

FFMPEG Utility to Reverse Videos

Workflow Utilities

FFMPEG Utilities to Scale Videos

Workflow Utilities

FFMPEG Utility for Blending Videos

SEO text

Welcome to the AI combine harvester!

Here we not only plow with neural networks: we also thresh texts, press out insights, and bind API integrations into sheaves.

Our multifunctional AI combine replaces:

— a copywriter,
— a marketer,
— a front-end developer,
— a therapist (well, almost).

You can:

— write SEO texts without yawning,
— generate code without cursing,
— create images without drawing,
— manage assistants without hiring.

One combine, thousands of tasks.

No dictionaries, just a pure synthesis of syntax and meaning.