A catalog of neural networks for text, image, and video generation
Mock
GPT-4.1 (OpenAI)
GPT-4.1-Mini (OpenAI)
GPT-4.1-Nano (OpenAI)
DeepSeek Chat (DeepSeek)
DeepSeek Coder (DeepSeek)
GPT-5 (OpenAI)
GPT-5-Mini (OpenAI)
GPT-5-Nano (OpenAI)
Claude Opus 4.5 (Claude)
Claude Sonnet 4.5 (Claude)
Claude Haiku 4.5 (Claude)
Grok 4 Fast (Grok)
Grok 4.1 Fast (Reasoning) (Grok)
Grok 4 Fast (Reasoning) (Grok)
Grok 4 (Grok)
Grok 3 Mini (Grok)
Grok 3 (Grok)
Gemini 3 Pro Preview (Gemini)
Gemini 2.5 Pro (Gemini)
Gemini 2.5 Flash (Gemini)
Gemini 2.5 Flash-Lite (Gemini)
AI Detector: AI Detector (Text) is an advanced AI service that analyzes a passage and returns a verdict on whether it was likely written by AI.
ElevenLabs Speech to Text: Generate text from speech using ElevenLabs' advanced speech-to-text model.
ElevenLabs Speech to Text - Scribe V2: Use Scribe V2 from ElevenLabs for blazingly fast speech-to-text inference!
Fibo: Structured Prompt Generation endpoint for Fibo, Bria's SOTA open-source model.
Fibo Edit [Structured Instruction]: Structured Instruction Generation endpoint for Fibo Edit, Bria's newest editing model.
Fibo Lite: Structured Prompt Generation endpoint for Fibo-Lite, Bria's SOTA open-source model.
Mock Chat: Local mock model for chat testing.
Nemotron: Use the speed and pinpoint accuracy of Nemotron to transcribe your audio.
OpenRouter [Video]: Run any VLM (Video Language Model) with fal, powered by OpenRouter.
OpenRouter [Video][Enterprise]: Run any VLM (Video Language Model) with fal, powered by OpenRouter.
Pipecat's Smart Turn model: An open-source, community-driven, native audio turn detection model by Pipecat AI.
Silero VAD: Detect speech presence and timestamps with accuracy and speed using the ultra-lightweight Silero VAD model.
Speech-to-Text: Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription.
Whisper: Whisper is a model for speech transcription and translation.
Wizper (Whisper v3 -- fal.ai edition): [Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!
Seed3D: Image-to-3D endpoint for ByteDance's high-quality Seed3D model generator.
Hunyuan 3D: Create detailed, fully-textured 3D models from text.
Hunyuan3D: Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.
Hunyuan 3D 2.1: Hunyuan3D-2.1 is a scalable 3D asset creation system that advances state-of-the-art 3D generation through Physically-Based Rendering (PBR).
Hunyuan 3D Part Splitter: Split 3D models into parts with Hunyuan 3D.
Hunyuan 3D Pro Image to 3D: Generate 3D models from images with Hunyuan 3D Pro.
Hunyuan 3D Pro Text to 3D: Generate 3D models from text prompts with Hunyuan 3D Pro.
Hunyuan 3D Rapid Image to 3D: Rapidly generate 3D models from images using Hunyuan 3D.
Hunyuan 3D Smart Topology: Optimize 3D mesh topology with Hunyuan 3D Smart Topology.
Hunyuan3D V3: Create your imagined 3D models with just text. Production-ready, export-ready professional assets with realistic lighting and materials in minutes.
Hunyuan3D V3: Transform your photos into ultra-high-resolution 3D models in seconds. Film-quality geometry with PBR textures, ready for games, e-commerce, and 3D printing.
Hunyuan3D V3: Turn simple sketches into detailed, fully-textured 3D models. Instantly convert your concept designs into formats ready for Unity, Unreal, and Blender.
Hunyuan Motion [0.46B]: Generate 3D human motions from text via the Hunyuan Motion interface!
Hunyuan Motion [1B]: Generate 3D human motions from text via the Hunyuan Motion interface!
Hunyuan Part: Use the capabilities of Hunyuan Part to generate point clouds from your 3D files.
Hunyuan World: Hunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view it from different angles.
Hyper3D Rodin: Rodin by Hyper3D generates realistic, production-ready 3D models from text or images.
Meshy 5 Multi: Meshy-5 Multi generates realistic, production-ready 3D models from multiple images.
Meshy 5 Remesh: Meshy-5 Remesh allows you to remesh and export existing 3D models into various formats.
Meshy 5 Retexture: Meshy-5 Retexture applies new, high-quality textures to existing 3D models using either text prompts or reference images. It supports PBR material generation for realistic, production-ready results.
Meshy 6: Meshy-6 is the latest model from Meshy. It generates realistic, production-ready 3D models.
Meshy 6 Preview: Meshy-6-Preview is the latest model from Meshy. It generates realistic, production-ready 3D models.
OmniPart: Image-to-3D endpoint for OmniPart, a part-aware 3D generator with semantic decoupling and structural cohesion.
PSHuman: Use the 6D pose estimation capabilities of PSHuman to generate 3D files from a single image.
Sam 3: SAM 3D enables full scene reconstructions, placing objects and humans in a shared context together.
Sam 3: SAM 3D enables precise 3D reconstruction of objects from real images, while accurately reconstructing their geometry and texture.
Sam 3: SAM 3D allows for accurate 3D reconstruction of human body shape and position from a single image.
Trellis: Generate 3D models from multiple images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation.
Trellis: Generate 3D models from your images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation.
Trellis 2: Generate 3D models from your images using Trellis 2. A native 3D generative model enabling versatile and high-quality 3D asset creation.
Tripo3D: State-of-the-art multiview-to-3D object generation. Generate 3D models from multiple images!
Tripo3D: State-of-the-art image-to-3D object generation. Generate a 3D model from a single image!
TripoSR: State-of-the-art image-to-3D object generation.
UltraShape: UltraShape-1.0 is a 3D diffusion framework that generates high-fidelity 3D geometry through coarse-to-fine geometric refinement.
ACE-Step: Extend the beginning or end of provided audio with lyrics and/or style using ACE-Step.
ACE-Step: Modify a portion of provided audio with lyrics and/or style using ACE-Step.
ACE-Step: Generate music with lyrics from text using ACE-Step.
ACE-Step: Generate music from a simple prompt using ACE-Step.
ACE-Step: Generate music from lyrics and example audio using ACE-Step.
Audio Understanding: An audio understanding model that analyzes audio content and answers questions about what's happening in the audio based on user prompts.
Chatterbox: Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. The first TTS from Resemble AI.
ChatterboxHD: Generate expressive, natural speech with Resemble AI's Chatterbox. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.
ChatterboxHD: Transform voices using Resemble AI's Chatterbox. Convert audio to new voices or your own samples, with expressive results and built-in perceptual watermarking.
CSM-1B: CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.
DeepFilterNet 3: Enhance speech audio by removing background noise and upsampling to 48 kHz.
Demucs: SOTA stemming model for voice, drums, bass, guitar, and more.
Dia: Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control. Produces natural nonverbals like laughter and throat clearing.
Dia TTS: Clone dialog voices from a sample audio and generate dialogs from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech.
DiffRhythm (Lyrics to Song): DiffRhythm is a blazingly fast model for transforming lyrics into full songs, capable of generating a full song in less than 30 seconds.
ElevenLabs: Generate sound effects using ElevenLabs' advanced sound effects model.
ElevenLabs: Generate realistic audio dialogues using Eleven-v3 from ElevenLabs.
ElevenLabs: Generate text-to-speech audio using Eleven-v3 from ElevenLabs.
ElevenLabs Audio Isolation: Isolate audio tracks using ElevenLabs' advanced audio isolation technology.
ElevenLabs Music: Generate high-quality, realistic music with fine controls using ElevenLabs Music!
ElevenLabs TTS Multilingual v2: Generate multilingual text-to-speech audio using ElevenLabs TTS Multilingual v2.
ElevenLabs TTS Turbo v2.5: Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.
ElevenLabs Voice Changer: Change the voices in your audio with voices from ElevenLabs!
F5 TTS
FFmpeg API [Merge Audios]: Merge audios into a single audio file using the FFmpeg API!
Index TTS 2.0: Generate natural, clear speech using Index TTS 2.0 from IndexTeam.
Kling TTS: Generate speech from text prompts and different voices using the Kling TTS model, which leverages advanced AI techniques to create high-quality text-to-speech.
Kling Video: Generate audio from input videos using Kling.
Kling Video Create Voice: Create voices to be used with Kling models' Voice Control.
Kokoro TTS: Kokoro is a lightweight text-to-speech model that delivers comparable quality to larger models while being significantly faster and more cost-efficient.
Kokoro TTS (Brazilian Portuguese): A natural and expressive Brazilian Portuguese text-to-speech model optimized for clarity and fluency.
Kokoro TTS (British English): A high-quality British English text-to-speech model offering natural and expressive voice synthesis.
Kokoro TTS (French): An expressive and natural French text-to-speech model for both European and Canadian French.
Kokoro TTS (Hindi): A fast and expressive Hindi text-to-speech model with clear pronunciation and accurate intonation.
Kokoro TTS (Italian): A high-quality Italian text-to-speech model delivering smooth and expressive speech synthesis.
Kokoro TTS (Japanese): A fast and natural-sounding Japanese text-to-speech model optimized for smooth pronunciation.
Kokoro TTS (Mandarin Chinese): A highly efficient Mandarin Chinese text-to-speech model that captures natural tones and prosody.
Kokoro TTS (Spanish): A natural-sounding Spanish text-to-speech model optimized for Latin American and European Spanish.
Lava SR: Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz, with denoising for particularly bad inputs.
Lyria2: Lyria 2 is Google's latest music generation model; you can generate any type of music with it.
Maya1: Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design.
MiniMax (Hailuo AI) Music: Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.
MiniMax (Hailuo AI) Music v1.5: Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.
MiniMax Music: Generate music from text prompts using the MiniMax Music 2.0 model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.
MiniMax Speech-02 HD: Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.
MiniMax Speech-02 Turbo: Generate fast speech from text prompts and different voices using the MiniMax Speech-02 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.
MiniMax Speech 2.6 [HD]: Generate speech from text prompts and different voices using the MiniMax Speech-2.6 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.
MiniMax Speech 2.6 [Turbo]: Generate speech from text prompts and different voices using the MiniMax Speech-2.6 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.
MiniMax Speech 2.8 [HD]: Generate speech from text prompts and different voices using the MiniMax Speech-2.8 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.
MiniMax Speech 2.8 [Turbo]: Generate speech from text prompts and different voices using the MiniMax Speech-2.8 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.
MiniMax Voice Cloning: Clone a voice from a sample audio and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech.
MiniMax Voice Design: Design a personalized voice from a text description, and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech.
Mirelo SFX: Generate synced sounds for any video, and return the new soundtrack (like MMAudio).
Mirelo SFX V1.5: Generate synced sounds for any video, and return the new soundtrack (like MMAudio).
MMAudio V2 Text to Audio: MMAudio generates synchronized audio given text inputs. It can generate sounds described by a prompt.
Music Generation: Generate royalty-free instrumental music, from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more.
Music Generator: CassetteAI's model generates a 30-second sample in under 2 seconds and a full 3-minute track in under 10 seconds. At 44.1 kHz stereo audio, expect professional consistency with no breaks, no squeaks, and no random interruptions in your creations.
Nova SR: Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz.
Orpheus TTS: Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. The model has been fine-tuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time performance.
PersonaPlex: PersonaPlex is a real-time, full-duplex speech-to-speech conversational model that enables persona control through text-based role prompts and audio-based voice conditioning.
Qwen 3 TTS - Clone Voice [0.6B]: Clone your voice using the Qwen3-TTS Clone-Voice model's zero-shot cloning capabilities, then use it with the text-to-speech models to generate speech in your own voice!
Qwen 3 TTS - Clone Voice [1.7B]: Clone your voice using the Qwen3-TTS Clone-Voice model's zero-shot cloning capabilities, then use it with the text-to-speech models to generate speech in your own voice!
Qwen 3 TTS - Text to Speech [0.6B]: Bring speech to your texts using the Qwen3-TTS Custom-Voice model with pre-trained voices, or use your custom voice with the Qwen3-TTS Clone Voice model.
Qwen 3 TTS - Text to Speech [1.7B]: Bring speech to your texts using the Qwen3-TTS Custom-Voice model with pre-trained voices, or use your custom voice with the Qwen3-TTS Clone Voice model.
Qwen 3 TTS - Voice Design [1.7B]: Create custom voices using the Qwen3-TTS Voice Design model, then use the Clone Voice model to create your own voices!
SAM Audio: Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications.
Sonauto V2: Extend an existing song.
Sonauto V2: Replace sections of an existing audio with newly generated content.
Sonauto V2: Create full songs in any style.
Sound Effect Generation: Create professional-grade sound effects, from animal and vehicle to nature, sci-fi, and otherworldly sounds. Perfect for films, games, and digital content.
Sound Effects Generator: Create stunningly realistic sound effects in seconds. CassetteAI's Sound Effects Model generates high-quality SFX up to 30 seconds long in just 1 second of processing time.
Stable Audio 2.5: Generate high-quality music and sound effects using Stable Audio 2.5 from StabilityAI.
Stable Audio Open: Open-source text-to-audio model.
VibeVoice: Generate long speech snippets fast using Microsoft's powerful TTS.
VibeVoice 1.5B: Generate long, expressive multi-voice speech using Microsoft's powerful TTS.
VibeVoice 7B: Generate long, expressive multi-voice speech using Microsoft's powerful TTS.
Workflow Utilities: FFmpeg utility for impulse response.
Workflow Utilities: FFmpeg utility for audio compression.
YuE (Lyrics to Song): YuE is a groundbreaking series of open-source foundation models designed for music generation, specifically for transforming lyrics into full songs.
Zonos-Audio-Clone: Clone the voice of any person and speak anything in their voice using Zonos' voice cloning.
Modify a face to look younger or older while keeping identity realistic.
AI Baby And Aging Generator: AI Baby Generator is a service that instantly creates realistic predictions of a future child from parent photos.
AI Baby And Aging Generator: AI Aging Generator performs controllable age progression or regression from a single face photo, generating lifelike portraits across eight age groups from baby to senior.
AI Face Swap: AI-FaceSwap-Image is a service that can take one person's face and realistically blend it onto another's in a photo.
AI Home: AI Home Edit transforms your home interior and exterior photos with realistic, prompt-based edits.
AI Home: AI Home Style reimagines your home interior and exterior design with bold, prompt-driven concepts.
AuraFlow: AuraFlow v0.3 is an open-source flow-based text-to-image generation model that achieves state-of-the-art results on GenEval. The model is currently in beta.
AuraSR: Upscale your images with AuraSR.
Bagel: Bagel is a 7B-parameter multimodal model from ByteDance-Seed that can generate both images and text.
ben-v2-image: A fast and high-quality model for image background removal.
BiRefNet Background Removal: Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS).
BitDance: Image generation with BitDance. Fast, high-resolution photorealistic images using an autoregressive LLM for efficient, high-quality results.
Bria: Structure Reference allows generating new images while preserving the structure of an input image, guided by text prompts. Perfect for transforming sketches, illustrations, or photos into new illustrations. Trained exclusively on licensed data for safe and risk-free commercial use.
Bria 3.2 Text-to-Image: Bria's Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Excels in text rendering and aesthetics.
Bria Background Replace: Bria Background Replace allows for efficient swapping of backgrounds in images via text prompts or a reference image, delivering realistic and polished results. Trained exclusively on licensed data for safe and risk-free commercial use.
Bria Eraser: Bria Eraser enables precise removal of unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us
Bria Expand Image: Bria Expand expands images beyond their borders in high quality. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us
Bria GenFill: Bria GenFill enables high-quality object addition or visual transformation. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us
Bria Product Shot: Place any product in any scenery with just a prompt or reference image while maintaining high integrity of the product. Trained exclusively on licensed data for safe and risk-free commercial use, and optimized for eCommerce.
Bria RMBG 2.0: Bria RMBG 2.0 enables seamless removal of backgrounds from images, ideal for professional editing tasks. Trained exclusively on licensed data for safe and risk-free commercial use. Model weights for commercial use are available here: https://share-eu1.hsforms.com/2GLpEVQqJTI2Lj7AMYwgfIwf4e04
Bria Text-to-Image Base: Bria's Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Also available as source code and weights. For access to weights: https://bria.ai/contact-us
Bria Text-to-Image Fast: Bria's Text-to-Image model with a perfect harmony of latency and quality. Trained exclusively on licensed data for safe and risk-free commercial use. Also available as source code and weights. For access to weights: https://bria.ai/contact-us
Bria Text-to-Image HD: Bria's Text-to-Image model for HD images. Trained exclusively on licensed data for safe and risk-free commercial use. Also available as source code and weights. For access to weights: https://bria.ai/contact-us
ByteDance: Image editing endpoint for the fast Lite version of Seedream 5.0, supporting high-quality intelligent image editing with multiple inputs.
ByteDance: Seedream 3.0 is a bilingual (Chinese and English) model that excels at text-to-image generation.
ByteDance: A new-generation image creation model from ByteDance, Seedream 4.5 integrates image generation and image editing capabilities into a single, unified architecture.
ByteDance: Dreamina showcases superior picture effects, with significant improvements in picture aesthetics, precise and diverse styles, and rich details.
ByteDance: Text-to-Image endpoint for the fast Lite version of Seedream 5.0, supporting high-quality intelligent text-to-image generation.
ByteDance Seedream v4: A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.
ByteDance Seedream v4 Edit: A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.
Calligrapher: Use the text- and font-retaining capabilities of Calligrapher to modify text on your books, clothes, and much more.
Cartoonify: Transform images into 3D cartoon artwork using an AI model that applies cartoon stylization while preserving the original image's composition and details.
CCSR Upscaler: SOTA image upscaler.
Chain of Zoom: Extreme super-resolution via scale autoregression and preference alignment.
Chrono Edit: NVIDIA's logically consistent and physics-aware image editing model.
Chrono Edit LoRA: LoRA endpoint for the Chrono Edit model.
Chrono Edit LoRA Gallery: Upscales and cleans up the image.
Chrono Edit LoRA Gallery: Make edits simply by drawing a quick sketch on the input image.
City Teleport: Place a person's photo into iconic cities worldwide.
Clarity Upscaler: Clarity upscaler for upscaling images with very high fidelity.
CodeFormer: Fix distorted or blurred photos of people with CodeFormer.
CogView: Generate high-quality images from text prompts using CogView4. Longer text prompts will result in better-quality images.
ControlNet SDXL: Generate images with ControlNet.
Creative Upscaler: Create creative upscaled images.
Crystal Upscaler: An advanced image enhancement tool designed specifically for facial details and portrait photography, utilizing Clarity AI's upscaling technology.
DDColor: Bring colors into old or new black-and-white photos with DDColor.
DeepSeek Janus-Pro: DeepSeek Janus-Pro is a novel text-to-image model that unifies multimodal understanding and generation through an autoregressive framework.
DiffusionEdge: Diffusion-based high-quality edge detection.
DocRes: Enhance low-resolution, blurred, or shadowed documents with the superior quality of DocRes for sharper, clearer results.
DocRes-dewarp: Enhance warped, folded documents with the superior quality of DocRes for sharper, clearer results.
DRCT-Super-Resolution: Upscale your images with DRCT-Super-Resolution.
DreamO: DreamO is an image customization framework designed to support a wide range of tasks while facilitating seamless integration of multiple conditions.
DreamOmni2: DreamOmni2 is a unified multimodal model for text- and image-guided image editing.
Dreamshaper: Dreamshaper model.
DWPose Pose Prediction: Predict poses from images.
Embed Product: Seamlessly integrate one or more products into a predefined scene with pixel-perfect control.
Emu 3.5 Image: Edit images with a text prompt using Emu 3.5 Image.
Emu 3.5 Image: Generate images from text using Emu 3.5 Image.
Era 3D: A powerful image-to-novel-multiview model with normals.
EVF-SAM2 Segmentation: EVF-SAM2 combines natural language understanding with advanced segmentation capabilities, allowing you to precisely mask image regions using intuitive positive and negative text prompts.
Expression Change: Change facial expressions in photos with realistic results.
Face Retoucher: Automatically retouches faces to smooth skin and remove blemishes.
Face to Sticker: Create stickers from faces.
FASHN Virtual Try-On V1.5: FASHN v1.5 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 576x864 resolution from both on-model and flat-lay photo references.
FASHN Virtual Try-On V1.6: FASHN v1.6 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 864x1296 resolution from both on-model and flat-lay photo references.
FFmpeg API: FFmpeg endpoint for first, middle, and last frame extraction from videos.
Fibo: SOTA open-source model trained on licensed data, transforming intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.
Fibo BBQ Preview: A preview of the next level of control for text-to-image models.
Fibo Edit: A high-quality editing model that achieves maximum controllability and transparency by combining JSON + Mask + Image.
Fibo Edit [Add Object by Text]: Precise, context-aware insertion of new objects into an existing image using simple, structured spatial commands.
Fibo Edit [Blend]: Complex, multi-step visual composition through natural language.
Fibo Edit [Colorize]: Transforms the color treatment of images using predefined, style-based commands.
Fibo Edit [Erase by Text]: Fast, reliable removal of unwanted elements from images. Designed for predictability, scale, and production use.
Fibo Edit [Relight]: Precise, controllable lighting changes using simple, structured text inputs.
Fibo Edit [Replace Object by Text]: Natural, expressive object swapping within images using plain language.
Fibo Edit [Reseason]: Transforms the seasonal or weather atmosphere of an image.
Fibo Edit [Restore]: Automatically renews and cleans noisy or degraded images.
Fibo Edit [Restyle]: Transforms images into distinct artistic styles using curated, production-grade style mappings.
Fibo Edit [Rewrite Text]: Precise, reliable modification of existing text inside images.
Fibo Edit [Sketch to Image]: Converts line drawings and sketches into photorealistic, fully colored images.
Fibo Lite: Fibo Lite, the new addition to the Fibo model family, generates high-quality images with the same controllability of the JSON structured prompt, with significantly improved latency.
FILM: Interpolate images with FILM (Frame Interpolation for Large Motion).
Finegrain Eraser: Finegrain Eraser removes objects—along with their shadows, reflections, and lighting artifacts—using only natural language, seamlessly filling the scene with contextually accurate content.
Finegrain Eraser: Finegrain Eraser removes any object selected with a bounding box—along with its shadows, reflections, and lighting artifacts—seamlessly reconstructing the scene with contextually accurate content.
Finegrain Eraser: Finegrain Eraser removes any object selected with a mask—along with its shadows, reflections, and lighting artifacts—seamlessly reconstructing the scene with contextually accurate content.
FireRed Image Edit: FireRed Image Edit is FireRed's state-of-the-art open-source editing model, re-trained from Qwen Image Edit 2509.
FireRed Image Edit V1.1: FireRed Image Edit v1.1 is an updated version of FireRed Image Edit, with improved image editing capabilities.
F Lite: F Lite is a 10B-parameter diffusion model created by Fal and Freepik, trained exclusively on copyright-safe and SFW content.
F Lite (texture mode): F Lite is a 10B-parameter diffusion model created by Fal and Freepik, trained exclusively on copyright-safe and SFW content. This is a high-texture-density variant of the model.
Florence-2 LargeFlorence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
Flow-EditFlow-Edit provides high-quality image editing capabilities.
FLUX1.1 [pro]FLUX1.1 [pro] is an enhanced version of FLUX.1 [pro] with improved image generation capabilities, delivering superior composition, detail, and artistic fidelity compared to its predecessor.
FLUX1.1 [pro] ReduxFLUX1.1 [pro] Redux is a high-performance endpoint for the FLUX1.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX1.1 [pro] ultraFLUX1.1 [pro] ultra is the newest version of FLUX1.1 [pro], maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.
FLUX1.1 [pro] ultra Fine-tunedFLUX1.1 [pro] ultra fine-tuned is the newest version of FLUX1.1 [pro] with a fine-tuned LoRA, maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.
FLUX1.1 [pro] ultra ReduxFLUX1.1 [pro] ultra Redux is a high-performance endpoint for the FLUX1.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [dev]FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.
FLUX.1 [dev]FLUX.1 Image-to-Image is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [dev] Canny with LoRAsUtilize Flux.1 [dev] Controlnet to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms.
FLUX.1 [dev] Control LoRA CannyFLUX Control LoRA Canny is a high-performance endpoint that uses a control image using a Canny edge map to transfer structure to the generated image and another initial image to guide color.
FLUX.1 [dev] Control LoRA CannyFLUX Control LoRA Canny is a high-performance endpoint that uses a control image to transfer structure to the generated image, using a Canny edge map.
FLUX.1 [dev] Control LoRA DepthFLUX Control LoRA Depth is a high-performance endpoint that uses a control image using a depth map to transfer structure to the generated image and another initial image to guide color.
FLUX.1 [dev] Control LoRA DepthFLUX Control LoRA Depth is a high-performance endpoint that uses a control image to transfer structure to the generated image, using a depth map.
FLUX.1 [dev] Depth with LoRAsGenerate high-quality images from depth maps using Flux.1 [dev] depth estimation model. The model produces accurate depth representations for scene understanding and 3D visualization.
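The Canny and depth entries above take a structural control image alongside the text prompt. As a rough illustration of what such a control image contains, here is a minimal NumPy sketch that builds a gradient-magnitude edge map (a simple stand-in for a real Canny detector); the `thresh` cutoff is purely illustrative and not a parameter of any of these endpoints.

```python
import numpy as np

def edge_control_map(image: np.ndarray, thresh: float = 0.25) -> np.ndarray:
    """Gradient-magnitude edge map usable as a structural control image.

    A lightweight stand-in for the Canny maps the Control LoRA endpoints
    consume; `thresh` is an illustrative cutoff, not an endpoint parameter.
    """
    gray = image.mean(axis=-1) if image.ndim == 3 else image.astype(float)
    gy, gx = np.gradient(gray.astype(float))   # gradients along rows, columns
    mag = np.hypot(gx, gy)                     # edge strength
    mag /= mag.max() or 1.0                    # normalize to [0, 1]
    edges = (mag > thresh).astype(np.uint8) * 255
    return np.stack([edges] * 3, axis=-1)      # 3-channel, like an RGB control image

# Synthetic example: a white square on black yields edges along its border.
img = np.zeros((64, 64, 3), dtype=np.uint8)
img[16:48, 16:48] = 255
control = edge_control_map(img)
```

In practice the control image would be derived from a real photograph, then uploaded alongside the prompt; only its structure (the white edge pixels) guides generation.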
FLUX.1 [dev] Differential DiffusionFLUX.1 Differential Diffusion is a rapid endpoint that enables swift, granular control over image transformations through change maps, delivering fast and precise region-specific modifications while maintaining FLUX.1 [dev]'s high-quality output.
FLUX.1 [dev] Fill with LoRAsFLUX.1 [dev] Fill is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [dev] Inpainting with LoRAsSuper fast endpoint for the FLUX.1 [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
FLUX.1 [dev] ReduxFLUX.1 [dev] Redux is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [dev] with Controlnets and LorasA specialized FLUX endpoint combining differential diffusion control with LoRA, ControlNet, and IP-Adapter support, enabling precise, region-specific image transformations through customizable change maps.
FLUX.1 [dev] with Controlnets and LorasFLUX General Image-to-Image is a versatile endpoint that transforms existing images with support for LoRA, ControlNet, and IP-Adapter extensions, enabling precise control over style transfer, modifications, and artistic variations through multiple guidance methods.
FLUX.1 [dev] with Controlnets and LorasA general purpose endpoint for the FLUX.1 [dev] model, implementing the RF-Inversion pipeline. This can be used to edit a reference image based on a prompt.
FLUX.1 [dev] with Controlnets and LorasA versatile endpoint for the FLUX.1 [dev] model that supports multiple AI extensions including LoRA, ControlNet conditioning, and IP-Adapter integration, enabling comprehensive control over image generation through various guidance methods.
FLUX.1 [dev] with Controlnets and LorasFLUX General Inpainting is a versatile endpoint that enables precise image editing and completion, supporting multiple AI extensions including LoRA, ControlNet, and IP-Adapter for enhanced control over inpainting results and sophisticated image modifications.
FLUX.1 [dev] with LoRAsSuper fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
FLUX.1 [dev] with LoRAsFLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.
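The LoRA endpoints above accept pre-trained adapter weights alongside the prompt. The sketch below shows what such a request payload might look like when serialized for an HTTP client; the field names (`prompt`, `image_size`, `loras` with `path` and `scale`) follow common fal-style conventions and are assumptions for illustration, not a verified schema.

```python
import json

# Hypothetical request payload for a LoRA-enabled FLUX.1 [dev] run.
# Field names are assumptions, not a verified endpoint schema.
payload = {
    "prompt": "product shot of a ceramic mug, studio lighting",
    "image_size": "square_hd",
    "loras": [
        # Each adapter: where to fetch the weights, and how strongly to apply them.
        {"path": "https://example.com/brand-style.safetensors", "scale": 0.8},
    ],
}

body = json.dumps(payload)  # what an HTTP client would send as the request body
```

The `scale` knob is the usual LoRA blending weight: 0 ignores the adapter, 1 applies it at full strength.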
FLUX.1 Kontext [dev]Frontier image editing model.
FLUX.1 Kontext [max]Experimental version of FLUX.1 Kontext [max] with multi-image handling capabilities
FLUX.1 Kontext [max]FLUX.1 Kontext [max] text-to-image is a new premium model that brings maximum performance across all aspects, with greatly improved prompt adherence.
FLUX.1 Kontext [max]FLUX.1 Kontext [max] is a model where greatly improved prompt adherence and typography generation meet premium consistency for editing, without compromising speed.
FLUX.1 Kontext [pro]FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.
FLUX.1 Kontext [pro]The FLUX.1 Kontext [pro] text-to-image endpoint delivers state-of-the-art image generation results with unprecedented prompt following, photorealistic rendering, and flawless typography.
FLUX.1 Kontext [pro]Experimental version of FLUX.1 Kontext [pro] with multi-image handling capabilities
FLUX.1 Krea [dev]FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.
FLUX.1 Krea [dev] Inpainting with LoRAsSuper fast endpoint for the FLUX.1 Krea [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
FLUX.1 Krea [dev] ReduxFLUX.1 Krea [dev] Redux is a high-performance endpoint for the FLUX.1 Krea [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 Krea [dev] with LoRAsSuper fast endpoint for the FLUX.1 Krea [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
FLUX.1 Krea [dev] with LoRAsFLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.
FLUX.1 [pro] FillFLUX.1 [pro] Fill is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [pro] Fill Fine-tunedFLUX.1 [pro] Fill Fine-tuned is a high-performance endpoint for the FLUX.1 [pro] model with a fine-tuned LoRA that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 [schnell]Fastest inference in the world for the 12 billion parameter FLUX.1 [schnell] text-to-image model.
FLUX.1 [schnell]FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.
FLUX.1 [schnell] ReduxFLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
FLUX.1 SRPO [dev]FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.
FLUX.1 SubjectSuper fast endpoint for the FLUX.1 [schnell] model with subject input capabilities, enabling rapid and high-quality image generation for personalization, specific styles, brand identities, and product-specific outputs.
Flux 2Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.
Flux 2Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities—all at turbo speed.
Flux 2Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control—in a flash.
Flux 2Image-to-image editing with LoRA support for FLUX.2 [dev] from Black Forest Labs. Specialized style transfer and domain-specific modifications.
Flux 2Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.
Flux 2Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control—all at turbo speed.
Flux 2Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities—in a flash.
Flux 2Text-to-image generation with LoRA support for FLUX.2 [dev] from Black Forest Labs. Custom style adaptation and fine-tuned model variations.
Flux 2 FlexImage editing with FLUX.2 [flex] from Black Forest Labs. Supports multi-reference editing with customizable inference steps and enhanced text rendering.
Flux 2 FlexText-to-image generation with FLUX.2 [flex] from Black Forest Labs. Features adjustable inference steps and guidance scale for fine-tuned control. Enhanced typography and text rendering capabilities.
Flux 2 [klein] 4BText-to-image generation with Flux 2 [klein] 4B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.
Flux 2 [klein] 4BImage-to-image editing with Flux 2 [klein] 4B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.
Flux 2 [klein] 4B BaseText-to-image generation with Flux 2 [klein] 4B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.
Flux 2 [klein] 4B BaseImage-to-image editing with Flux 2 [klein] 4B Base from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.
Flux 2 [klein] 4B Base LoraText-to-image generation with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations.
Flux 2 [klein] 4B Base LoraImage-to-image editing with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications.
Flux 2 [klein] 9BImage-to-image editing with Flux 2 [klein] 9B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.
FLUX.2 [klein] 9BText-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.
Flux 2 [klein] 9B BaseImage-to-image editing with Flux 2 [klein] 9B Base from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.
FLUX.2 [klein] 9B BaseText-to-image generation with FLUX.2 [klein] 9B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.
Flux 2 [klein] 9B Base LoraImage-to-image editing with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications.
Flux 2 [klein] 9B Base LoraText-to-image generation with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations.
Flux 2 [klein] RealtimeRealtime generation with FLUX.2 [klein] from Black Forest Labs.
Flux 2 Lora GalleryVirtually furnishes an empty apartment
Flux 2 Lora GalleryApplies sepia vintage effect to images
Flux 2 Lora GalleryVirtual clothing try-on (2 images: person + garment)
Flux 2 Lora GalleryGenerates satellite/aerial view style images
Flux 2 Lora GalleryMakes images more photorealistic and natural
Flux 2 Lora GalleryGenerates same object from different angles (azimuth/elevation)
Flux 2 Lora GalleryHDR surrealistic effect with intense colors
Flux 2 Lora GalleryExtends a face into a full body portrait
Flux 2 Lora GalleryTransforms images into comic book style
Flux 2 Lora GalleryBallpoint pen sketch drawing style
Flux 2 Lora GalleryAdd a background to images with white/clean background
Flux 2 MaxFLUX.2 [max] delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency.
Flux 2 ProText-to-image generation with FLUX.2 [pro] from Black Forest Labs. Optimized for maximum quality, exceptional photorealism and artistic images.
Flux 2 ProImage editing with FLUX.2 [pro] from Black Forest Labs. Ideal for high-quality image manipulation, style transfer, and sequential editing workflows
Flux Kontext LoraFast inpainting endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image inpainting with reference images, while using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.
Flux Kontext LoraSuper fast text-to-image endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
Flux Kontext LoraFast endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image editing using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.
Flux Krea LoraSuper fast endpoint for the FLUX.1 Krea [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
Flux LoraSuper fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
Flux Vision UpscalerFlux Vision Upscaler magnifies and upscales images with high fidelity and creativity.
FooocusDefault parameters with automated optimizations and quality improvements.
FooocusFooocus extreme speed mode as a standalone app.
Fooocus Image PromptDefault parameters with automated optimizations and quality improvements.
Fooocus InpaintingDefault parameters with automated optimizations and quality improvements.
Fooocus Upscale or VaryDefault parameters with automated optimizations and quality improvements.
Gemini 2.5 Flash ImageGoogle's famous original image generation and editing model, a.k.a. Nano Banana
Gemini 3.1 Flash Image PreviewGemini 3.1 Flash Image (a.k.a. Nano Banana 2) is Google's new state-of-the-art fast image generation and editing model
Gemini 3 Pro Image PreviewGemini 3 Pro Image (a.k.a. Nano Banana Pro) is Google's state-of-the-art high-fidelity image generation and editing model
Gemini Flash Edit Multi ImageGemini Flash Edit Multi Image is a model that can edit multiple images using a text prompt and a reference image.
Gemini Flash Edit Multi ImageGemini Flash Edit is a model that can edit a single image using a text prompt and a reference image.
GenfocusGenFocus Model to Refocus Images
Ghiblify ImagesReimagine and transform your ordinary photos into enchanting Studio Ghibli style artwork
Glm ImageCreate high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images.
gpt-image-1OpenAI's latest image generation and editing model: gpt-image-1.
GPT-Image 1.5GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail.
GPT Image 1 MiniGPT Image 1 mini combines OpenAI's advanced language capabilities, powered by GPT-5, with efficient image generation.
Grok Imagine ImageGenerate highly aesthetic images with xAI's Grok Imagine Image generation model.
Grok Imagine ImageEdit images precisely with xAI's Grok Imagine model
Hair ChangeChange hairstyles and hair colors in photos realistically.
Headshot GeneratorGenerate professional headshot photos with customizable backgrounds.
Hidream E1 1Edit images with natural language
Hidream I1 DevHiDream-I1 dev is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
Hidream I1 FastHiDream-I1 fast is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within 16 steps.
Hidream I1 FullHiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
Hunyuan ImageUse the capabilities of Hunyuan Image 2.1 to generate images that capture the feel of your text.
Hunyuan ImageLeverage the state-of-the-art capabilities of Hunyuan Image 3.0 to generate visual content that effectively conveys the messaging of your written material.
Hunyuan ImageImage editing endpoint for Hunyuan Image 3.0 Instruct.
Hunyuan Image 3.0 InstructInstruct version of Hunyuan-Image 3.0, with internal reasoning capabilities.
Hunyuan WorldHunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view it from different angles.
IC-Light-v2 for Image RelightingAn endpoint for relighting photos and changing their backgrounds according to a given description
IdeogramExtend existing images with Ideogram V3's reframe feature. Create expanded versions and adaptations while preserving the main image and adding new creative directions through prompt guidance.
IdeogramReimagine existing images with Ideogram V3's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.
Ideogram Replace BackgroundReplace the backgrounds of existing images with Ideogram V3's replace-background feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.
Ideogram Text to ImageGenerate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.
Ideogram UpscaleIdeogram Upscale increases the resolution of the reference image by up to 2X and may also refine the image itself. Optionally guide improvements with a prompt.
Ideogram V2Generate high-quality images, posters, and logos with Ideogram V2. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.
Ideogram V2AGenerate high-quality images, posters, and logos with Ideogram V2A. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.
Ideogram V2A RemixCreate variations of existing images with Ideogram V2A Remix while maintaining creative control through prompt guidance.
Ideogram V2A TurboAccelerated image generation with Ideogram V2A Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality.
Ideogram V2A Turbo RemixRapidly create image variations with Ideogram V2A Turbo Remix. Fast and efficient reimagining of existing images while maintaining creative control through prompt guidance.
Ideogram V2 EditTransform existing images with Ideogram V2's editing capabilities. Modify, adjust, and refine images while maintaining high fidelity and realistic outputs with precise prompt control.
Ideogram V2 RemixReimagine existing images with Ideogram V2's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.
Ideogram V2 TurboAccelerated image generation with Ideogram V2 Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality.
Ideogram V2 Turbo EditEdit images faster with Ideogram V2 Turbo. Quick modifications and adjustments while preserving the high-quality standards and realistic outputs of Ideogram.
Ideogram V2 Turbo RemixRapidly create image variations with Ideogram V2 Turbo Remix. Fast and efficient reimagining of existing images while maintaining creative control through prompt guidance.
Ideogram V3 CharacterGenerate consistent character appearances across multiple images. Maintain facial features, proportions, and distinctive traits for cohesive storytelling and branding
Ideogram V3 Character EditModify consistent characters while preserving their core identity. Edit poses, expressions, or clothing without losing recognizable character features
Ideogram V3 Character RemixTransform your consistent character into different art styles, settings, or scenarios while maintaining their distinctive appearance and identity
Ideogram V3 EditTransform existing images with Ideogram V3's editing capabilities. Modify, adjust, and refine images while maintaining high fidelity and realistic outputs with precise prompt control.
Illusion DiffusionCreate illusions conditioned on image.
Image2PixelTurn images into pixel-perfect retro art
Image2svgImage2SVG transforms raster images into clean vector graphics, preserving visual quality while enabling scalable, customizable SVG outputs with precise control over detail levels.
Image EditingThe reframe endpoint intelligently adjusts an image's aspect ratio while preserving the main subject's position, composition, pose, and perspective
Image EditingTransform any person into their baby version, while preserving the original pose and expression with childlike features.
Image EditingAdd realistic weather effects like snowfall, rain, or fog to your photos while maintaining the scene's mood.
Image EditingTransform your photos to any time of day, from golden hour to midnight, with appropriate lighting and atmosphere.
Image EditingRemove unwanted objects or people from your photos while seamlessly blending the background.
Image EditingTurn your casual photos into stunning professional studio portraits with perfect lighting and high-end photography style.
Image EditingPlace your subject in any scene you imagine, from enchanted forests to urban settings, with professional composition and lighting
Image EditingRestore and enhance old or damaged photos by removing imperfections and adding color, while preserving the original character and details of the image.
Image EditingRetouch photos of faces. Remove blemishes and improve the skin.
Image EditingPerfect your photos with professional color grading, balanced tones, and vibrant yet natural colors
Image EditingChange facial expressions in photos to any emotion you desire, from smiles to serious looks.
Image EditingTransform your photos into vibrant cool cartoons with bold outlines and rich colors.
Image EditingEnhance facial features with professional retouching while maintaining a natural, realistic look
Image EditingReplace your photo's background with any scene you desire, from beach sunsets to urban landscapes, with perfect lighting and shadows
Image EditingExperiment with different hairstyles, from bald to any style you can imagine, while maintaining natural lighting and realistic results.
Image EditingSee how you or others might look at different ages, from younger to older, while preserving core facial features.
Image EditingTransform your photos into cool plushies while keeping the original character's likeness
Image EditingTransform your photos into wojak style while keeping the original character's likeness
Image EditingTransform your character's hair into broccoli style while keeping the original character's likeness
Image EditingGenerate YouTube thumbnails with custom text
Image EditingAdd details to faces, enhance face features, remove blur.
Image EditingRemove all text and writing from images while preserving the background and natural appearance.
Image EditingTransform your photos into artistic masterpieces inspired by famous styles like Van Gogh's Starry Night or any artistic style you choose.
Imagen3Imagen3 is a high-quality text-to-image model that generates realistic images from text prompts.
Imagen3 FastImagen3 Fast is a faster variant of Imagen3 that generates realistic images from text prompts.
Imagen 4Google’s highest quality image generation model
Imagen 4 UltraGoogle’s highest quality image generation model
Image OutpaintDirectional outpainting: choose which edges to expand (left, right, top, or center for uniform expansion on all sides). Only the expanded areas are generated; an optional zoom-out pulls the frame back by the chosen amount.
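The outpaint entry above exposes two controls: which edges to expand, and an optional zoom-out. A hedged sketch of how a client might assemble such a request follows; the field names (`image_url`, `edges`, `zoom_out`) and the validation logic are assumptions for illustration, not the endpoint's actual schema.

```python
def outpaint_request(image_url: str, edges: list, zoom_out: float = 0.0) -> dict:
    """Build a hypothetical outpaint request body.

    `edges` picks the sides to expand; "center" stands for uniform
    expansion on all sides, per the catalog description.
    """
    allowed = {"left", "right", "top", "center"}
    bad = set(edges) - allowed
    if bad:
        raise ValueError("unknown edges: %s" % sorted(bad))
    return {"image_url": image_url, "edges": edges, "zoom_out": zoom_out}

# Expand only the left and top edges, with a mild 10% zoom-out.
req = outpaint_request("https://example.com/photo.png", ["left", "top"], zoom_out=0.1)
```

Validating the edge names client-side is a design choice: it turns a malformed request into an immediate error rather than a round-trip failure.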
Image PreprocessorsHolistically-Nested Edge Detection (HED) preprocessor.
Image PreprocessorsScribble preprocessor.
Image PreprocessorsM-LSD line segment detection preprocessor.
Image PreprocessorsSegment Anything Model (SAM) preprocessor.
Image PreprocessorsMiDaS depth estimation preprocessor.
Image PreprocessorsTEED (Tiny and Efficient Edge Detector) preprocessor.
Image PreprocessorsLine art preprocessor.
Image PreprocessorsZoeDepth preprocessor.
Image PreprocessorsPIDI (Pidinet) preprocessor.
Image PreprocessorsDepth Anything v2 preprocessor.
Imagineart 1.5 PreviewImagineArt 1.5 text-to-image model generates high-fidelity professional-grade visuals with lifelike realism, strong aesthetics, and text that actually reads correctly.
ImagineArt 1.5 Pro PreviewImagineArt 1.5 Pro is an advanced text-to-image model that creates ultra-high-fidelity 4K visuals with lifelike realism, refined aesthetics, and powerful creative output suited for professional use.
Inpainting sdxl and sdInpaint images with SD and SDXL
Instant CharacterInstantCharacter creates high-quality, consistent characters from text prompts, supporting diverse poses, styles, and appearances with strong identity control.
Invisible WatermarkInvisible Watermark is a model that can add an invisible watermark to an image.
IP Adapter Face IDHigh quality zero-shot personalization
Juggernaut Flux BaseJuggernaut Base Flux by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility.
Juggernaut Flux Base LoRAJuggernaut Base Flux LoRA by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism to all your LoRAs and LyCORIS with full compatibility.
Juggernaut Flux LightningJuggernaut Lightning Flux by RunDiffusion provides blazing-fast, high-quality images rendered at five times the speed of Flux. Perfect for mood boards and mass ideation, this model excels in both realism and prompt adherence.
Juggernaut Flux LoraJuggernaut Base Flux LoRA Inpainting by RunDiffusion is a drop-in replacement for Flux [Dev] inpainting that delivers sharper details, richer colors, and enhanced realism to all your LoRAs and LyCORIS with full compatibility.
Juggernaut Flux ProJuggernaut Pro Flux by RunDiffusion is the flagship Juggernaut model rivaling some of the most advanced image models available, often surpassing them in realism. It combines Juggernaut Base with RunDiffusion Photo and features enhancements like reduced background blurriness.
Kling ImageKling Omni 3: Top-tier image-to-image with flawless consistency.
Kling ImageKling Image V3: Kling's latest image model.
Kling ImageKling V3: Kling's latest image model.
Kling ImageKling Omni 3: Top-tier text-to-image with flawless consistency.
Kling Kolors Virtual TryOn v1.5Kling Kolors Virtual TryOn v1.5 is a high quality image based Try-On endpoint which can be used for commercial try on.
Kling O1 ImagePerform precise image edits using strong reference control, transforming subjects, styles, and local details while preserving visual consistency.
KolorsPhotorealistic Text-to-Image
Kolors Image to ImagePhotorealistic Image-to-Image
Latent Consistency Models (v1.5/XL)Run SDXL at the speed of light
Latent Consistency (SDXL & SDv1.5)Produce high-quality images with minimal inference steps.
Layer Diffusion XLSDXL with an alpha channel.
Leffa Pose TransferLeffa Pose Transfer is an endpoint for changing pose of an image with a reference image.
Leffa Virtual TryOnLeffa Virtual TryOn is a high quality image based Try-On endpoint which can be used for commercial try on.
Lightning ModelsCollection of SDXL Lightning models.
Live PortraitTransfer expression from a video to a portrait.
Longcat ImageLongCat Image Edit is a 6B-parameter image editing model excelling at multilingual text rendering, photorealism, and deployment efficiency.
Longcat ImageLongCat Image is a 6B-parameter model excelling at multilingual text rendering, photorealism, and deployment efficiency.
LucidfluxLucidFlux for upscaling images with very high fidelity
Luma PhotonEdit images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.
Luma PhotonGenerate images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.
Luma Photon FlashGenerate images from your prompts using Luma Photon Flash. Photon Flash is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.
Luma Photon Flash ReframeThis advanced tool intelligently expands your visuals, seamlessly blending new content to enhance creativity and adaptability, offering unmatched speed and quality for creators at a fraction of the cost.
Luma Photon ReframeExtend and reframe images with Luma Photon Reframe. This advanced tool intelligently expands your visuals, seamlessly blending new content to enhance creativity and adaptability, offering unmatched personalization and quality for creators at a fraction of the cost.
Lumina Image 2Lumina-Image-2.0 is a 2-billion-parameter flow-based diffusion transformer featuring improved image quality, typography, complex prompt understanding, and resource efficiency.
Makeup ChangerApply realistic makeup styles with adjustable intensity.
Marigold Depth EstimationCreate depth maps using Marigold depth estimation.
Midas Depth EstimationCreate depth maps using Midas depth estimation.
MiniMax (Hailuo AI) Text to ImageGenerate high quality images from text prompts using MiniMax Image-01. Longer text prompts will result in better quality images.
Minimax Image Subject ReferenceGenerate images from text and a reference image using MiniMax Image-01 for consistent character appearance.
MixDehazerAn advanced dehaze model to remove atmospheric haze, restoring clarity and detail in images through intelligent neural network processing.
Moondream3 Preview [Segment]Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
MoonDreamNext DetectionMoonDreamNext Detection is a multimodal vision-language model for gaze detection, bbox detection, point detection, and more.
NAFNet-deblurUse NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography.
NAFNet-denoiseUse NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography.
Nano BananaGoogle's famous original image generation and editing model
Nano Banana 2Nano Banana 2 is Google's new state-of-the-art fast image generation and editing model
Nano Banana 2Nano Banana 2 is Google's new state-of-the-art image generation and editing model
Nano Banana ProNano Banana Pro is Google's new state-of-the-art image generation and editing model
Nextstep 1Endpoint for the NextStep-1 autoregressive image editing model.
Object RemovalRemoves box-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content.
Object RemovalRemoves mask-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content.
Object RemovalRemove unwanted objects seamlessly from any image.
Object RemovalRemoves objects and their visual effects using natural language, replacing them with contextually appropriate content
OmniGen v1OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!
Omnigen V2OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!
Omni ZeroAny pose, any style, any identity
OnerewardOneReward is a finetuned version of Flux 1.0 Fill with intelligent editing capabilities.
Optimized Latent Consistency (SDv1.5)Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.
Ovis ImageOvis-Image is a 7B text-to-image model specifically optimized for quick, high quality text rendering.
PASDPixel-Aware Diffusion Model for Realistic Image Super-Resolution and Personalized Stylization
Perspective ChangeEasily adjust the perspective of any image to different angles.
Photography EffectsApply diverse photography styles and effects to transform your images.
PhotoMakerCustomizing Realistic Human Photos via Stacked ID Embedding
Photo RestorationRestore old or damaged photos by fixing colors, scratches, and resolution.
PiflowUse Piflow's faster speed to generate images with quality matching that of slower models.
PixArt-ΣWeak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Playground v2.5State-of-the-art open-source model in aesthetic quality
PlushifyTurn any image into a cute plushie!
Pony V7Pony V7 is a text-to-image model fine-tuned for superior aesthetics and prompt following.
Portrait EnhanceEnhance and refine portrait photos with improved clarity and detail.
Post ProcessingAdjust color temperature, brightness, contrast, saturation, and gamma values for color correction.
Post ProcessingApply Gaussian or Kuwahara blur effects with adjustable radius and sigma parameters
Post ProcessingCreate chromatic aberration by shifting red, green, and blue channels horizontally or vertically with customizable shift amounts.
Post ProcessingApply various color tints (sepia, red, green, blue, cyan, magenta, yellow, purple, orange, warm, cool, lime, navy, vintage, rose, teal, maroon, peach, lavender, olive) with adjustable strength.
Post ProcessingReduce color saturation using different methods (luminance Rec.709, luminance Rec.601, average, lightness) with adjustable factor.
Post ProcessingBlend two images together using smooth linear interpolation with a configurable blend factor.
Post ProcessingApply dodge and burn effects with multiple modes and adjustable intensity.
Post ProcessingApply film grain effect with different styles (modern, analog, kodak, fuji, cinematic, newspaper) and customizable intensity and scale
Post ProcessingApply a parabolic distortion effect with configurable coefficient and vertex position.
Post ProcessingApply sharpening effects with three modes: basic unsharp mask, smart sharpening with edge preservation, and Contrast Adaptive Sharpening (CAS).
Post ProcessingApply solarization effect by inverting pixel values above a threshold
Post ProcessingAdd a darkening vignette effect around the edges of the image with adjustable strength
Post ProcessingPost Processing is an endpoint that can enhance images using a variety of techniques including grain, blur, sharpen, and more.
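Several of the Post Processing entries above describe simple pixel-level operations. As a rough illustration only (these function names and parameters are made up for the sketch, not the endpoint API), the solarization, linear blend, and vignette effects can be written in a few lines of NumPy:

```python
import numpy as np

def solarize(img: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Invert pixel values at or above a threshold (uint8 image)."""
    return np.where(img >= threshold, 255 - img, img).astype(np.uint8)

def blend(a: np.ndarray, b: np.ndarray, factor: float = 0.5) -> np.ndarray:
    """Smooth linear interpolation between two images of the same shape."""
    return ((1.0 - factor) * a + factor * b).astype(np.uint8)

def vignette(img: np.ndarray, strength: float = 0.5) -> np.ndarray:
    """Darken the edges with a radial falloff of adjustable strength."""
    h, w = img.shape[:2]
    y, x = np.ogrid[:h, :w]
    # Normalized distance from the image center: 0 at center, 1 at corners.
    d = np.sqrt(((x - w / 2) / (w / 2)) ** 2
                + ((y - h / 2) / (h / 2)) ** 2) / np.sqrt(2)
    mask = 1.0 - strength * d
    return (img * mask[..., None]).clip(0, 255).astype(np.uint8)

gray = np.full((4, 4, 3), 200, dtype=np.uint8)
print(solarize(gray)[0, 0, 0])  # 200 >= 128, so inverted: 255 - 200 = 55
```

The hosted endpoints add controls the sketch omits (grain styles, CAS sharpening, Kuwahara blur), but the underlying per-pixel arithmetic is of this kind.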
Product HoldingPlace products naturally in a person’s hands for realistic marketing visuals.
Product PhotographyGenerate professional product photography with realistic lighting and backgrounds.
PuLIDTuning-free ID customization.
PuLID FluxAn endpoint for personalized image generation using Flux as per given description.
Qwen ImageQwen-Image (Image-to-Image) transforms and edits input images with high fidelity, enabling precise style transfer, enhancement, and creative modification.
Qwen ImageQwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing.
Qwen Image 2Qwen-Image-2.0 is a next-generation foundational unified generation-and-editing model
Qwen Image 2512Qwen Image 2512 is an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation.
Qwen Image 2512LoRA inference endpoint for Qwen Image 2512, an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation.
Qwen Image EditEndpoint for Qwen's Image Editing model. Has superior text editing capabilities.
Qwen Image EditImage to Image Endpoint for Qwen's Image Editing model. Has superior text editing capabilities.
Qwen Image EditInpainting Endpoint for the Qwen Edit Image editing model.
Qwen Image Edit 2509Endpoint for Qwen's Image Editing Plus model also known as Qwen-Image-Edit-2509. Has superior text editing capabilities and multi-image support.
Qwen Image Edit 2509 LoraLoRA endpoint for the Qwen Image Edit 2509 model.
Qwen Image Edit 2509 Lora GalleryGenerate full portrait from a cropped face photo
Qwen Image Edit 2509 Lora GalleryAdd a realistic scene behind the object with white background
Qwen Image Edit 2509 Lora GalleryRemove unwanted elements (objects, people, text) while maintaining image consistency
Qwen Image Edit 2509 Lora GalleryBlend products into backgrounds with automatic perspective and lighting correction
Qwen Image Edit 2509 Lora GalleryRemove existing lighting and apply soft, even illumination
Qwen Image Edit 2509 Lora GalleryCreate group photos
Qwen Image Edit 2509 Lora GalleryApply designs/graphics onto people's shirts
Qwen Image Edit 2509 Lora GalleryCreate cinematic transitions and scene progressions (camera movements, framing changes)
Qwen Image Edit 2509 Lora GalleryPrecise camera position and angle control (rotation, zoom, vertical movement)
Qwen Image Edit 2509 Lora GalleryRemoves harsh shadows and light spots from images, replacing them with soft, even, natural-looking illumination.
Qwen Image Edit 2511Endpoint for Qwen's Image Editing 2511 model with LoRa support.
Qwen Image Edit 2511Endpoint for Qwen's Image Editing 2511 model.
Qwen Image Edit 2511 Multiple AnglesGenerates the same scene from different angles (azimuth/elevation) with Qwen Image Edit 2511 and the Multiple Angles LoRA.
Qwen Image Edit LoraLoRA inference endpoint for the Qwen Image Editing model.
Qwen Image Edit PlusEndpoint for Qwen's Image Editing Plus model also known as Qwen-Image-Edit-2509. Has superior text editing capabilities and multi-image support.
Qwen Image Edit Plus LoraLoRA endpoint for the Qwen Image Edit Plus model.
Qwen Image Edit Plus Lora GalleryAdd a realistic scene behind the object with white background
Qwen Image Edit Plus Lora GalleryGenerate full portrait from a cropped face photo
Qwen Image Edit Plus Lora GalleryCreate group photos
Qwen Image Edit Plus Lora GalleryBlend products into backgrounds with automatic perspective and lighting correction
Qwen Image Edit Plus Lora GalleryCreate cinematic transitions and scene progressions (camera movements, framing changes)
Qwen Image Edit Plus Lora GalleryRemove unwanted elements (objects, people, text) while maintaining image consistency
Qwen Image Edit Plus Lora GalleryRemove existing lighting and apply soft, even illumination
Qwen Image Edit Plus Lora GalleryApply designs/graphics onto people's shirts
Qwen Image Edit Plus Lora GalleryPrecise camera position and angle control (rotation, zoom, vertical movement)
Qwen Image Edit Plus Lora GalleryRemoves harsh shadows and light spots from images, replacing them with soft, even, natural-looking illumination.
Qwen Image LayeredQwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers.
Qwen Image LayeredQwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. Use LoRAs to get custom outputs.
Qwen Image MaxText-to-Image endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images.
Qwen Image MaxImage editing endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images.
Realistic VisionGenerate realistic images.
RecraftConverts a given raster image to SVG format using Recraft model.
Recraft 20bRecraft 20b is a new and affordable text-to-image model.
Recraft Creative UpscaleEnhances a given raster image using the 'creative upscale' tool, increasing image resolution, making the image sharper and cleaner.
Recraft Crisp UpscaleEnhances a given raster image using 'crisp upscale' tool, boosting resolution with a focus on refining small details and faces.
Recraft V3Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.
Recraft V3Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.
Recraft V4Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy, delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing.
Recraft V4 ProRecraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing.
Recraft V4 Pro (Vector)Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing.
Recraft V4 (Vector)Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing.
ReimagineReimagine uses a structure reference for generating new images while preserving the structure of an input image, guided by text prompts. Perfect for transforming sketches, illustrations, or photos into new illustrations. Trained exclusively on licensed data.
RelightingAdjust and enhance images with different lighting styles.
Rembg Enhance (Remove Background Enhance)Rembg-enhance is optimized for 2D vector images, 3D graphics, and photos by leveraging matting technology.
Remove BackgroundRemove the background from an image.
Replace BackgroundCreates enriched product shots by placing them in various environments using textual descriptions.
ReveReve’s edit model lets you upload an existing image and then transform it via a text prompt
ReveReve’s text-to-image model generates detailed visual output that closely follow your instructions, with strong aesthetic quality and accurate text rendering.
ReveReve’s remix model lets you upload reference images and then combine or transform them via a text prompt.
ReveReve’s fast remix model lets you upload reference images and then combine or transform them via a text prompt at lightning speed!
ReveReve’s fast edit model lets you upload an existing image and then transform it via a text prompt at lightning speed!
RIFEInterpolate images with RIFE - Real-Time Intermediate Flow Estimation
Rundiffusion Photo FluxRunDiffusion Photo Flux provides insane realism. With this enhancer, textures and skin details burst to life, turning your favorite prompts into vivid, lifelike creations. Recommended to keep it at 0.65 to 0.80 weight. Supports resolutions up to 1536x1536.
Sam 3SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
SanaSana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, with the ability to generate 4K images in less than a second.
Sana SprintSana Sprint is a text-to-image model capable of generating 4K images with exceptional speed.
Sana v1.5 1.6BSana v1.5 1.6B is a lightweight text-to-image model that delivers 4K image generation with impressive efficiency.
Sana v1.5 4.8BSana v1.5 4.8B is a powerful text-to-image model that generates ultra-high quality 4K images with remarkable detail.
SD 1.5 Depth ControlNetSD 1.5 ControlNet
SDXL ControlNet UnionAn efficient SDXL multi-ControlNet text-to-image model.
SDXL ControlNet UnionAn efficient SDXL multi-ControlNet image-to-image model.
SDXL ControlNet UnionAn efficient SDXL multi-ControlNet inpainting model.
SeedVR2Use SeedVR2 to upscale your images
Segment Anything Model 2SAM 2 is a model for segmenting images and videos in real-time.
Segment Anything Model 2SAM 2 is a model for segmenting images automatically. It can return individual masks or a single mask for the entire image.
Segment Anything Model 3SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
Sky RaccoonGenerate images from a text prompt.
SoteDiffusionAnime finetune of Würstchen V3.
Stable CascadeStable Cascade: Image generation on a smaller & cheaper latent space.
Stable Diffusion 3.5 LargeStable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
Stable Diffusion 3.5 MediumStable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
Stable Diffusion v1.5Stable Diffusion v1.5
Stable Diffusion V3Stable Diffusion 3 Medium (Image to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.
Stable Diffusion V3Stable Diffusion 3 Medium (Text to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.
Stable Diffusion with LoRAsRun Any Stable Diffusion model with customizable LoRA weights.
Stable Diffusion XLRun SDXL at the speed of light
Stable Diffusion XL LightningRun SDXL at the speed of light
StarVectorAI vectorization model that transforms raster images into scalable SVG graphics, preserving visual details while enabling infinite scaling and easy editing capabilities.
Step1X EditStep1X-Edit transforms your photos with simple instructions into stunning, professional-quality edits—rivaling top proprietary tools.
Step1X Edit v2Image-to-image editing with Step1X-Edit v2 from StepFun. Reasoning-enhanced modifications through a thinking–editing–reflection loop with MLLM world knowledge for abstract instruction comprehension.
Style TransferApply artistic styles like impressionism, cubism, or surrealism to your images.
SWIN2SREnhance low-resolution images with the superior quality of Swin2SR for sharper, clearer results.
Switti 1024Switti is a scale-wise transformer for fast text-to-image generation that outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being faster than distilled diffusion models.
Switti 512Switti is a scale-wise transformer for fast text-to-image generation that outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being faster than distilled diffusion models.
Texture TransformTransform objects with different surface textures like marble, wood, or fabric.
TheraFix low-resolution images with Thera's fast speed and high quality.
TopazUse the powerful and accurate Topaz image enhancer to enhance your images.
try-onImage based high quality Virtual Try-On
UnoAn AI model that transforms input images into new ones based on text prompts, blending reference visuals with your creative directions.
UpscaleRegenerate the image with sharper textures and richer details while upscaling resolution to 4 megapixels.
Upscale ImagesUpscale images by a given factor.
UsoUse USO to perform subject driven generations using reference image.
ViduVidu Reference-to-Image creates images by combining reference images with a prompt.
ViduUse Vidu Text-to-Image to turn your prompts into reality.
Virtual Try-onTry on clothes virtually by combining person and clothing images.
WanWan 2.2's 5B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail
WanWan 2.2's 14B model edits high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail
WanWan 2.2's 14B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail
Wan 2.5 Image to ImageWan 2.5 image-to-image model.
Wan 2.5 Text to ImageWan 2.5 text-to-image model.
Wan v2.2 A14B Text-to-Image with LoRAsWan 2.2's 14B model with LoRA support generates high-fidelity images with enhanced prompt alignment and style adaptability.
Wan v2.6 Image to ImageWan 2.6 image-to-image model.
Wan v2.6 Text to ImageWan 2.6 text-to-image model.
Workflow UtilitiesFFmpeg utility for extracting the nth frame.
Z Image BaseZ-Image is the foundation model of the Z-Image family, engineered for strong quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence.
Z Image Base (LoRA)LoRA endpoint for Z-Image, the foundation model of the Z-Image family.
Z-Image TurboZ-Image Turbo is a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.
Z-Image TurboGenerate images from text, an image and a mask using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
Z-Image TurboText-to-Image endpoint with LoRA support for Z-Image Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.
Z-Image TurboGenerate images from text and edge, depth or pose images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
Z-Image TurboGenerate images from text and edge, depth or pose images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
Z-Image TurboGenerate images from text and images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
Z-Image TurboGenerate images from text and images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
Z-Image TurboGenerate images from text, an image, a mask and custom LoRA using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
Ai DetectorAI Detector (Image) is an advanced service that analyzes a single picture and returns a verdict on whether it was likely created by AI.
ArbiterImage reference comparison measurements
ArbiterSemantic image alignment measurements
ArbiterReference-free image measurements
BagelBagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both text and images.
FFmpeg APIGet EBU R128 loudness normalization from audio files using FFmpeg API.
FFmpeg API MetadataGet encoding metadata from video and audio files using FFmpeg API.
FFmpeg API WaveformGet waveform data from audio files using FFmpeg API.
Florence-2 LargeFlorence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
GOT OCR 2.0GOT-OCR2 works on a wide range of tasks, including plain document OCR, scene text OCR, formatted document OCR, and even OCR for tables, charts, mathematical formulas, geometric shapes, molecular formulas and sheet music.
Isaac 0.1Isaac 0.1 is a multimodal vision-language model from Perceptron for various vision-language tasks.
LLaVA v1.6 34BVision
MoondreamAnswer questions from the images.
Moondream2Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
Moondream2Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
Moondream2Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
Moondream2Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
Moondream3 Preview [Caption]Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
Moondream3 Preview [Detect]Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
Moondream3 Preview [Point]Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
Moondream 3 Preview [Query]Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
MoonDreamNextMoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bbox detection, point detection, and more.
MoonDreamNext BatchMoonDreamNext Batch is a multimodal vision-language model for batch captioning.
NSFW CheckerPredict whether an image is NSFW or SFW.
NSFW FilterPredict the probability of an image being NSFW.
OpenRouterRun any LLM (Large Language Model) with fal, powered by OpenRouter.
OpenRouter [Audio]Run any ALM (Audio Language Model) with fal, powered by OpenRouter.
OpenRouter [Vision]Run any VLM (Vision Language Model) with fal, powered by OpenRouter.
Qwen 3 Guard [8B]Use Qwen 3 Guard [8B] to detect and classify text as safe or harmful, delivering precise and reliable safety categorization.
Sa2VA 4B ImageSa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
Sa2VA 4B VideoSa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
Sa2VA 8B ImageSa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
Sa2VA 8B VideoSa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
Sam 3SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
Video Prompt GeneratorGenerate video prompts using a variety of techniques including camera direction, style, pacing, special effects and more.
Video UnderstandingA video understanding model to analyze video content and answer questions about what's happening in the video based on user prompts.
Workflow Utilitiesffmpeg utility to interleave videos
Ai AvatarMultiTalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.
Ai AvatarMultiTalk model generates a talking avatar video from an image and text. Converts text to speech automatically, then generates the avatar speaking with lip-sync.
Ai AvatarMultiTalk model generates a multi-person conversation video from an image and text inputs. Converts text to speech for each person, generating a realistic conversation scene.
Ai AvatarMultiTalk model generates a multi-person conversation video from an image and audio files. Creates a realistic scene where multiple people speak in sequence.
Ai Face SwapAI-FaceSwap-Video is a service that can replace a person's face throughout a video clip while keeping their movements natural.
AMT Frame InterpolationInterpolate between image frames
AMT InterpolationInterpolate between video frames
AnimateDiffRe-animate your videos!
AnimateDiffAnimate your ideas!
Animatediff SparseCtrl LCMAnimate Your Drawings with Latent Consistency Models!
AnimateDiff TurboAnimate your ideas in lightning speed!
AnimateDiff TurboRe-animate your videos in lightning speed!
Auto-CaptionerAutomatically generates text captions for your videos from the audio, following your text colour/font specifications
AvatarsGenerate high-quality videos with UGC-like avatars from text
AvatarsGenerate high-quality videos with UGC-like avatars from audio
Avatars Audio to VideoHigh-quality avatar videos that feel real, generated from your audio
Avatars Text to VideoHigh-quality avatar videos that feel real, generated from your text
Ben-Video-Bg-RmA model for high quality and smooth background removal for videos.
BirefnetVideo background removal version of bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)
Bria Video EraserA high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.
BytedanceImage to Video endpoint for Seedance 1.0 Pro Fast, a next-generation video model designed to deliver maximum performance at minimal cost
BytedanceTransform your images into stylized videos using this workflow.
BytedanceGenerate videos with audio with Seedance 1.5 (supports start & end frame)
BytedanceGenerate videos with audio with Seedance 1.5
BytedanceSeedance lite reference-to-video allows the use of 1 to 4 images as reference to create a high-quality video.
BytedanceTransfer motion from a video to characters in an image using Dreamactor v2. Great performance for non-human and multiple characters
BytedanceText to Video endpoint for Seedance 1.0 Pro Fast, a next-generation video model designed to deliver maximum performance at minimal cost
Bytedance OmniHuman v1.5Omnihuman v1.5 is a new and improved version of Omnihuman. It generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.
Bytedance UpscalerUpscale videos with Bytedance's video upscaler.
CogVideoX-5BGenerate videos from images and prompts using CogVideoX-5B
CogVideoX-5BGenerate videos from videos and prompts using CogVideoX-5B
CogVideoX-5BGenerate videos from prompts using CogVideoX-5B
ControlNeXt SVDAnimate a reference image with a driving video using ControlNeXt.
Cosmos Predict 2.5 2BGenerate video from text and videos using NVIDIA's 2B Cosmos Post-Trained Model
Cosmos Predict 2.5 2BGenerate video from text and images using NVIDIA's 2B Cosmos Post-Trained Model
Cosmos Predict 2.5 2BGenerate video from text using NVIDIA's 2B Cosmos Post-Trained Model
Cosmos Predict 2.5 2B DistilledGenerate video from text and videos using NVIDIA's 2B Cosmos Distilled Model
Creatify AuroraGenerate high-fidelity, studio-quality videos of your avatar speaking or singing using Aurora from the Creatify team!
Crystal Upscaler [Video]Do high precision video upscaling that respects the original video perfectly using Crystal Upscaler's new video upscaling method!
DecartLucy-5B is a model that can create 5-second I2V videos in under 5 seconds, achieving >1x RTF end-to-end
Decart Lucy 14bLucy-14B delivers lightning fast performance that redefines what's possible with image-to-video AI
Depth Anything VideoGenerates depth maps from video using Video Depth Anything (CVPR 2025). Produces per-frame depth estimation with temporal consistency across frames. Supports 3 model sizes (Small, Base, Large), 5 colormaps including grayscale, side-by-side comparison with the original video, and raw depth export as .npz. Useful for 3D reconstruction, video effects, compositing, and scene understanding.
DubbingThis endpoint delivers seamlessly localized videos by generating lip-synced dubs in multiple languages, ensuring natural and immersive multilingual experiences
DWPose Pose PredictionPredict poses from videos.
EchoMimic V3EchoMimic V3 generates a talking avatar model from a picture, audio and text prompt.
EdittoEdit videos using instruction-based prompting using Editto model!
ElevenLabs DubbingGenerate dubbed videos or audios using ElevenLabs Dubbing feature!
Fabric 1.0VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video
Fabric 1.0VEED Fabric 1.0 text-to-video API
Fabric 1.0 FastVEED Fabric 1.0 is an image-to-video API that turns any image into a talking video
Ffmpeg ApiUse ffmpeg capabilities to merge 2 or more videos.
FFmpeg API ComposeCompose videos from multiple media sources using FFmpeg API.
Ffmpeg Api Merge Audio-VideoMerge videos with standalone audio files or audio from video files.
FILMInterpolate videos with FILM - Frame Interpolation for Large Motion
FlashvsrUpscale your videos using FlashVSR with the fastest speeds!
FramepackFramepack is an efficient image-to-video model that autoregressively generates videos.
Framepack F1Framepack F1 is an efficient image-to-video model that autoregressively generates videos.
Grok Imagine VideoGenerate videos from images with audio using xAI's Grok Imagine Video model.
Grok Imagine VideoGenerate videos with audio from text using Grok Imagine Video.
Grok Imagine VideoEdit videos using xAI's Grok Imagine
HeygenHeygen Avatar V3 Model for Digital Twin
HeygenHeygen Translate Model with Extreme Speed
HeygenHeygen Avatar 4 Digital Twin Model
HeygenHeygen Translate Model with Extreme Precision
HeygenHeygen Text to Video Generation Model
HeygenHeygen Photo Avatar 4 Model
High Quality Stable Video DiffusionGenerate short video clips from your images using SVD v1.1
Hunyuan AvatarHunyuanAvatar is a high-fidelity audio-driven human animation model for multiple characters.
Hunyuan CustomHunyuanCustom revolutionizes video generation with unmatched identity consistency across multiple input types. Its innovative fusion modules and alignment networks outperform competitors, maintaining subject integrity while responding flexibly to text, image, audio, and video conditions.
Hunyuan PortraitHunyuanPortrait is a diffusion-based framework for generating lifelike, temporally consistent portrait animations.
Hunyuan VideoHunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. This endpoint generates videos from text descriptions.
Hunyuan Video FoleyUse the capabilities of the Hunyuan Foley model to bring life to your videos by adding sound effects to them.
Hunyuan Video Image-to-Video InferenceImage to Video for the high-quality Hunyuan Video I2V model.
Hunyuan Video Image-to-Video LoRA InferenceImage to Video for the Hunyuan Video model using a custom trained LoRA.
Hunyuan Video LoRA InferenceHunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability
Hunyuan Video LoRA Inference (Video-to-Video)Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. Use this endpoint to generate videos from videos.
Hunyuan Video V1.5Hunyuan Video 1.5 is Tencent's latest and best video model
Hunyuan Video V1.5Hunyuan Video 1.5 is Tencent's latest and best video model
Hunyuan Video (Video-to-Video)Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. Use this endpoint to generate videos from videos.
InfinitalkInfinitalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.
InfinitalkInfinitalk model generates a talking avatar video from a text and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.
Infinity StarInfinityStar's unified 8B spacetime autoregressive engine turns any text prompt into crisp 720p videos, 10× faster than diffusion models.
Kandinsky5Kandinsky 5.0 Distilled is a lightweight diffusion model for fast, high-quality text-to-video generation.
Kandinsky5Kandinsky 5.0 is a diffusion model for fast, high-quality text-to-video generation.
Kandinsky5 ProKandinsky 5.0 Pro is a diffusion model for fast, high-quality text-to-video generation.
Kandinsky5 ProKandinsky 5.0 Pro is a diffusion model for fast, high-quality image-to-video generation.
Kling 1.0Generate video clips from your images using Kling 1.0
Kling 1.0Generate video clips from your prompts using Kling 1.0
Kling 1.5Generate video clips from your prompts using Kling 1.5 (pro)
Kling 1.5Generate video clips from your images using Kling 1.5 (pro)
Kling 1.6Generate video clips from your prompts using Kling 1.6 (pro)
Kling 1.6Generate video clips from your prompts using Kling 1.6 (std)
Kling 1.6Generate video clips from your images using Kling 1.6 (pro)
Kling 1.6Generate video clips from your images using Kling 1.6 (std)
Kling 1.6 ElementsGenerate video clips from your multiple image references using Kling 1.6 (standard)
Kling 1.6 ElementsGenerate video clips from your multiple image references using Kling 1.6 (pro)
Kling 2.0 MasterGenerate video clips from your prompts using Kling 2.0 Master
Kling 2.0 MasterGenerate video clips from your images using Kling 2.0 Master
Kling 2.1 MasterKling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
Kling 2.1 MasterKling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
Kling 2.1 (pro)Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling.
Kling 2.1 (standard)Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation
Kling AI AvatarKling AI Avatar Standard: Endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters
Kling AI Avatar ProKling AI Avatar Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters
Kling AI Avatar v2 ProKling AI Avatar v2 Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters
Kling AI Avatar v2 StandardKling AI Avatar v2 Standard: Endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters
Kling LipSync Audio-to-VideoKling LipSync is an audio-to-video model that generates realistic lip movements from audio input.
Kling LipSync Text-to-VideoKling LipSync is a text-to-video model that generates realistic lip movements from text input.
Kling O1 Edit Video [Pro]Edit an existing video using natural-language instructions, transforming subjects, settings, and style while retaining the original motion structure.
Kling O1 Edit Video [Standard]Edit an existing video using natural-language instructions, transforming subjects, settings, and style while retaining the original motion structure.
Kling O1 First Frame Last Frame to Video [Pro]Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.
Kling O1 First Frame Last Frame to Video [Standard]Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.
Kling O1 Reference Image to Video [Pro]Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.
Kling O1 Reference Image to Video [Standard]Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.
Kling O1 Reference Video to Video [Pro]Kling O1 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion, and camera style to produce seamless scene continuity.
Kling O1 Reference Video to Video [Standard]Kling O1 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion, and camera style to produce seamless scene continuity.
Kling O3 Edit Video [Pro]Edit videos using Kling O3 from Kling Team!
Kling O3 Edit Video [Standard]Edit videos using Kling O3 from Kling Team!
Kling O3 Image to Video [Pro]Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.
Kling O3 Image to Video [Standard]Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.
Kling O3 Reference to Video [Pro]Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.
Kling O3 Reference to Video [Standard]Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.
Kling O3 Reference Video to Video [Pro]Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion, and camera style to produce seamless scene continuity.
Kling O3 Reference Video to Video [Standard]Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion, and camera style to produce seamless scene continuity.
Kling O3 Text to Video [Pro]Generate realistic videos using Kling O3 from Kling Team!
Kling O3 Text to Video [Standard]Generate realistic videos using Kling O3 from Kling Team!
Kling v2.5 Text to VideoKling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
Kling VideoKling 2.5 Turbo Standard: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
Kling VideoTransfer movements from a reference video to any character image. Cost-effective mode for motion transfer, perfect for portraits and simple animations.
Kling VideoKling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
Kling Video v2.6 Image to VideoKling 2.6 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation.
Kling Video v2.6 Motion Control [Pro]Transfer movements from a reference video to any character image. Pro mode delivers higher quality output, ideal for complex dance moves and gestures.
Kling Video v2.6 Motion Control [Standard]Transfer movements from a reference video to any character image. Cost-effective mode for motion transfer, perfect for portraits and simple animations.
Kling Video v2.6 Text to VideoKling 2.6 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation.
Kling Video v3 Image to Video [Pro]Kling 3.0 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.
Kling Video v3 Image to Video [Standard]Kling 3.0 Standard: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.
Kling Video v3 Text to Video [Pro]Kling 3.0 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.
Kling Video v3 Text to Video [Standard]Kling 3.0 Standard: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.
Krea Wan 14BSuperfast video model based on Wan 2.1 14b by Krea, excelling at real-time video-editing.
Krea Wan 14b- Text to VideoFast Text-to-Video endpoint for Krea's Wan 14b model.
LatentSyncLatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.
LightxUse the capabilities of lightx to relight and recamera your videos.
LipsyncGenerate realistic lipsync from any audio using VEED's latest model
Live AvatarReal-time avatar generation with Live Avatar. Have natural face-to-face conversations with AI avatars that respond instantly—streaming infinite-length video with immediate visual feedback.
Live PortraitTransfer expression from a video to a portrait.
Longcat Multi AvatarLongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.
Longcat Single AvatarLongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.
LongCat VideoGenerate long videos in 720p/30fps from text using LongCat Video
LongCat VideoGenerate long videos in 720p/30fps from images using LongCat Video
LongCat VideoGenerate long videos from images using LongCat Video
LongCat VideoGenerate long videos from text using LongCat Video
LongCat Video DistilledGenerate long videos from text using LongCat Video Distilled
LongCat Video DistilledGenerate long videos from images using LongCat Video Distilled
LongCat Video DistilledGenerate long videos in 720p/30fps from text using LongCat Video Distilled
LongCat Video DistilledGenerate long videos in 720p/30fps from images using LongCat Video Distilled
LTX 2.0 Video ProGenerate video from audio using LTX-2
LTX-2 19BGenerate video with audio from text using LTX-2 and custom LoRA
LTX-2 19BGenerate video with audio from images using LTX-2
LTX-2 19BGenerate video with audio from text using LTX-2
LTX-2 19BExtend video with audio using LTX-2
LTX-2 19BGenerate video with audio from images using LTX-2 and custom LoRA
LTX-2 19BExtend video with audio using LTX-2 and custom LoRA
LTX-2 19BGenerate video with audio from videos using LTX-2
LTX-2 19BGenerate video with audio from videos using LTX-2 and custom LoRA
LTX-2 19BGenerate video with audio from audio, text and images using LTX-2
LTX-2 19BGenerate video with audio from audio, text and images using LTX-2 and custom LoRA
LTX-2 19B DistilledGenerate video with audio from images using LTX-2 Distilled and custom LoRA
LTX-2 19B DistilledGenerate video with audio from audio, text and images using LTX-2 Distilled and custom LoRA
LTX-2 19B DistilledGenerate video with audio from audio, text and images using LTX-2 Distilled
LTX-2 19B DistilledGenerate video with audio from videos using LTX-2 Distilled and custom LoRA
LTX-2 19B DistilledGenerate video with audio from videos using LTX-2 Distilled
LTX-2 19B DistilledExtend videos with audio using LTX-2 Distilled and custom LoRA
LTX-2 19B DistilledGenerate video with audio from text using LTX-2 Distilled
LTX-2 19B DistilledGenerate video with audio from text using LTX-2 Distilled and custom LoRA
LTX-2 19B DistilledGenerate video with audio from images using LTX-2 Distilled
LTX-2 19B DistilledExtend videos with audio using LTX-2 Distilled
LTX 2.3 Video FastLTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.
LTX 2.3 Video ProLTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.
LTX Video-0.9.5Generate videos from prompts using LTX Video-0.9.5
LTX Video-0.9.5Generate videos from prompts, images, and videos using LTX Video-0.9.5
LTX Video-0.9.5Generate videos from prompts and videos using LTX Video-0.9.5
LTX Video-0.9.7 13BGenerate videos from prompts using LTX Video-0.9.7 13B and custom LoRA
LTX Video-0.9.7 13BGenerate videos from prompts, images, and videos using LTX Video-0.9.7 13B and custom LoRA
LTX Video-0.9.7 13BExtend videos using LTX Video-0.9.7 13B and custom LoRA
LTX Video-0.9.7 13BGenerate videos from prompts and images using LTX Video-0.9.7 13B and custom LoRA
LTX Video-0.9.7 13B DistilledGenerate videos from prompts, images, and videos using LTX Video-0.9.7 13B Distilled and custom LoRA
LTX Video-0.9.7 13B DistilledGenerate videos from prompts and images using LTX Video-0.9.7 13B Distilled and custom LoRA
LTX Video-0.9.7 13B DistilledGenerate videos from prompts using LTX Video-0.9.7 13B Distilled and custom LoRA
LTX Video-0.9.7 13B DistilledExtend videos using LTX Video-0.9.7 13B Distilled and custom LoRA
LTX Video-0.9.7 LoRAGenerate videos from prompts and images using LTX Video-0.9.7 and custom LoRA
LTX Video-0.9.7 LoRAGenerate videos from prompts, images, and videos using LTX Video-0.9.7 and custom LoRA
LTX-Video 13B 0.9.8 DistilledGenerate long videos from prompts using LTX Video-0.9.8 13B Distilled and custom LoRA
LTX-Video 13B 0.9.8 DistilledGenerate long videos from prompts, images, and videos using LTX Video-0.9.8 13B Distilled and custom LoRA
LTX-Video 13B 0.9.8 DistilledGenerate long videos from prompts and images using LTX Video-0.9.8 13B Distilled and custom LoRA
LTX-Video 13B 0.9.8 DistilledExtend videos using LTX Video-0.9.8 13B Distilled and custom LoRA
LTX Video 2.0 FastCreate high-fidelity video with audio from text with LTX-2 Fast
LTX Video 2.0 FastCreate high-fidelity video with audio from images with LTX-2 Fast
LTX Video 2.0 ProCreate high-fidelity video with audio from text with LTX-2 Pro.
LTX Video 2.0 ProCreate high-fidelity video with audio from images with LTX-2 Pro
LTX Video 2.0 RetakeChange sections of a video using LTX-2
LTX Video 2.3 ProLTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video.
LTX Video (preview)Generate videos from prompts using LTX Video
LTX Video (preview)Generate videos from images using LTX Video
Lucy Edit [Dev]Edit outfits, objects, faces, or restyle your video - all with maximum detail retention.
Lucy Edit [Fast]Lucy Edit Fast is a rapid, localized video editing model that lets you modify specific elements like objects, or backgrounds in just 10 seconds.
Lucy Edit [Pro]Edit outfits, objects, faces, or restyle your video - all with maximum detail retention.
Lucy Image to VideoLucy delivers lightning fast performance that redefines what's possible with image to video AI
Lucy RestyleRestyle videos up to 30 min long - maintaining maximum detail quality.
Luma Ray 2Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion.
Luma Ray 2 FlashRay2 Flash is a fast video generative model capable of creating realistic visuals with natural, coherent motion.
Luma Ray 2 Flash (Image to Video)Ray2 Flash is a fast video generative model capable of creating realistic visuals with natural, coherent motion.
Luma Ray 2 Flash ModifyRay2 Flash Modify is a video generative model capable of restyling or retexturing an entire shot: turn live-action into CG or stylized animation; change wardrobe, props, or the overall aesthetic; or swap environments and time periods for control over background, location, or even weather.
Luma Ray 2 Flash ReframeAdjust and enhance videos with Ray-2 Reframe. This advanced tool seamlessly reframes videos to your desired aspect ratio, intelligently inpainting missing regions to ensure realistic visuals and coherent motion, delivering exceptional quality and creative flexibility.
Luma Ray 2 (Image to Video)Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion.
Luma Ray 2 ModifyRay2 Modify is a video generative model capable of restyling or retexturing an entire shot: turn live-action into CG or stylized animation; change wardrobe, props, or the overall aesthetic; or swap environments and time periods for control over background, location, or even weather.
Luma Ray 2 ReframeAdjust and enhance videos with Ray-2 Reframe. This advanced tool seamlessly reframes videos to your desired aspect ratio, intelligently inpainting missing regions to ensure realistic visuals and coherent motion, delivering exceptional quality and creative flexibility.
LynxGenerate subject consistent videos using Lynx from ByteDance!
MAGI-1MAGI-1 extends videos with an exceptional understanding of physical interactions and prompts
MAGI-1MAGI-1 is a video generation model with exceptional understanding of physical interactions and cinematic prompts
MAGI-1MAGI-1 generates videos from images with exceptional understanding of physical interactions and prompting
MAGI-1 (Distilled)MAGI-1 distilled generates videos faster from images with exceptional understanding of physical interactions and prompting
MAGI-1 (Distilled)MAGI-1 distilled extends videos faster with an exceptional understanding of physical interactions and prompts
MAGI-1 (Distilled)MAGI-1 distilled is a faster video generation model with exceptional understanding of physical interactions and cinematic prompts
Marey Realism V1.5Pull motion from a reference video and apply it to new subjects or scenes.
Marey Realism V1.5Generate a video from a text prompt with Marey, a generative video model trained exclusively on fully licensed data.
Marey Realism V1.5Generate a video starting from an image as the first frame with Marey, a generative video model trained exclusively on fully licensed data.
Marey Realism V1.5Ideal for matching human movement. Your input video determines human poses, gestures, and body movements that will appear in the generated video.
MinimaxCreate blazing fast and economical videos with MiniMax Hailuo-02 Image To Video API at 512p resolution
MiniMax Hailuo 02 [Pro] (Image to Video)MiniMax Hailuo-02 Image To Video API (Pro, 1080p): Advanced image-to-video generation model with 1080p resolution
MiniMax Hailuo 02 [Pro] (Text to Video)MiniMax Hailuo-02 Text To Video API (Pro, 1080p): Advanced video generation model with 1080p resolution
MiniMax Hailuo 02 [Standard] (Image to Video)MiniMax Hailuo-02 Image To Video API (Standard, 768p, 512p): Advanced image-to-video generation model with 768p and 512p resolutions
MiniMax Hailuo 02 [Standard] (Text to Video)MiniMax Hailuo-02 Text To Video API (Standard, 768p): Advanced video generation model with 768p resolution
MiniMax Hailuo 2.3 Fast [Pro] (Image to Video)MiniMax Hailuo-2.3-Fast Image To Video API (Pro, 1080p): Advanced fast image-to-video generation model with 1080p resolution
MiniMax Hailuo 2.3 Fast [Standard] (Image to Video)MiniMax Hailuo-2.3-Fast Image To Video API (Standard, 768p): Advanced fast image-to-video generation model with 768p resolution
MiniMax Hailuo 2.3 [Pro] (Image to Video)MiniMax Hailuo-2.3 Image To Video API (Pro, 1080p): Advanced image-to-video generation model with 1080p resolution
MiniMax Hailuo 2.3 [Pro] (Text to Video)MiniMax Hailuo-2.3 Text To Video API (Pro, 1080p): Advanced text-to-video generation model with 1080p resolution
MiniMax Hailuo 2.3 [Standard] (Image to Video)MiniMax Hailuo-2.3 Image To Video API (Standard, 768p): Advanced image-to-video generation model with 768p resolution
MiniMax Hailuo 2.3 [Standard] (Text to Video)MiniMax Hailuo-2.3 Text To Video API (Standard, 768p): Advanced text-to-video generation model with 768p resolution
MiniMax (Hailuo AI) Video 01Generate video clips from your images using MiniMax Video model
MiniMax (Hailuo AI) Video 01Generate video clips from your prompts using MiniMax model
MiniMax (Hailuo AI) Video 01 DirectorGenerate video clips more accurately with respect to natural language descriptions and using camera movement instructions for shot control.
MiniMax (Hailuo AI) Video 01 Director - Image to VideoGenerate video clips more accurately with respect to initial image, natural language descriptions, and using camera movement instructions for shot control.
MiniMax (Hailuo AI) Video 01 LiveGenerate video clips from your images using MiniMax Video model
MiniMax (Hailuo AI) Video 01 LiveGenerate video clips from your prompts using MiniMax model
MiniMax (Hailuo AI) Video 01 Subject ReferenceGenerate video clips maintaining consistent, realistic facial features and identity across dynamic video content
Mirelo SFXGenerate synced sounds for any video, and return it with its new sound track (like MMAudio)
Mirelo SFX V1.5Generate synced sounds for any video, and return it with its new sound track (like MMAudio)
MMAudio V2MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio.
Mochi 1Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation.
Multishot MasterMultiShotMaster is a controllable multi-shot narrative video generation framework that supports text-driven inter-shot consistency, variable shot counts and shot durations, customized subject with motion control, and background-driven customized scene.
MuseTalkMuseTalk is a real-time high quality audio-driven lip-syncing model. Use MuseTalk to animate a face with your own audio.
OmniHumanOmniHuman generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.
One To All AnimationOne-to-All Animation is a pose driven video model that animates characters from a single reference image, enabling flexible, alignment-free motion transfer across diverse styles and scenes
One To All AnimationOne-to-All Animation is a pose driven video model that animates characters from a single reference image, enabling flexible, alignment-free motion transfer across diverse styles and scenes
OviOvi can generate videos with audio from image and text inputs.
Ovi Text to VideoA unified paradigm for audio-video generation
PikaDiscover ultimate control with Pikaframes key frame interpolation, a stunning image-to-video feature that allows you to upload up to 5 keyframes, customize their transition length and prompt, and see their images come to life as seamless videos.
Pikadditions (v2)Pikadditions is a powerful video-to-video AI model that allows you to add anyone or anything to any video with seamless integration.
Pika Effects (v1.5)Pika Effects are AI-powered video effects designed to modify objects, characters, and environments in a fun, engaging, and visually compelling manner.
Pika Image to Video Turbo (v2)Turbo is the model to use when you feel the need for speed. Turn your image to stunning video up to 3x faster – all with high quality outputs.
Pika Image to Video (v2.1)Turn photos into mind-blowing, dynamic videos. Your images can come to life with sharp details, impressive character control and cinematic camera moves.
Pika Image to Video (v2.2)Turn photos into mind-blowing, dynamic videos in up to 1080p. Experience better image clarity and crisper, sharper visuals.
Pika Scenes (v2.2)Pika Scenes v2.2 creates videos from images with high quality output.
Pika Text to Video Turbo (v2)Pika v2 Turbo creates videos from a text prompt with high quality output.
Pika Text to Video (v2.1)Start with a simple text input to create dynamic generations that defy expectations. Anything you dream can come to life with sharp details, impressive character control and cinematic camera moves.
Pika Text to Video (v2.2)Start with a simple text input to create dynamic generations that defy expectations in up to 1080p. Experience better image clarity and crisper, sharper visuals.
PixverseUse the latest PixVerse v5.6 model to turn your text into amazing videos.
PixverseGenerate high quality video clips with different effects using PixVerse v4.5
PixverseGenerate high quality video clips from text and image prompts using PixVerse v4.5
PixverseGenerate high quality video clips from text and image prompts using PixVerse v4.5
PixverseGenerate high quality and fast video clips from text and image prompts using PixVerse v4.5 fast
PixverseGenerate fast high quality video clips from text and image prompts using PixVerse v4.5
PixversePixverse Effects
PixverseAdd immersive sound effects and background music to your videos using PixVerse sound effects generation
PixversePixverse Transition
PixverseGenerate high quality video clips with different effects using PixVerse v5
PixverseGenerate high quality video clips from text and image prompts using PixVerse v5.5
PixverseGenerate high quality video clips from text and image prompts using PixVerse v5.5
PixverseGenerate high quality video clips by swapping person, objects and background using Pixverse Swap.
PixversePixVerse Extend is a tool that extends your videos using high-quality video extension techniques
PixverseUse the latest PixVerse v5.6 model to turn your text and images into amazing videos.
PixverseUse the latest PixVerse v5.6 model to turn your text and images into amazing videos.
PixverseCreate seamless transition between images using PixVerse v4.5
PixversePixVerse Extend is a tool that extends your videos using high-quality video extension techniques
PixverseCreate seamless transition between images using PixVerse v5
PixverseGenerate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with PixVerse Lipsync model
PixverseGenerate high quality video clips with different effects using PixVerse v4
PixverseGenerate high quality video clips from text and image prompts using PixVerse v5
PixVerse v3.5Generate high quality video clips from text prompts using PixVerse v3.5
PixVerse v3.5: EffectsGenerate high quality video clips with different effects using PixVerse v3.5
PixVerse v3.5 FastGenerate high quality video clips quickly from text prompts using PixVerse v3.5 Fast
PixVerse v3.5: Image to VideoGenerate high quality video clips from text and image prompts using PixVerse v3.5
PixVerse v3.5: Image to Video FastGenerate high quality video clips from text and image prompts quickly using PixVerse v3.5 Fast
PixVerse v3.5: TransitionCreate seamless transition between images using PixVerse v3.5
PixVerse v4: Image to VideoGenerate high quality video clips from text and image prompts using PixVerse v4
PixVerse v4: Image to Video FastGenerate fast high quality video clips from text and image prompts using PixVerse v4
PixVerse v4: Text to VideoGenerate high quality video clips from text and image prompts using PixVerse v4
PixVerse v4: Text to Video FastGenerate high quality and fast video clips from text and image prompts using PixVerse v4 fast
Pixverse v5 Image to VideoGenerate high quality video clips from text and image prompts using PixVerse v5
RIFEInterpolate videos with RIFE - Real-Time Intermediate Flow Estimation
Sad TalkerLearning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Sad TalkerLearning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Sam 3SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
Sam 3SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
Sana VideoLeverage Sana's ultra-fast processing speed to generate high-quality assets that transform your text prompts into production-ready videos
ScailSCAIL is a character animation model that uses 3D consistent pose representations to animate reference images with coherent motion, supporting complex movements.
Seedance 1.0 LiteSeedance 1.0 Lite
Seedance 1.0 LiteSeedance 1.0 Lite
Seedance 1.0 ProSeedance 1.0 Pro, a high quality video generation model developed by Bytedance.
Seedance 1.0 ProSeedance 1.0 Pro, a high quality video generation model developed by Bytedance.
SeedVR2Upscale your videos using SeedVR2 with temporal consistency!
Segment Anything Model 2SAM 2 is a model for segmenting images and videos in real-time.
Skyreels V1 (Image-to-Video)SkyReels V1 is the first and most advanced open-source human-centric video foundation model, created by fine-tuning HunyuanVideo on roughly 10M high-quality film and television clips
Sora 2Image-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
Sora 2Image-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
Sora 2Text-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
Sora 2Text-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
Sora 2Video-to-video remix endpoint for Sora 2, OpenAI's advanced model that transforms existing videos based on new text or image prompts, allowing rich edits, style changes, and creative reinterpretations while preserving motion and structure.
Stable AvatarStable Avatar generates audio-driven video avatars up to five minutes long
Stable Video DiffusionGenerate short video clips from your prompts using SVD v1.1
Stable Video Diffusion TurboGenerate short video clips from your images using SVD v1.1 at Lightning Speed
Stable Video Diffusion TurboGenerate short video clips from your images using SVD v1.1 at Lightning Speed
Steady DancerCreate smooth, realistic videos from a single photo while keeping the original appearance intact—precise motion control without losing identity or visual quality.
Sync LipsyncGenerate high-quality realistic lipsync animations from audio while preserving unique details like natural teeth and unique facial features using the state-of-the-art Sync Lipsync 2 Pro model.
Sync Lipsync 2.0Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with Sync Lipsync 2.0 model
Sync React-1Use React-1 from SyncLabs to refine human emotions and do realistic lip-sync without losing details!
sync.so -- lipsync 1.9.0-betaGenerate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.
T2V Turbo - Video CrafterGenerate short video clips from your prompts
ThinkSoundGenerate realistic audio from a video with an optional text prompt
ThinkSoundGenerate realistic audio for a video with an optional text prompt and combine
Topaz Video UpscaleProfessional-grade video upscaling using Topaz technology. Enhance your videos with high-quality upscaling.
TransPixar V1Transform text into stunning videos with TransPixar - an AI model that generates both RGB footage and alpha channels, enabling seamless compositing and creative video effects.
V2.6Wan 2.6 reference-to-video flash model.
V2.6Wan 2.6 image-to-video flash model.
VaceVACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
Veo 2Veo 2 creates videos with realistic motion and high quality output. Explore different styles and find your own with extensive camera controls.
Veo 2 (Image to Video)Veo 2 creates videos from images with realistic motion and very high quality output.
Veo 3Veo 3 by Google, the most advanced AI video generation model in the world. With sound on!
Veo3Veo 3 is the latest state-of-the-art video generation model from Google DeepMind
Veo 3.1Extend Veo-Created Videos up to 30 seconds
Veo 3.1Generate Videos from images using Google's Veo 3.1
Veo 3.1Generate videos from a first and last frame using Google's Veo 3.1
Veo 3.1Veo 3.1 by Google, the most advanced AI video generation model in the world. With sound on!
Veo 3.1Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind
Veo 3.1 FastFaster and more cost effective version of Google's Veo 3.1!
Veo 3.1 FastGenerate videos from a first/last frame using Google's Veo 3.1 Fast
Veo 3.1 FastExtend Veo-Created Videos up to 30 seconds
Veo 3.1 FastGenerate videos from your image prompts using Veo 3.1 fast.
Veo 3 FastFaster and more cost effective version of Google's Veo 3!
Veo 3 Fast [Image to Video]Now with a 50% price drop. Generate videos from your image prompts using Veo 3 fast.
VideoUpscale videos up to 8K output resolution. Trained on fully licensed and commercially safe data.
VideoAutomatically remove backgrounds from videos, perfect for creating clean, professional content without a green screen.
VideoA high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.
VideoA high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency
VideoA high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.
Video As PromptA model for unified semantic control in video generation. It animates a static reference image using the motion and semantics from a reference video as a prompt.
Video Background RemovalRemove background from any video with people and objects. No green screen needed.
Video Background RemovalRemove background from videos filmed using chromakey, with automatic green spill suppression for clean, professional edges.
Video Background RemovalRemove background from any video with people and objects. No green screen needed.
Video Sound Effects GeneratorAdd sound effects to your videos
Video UpscalerThe video upscaler endpoint uses RealESRGAN on each frame of the input video to upscale the video to a higher resolution.
ViduUse the latest Vidu Q2 Pro models for much better quality and control over your videos.
ViduVidu's latest Q3 pro models
ViduVidu's latest Q3 pro models.
ViduGenerate video clips from your multiple image references using Vidu Q1
ViduUse the latest Vidu Q2 models for much better quality and control over your videos.
ViduVidu's Q3 Turbo Model
ViduUse the latest Vidu Q2 models for much better quality and control over your videos.
ViduUse the latest Vidu Q2 models for much better quality and control over your videos.
ViduUse the latest Vidu Q2 models for much better quality and control over your videos.
ViduVidu's Q3 Turbo Model.
Vidu Image to VideoVidu Q1 Image to Video generates high-quality 1080p videos with exceptional visual quality and motion diversity from a single image
Vidu Image to VideoVidu Image to Video generates high-quality videos with exceptional visual quality and motion diversity from a single image
Vidu Reference to VideoVidu Reference to Video creates videos by combining reference images with a prompt.
Vidu Start End to VideoVidu Q1 Start-End to Video generates smooth transition 1080p videos between specified start and end images.
Vidu Start-End to VideoVidu Start-End to Video generates smooth transition videos between specified start and end images.
Vidu Template to VideoVidu Template to Video lets you create different effects by applying motion templates to your images.
Vidu Text to VideoVidu Q1 Text to Video generates high-quality 1080p videos with exceptional visual quality and motion diversity
WanWan 2.2's 5B distill model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding
WanWan-2.2 turbo text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts.
WanWan-2.2 video-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts and source videos.
WanWan 2.2's 5B FastVideo model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding
WanWan-2.2 Turbo image-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from images and text prompts.
Wan-2.1 First-Last-Frame-to-VideoWan-2.1 flf2v generates dynamic videos by intelligently bridging a given first frame to a desired end frame through smooth, coherent motion sequences.
Wan-2.1 Image-to-VideoWan-2.1 is an image-to-video model that generates high-quality videos with high visual quality and motion diversity from images
Wan-2.1 Image-to-Video with LoRAsAdd custom LoRAs to Wan-2.1, an image-to-video model that generates high-quality videos with high visual quality and motion diversity from images
Wan-2.1 Pro Image-to-VideoWan-2.1 Pro is a premium image-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from images
Wan-2.1 Pro Text-to-VideoWan-2.1 Pro is a premium text-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from text prompts
Wan-2.1 Text-to-VideoWan-2.1 is a text-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts
Wan-2.1 Text-to-Video with LoRAsAdd custom LoRAs to Wan-2.1, a text-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts
Wan 2.1 VACE Long ReframeReframe entire videos scene-by-scene using Wan VACE 2.1
Wan-2.2 Animate MoveWan-Animate is a video model that generates high-fidelity character videos by replicating the expressions and movements of characters from reference videos.
Wan-2.2 Animate ReplaceWan-Animate Replace is a model that can integrate animated characters into reference videos, replacing the original character while preserving the scene’s lighting and color tone for seamless environmental integration.
Wan 2.2 Fun ControlGenerate pose or depth controlled video using Alibaba-PAI's Wan 2.2 Fun
Wan-2.2 Speech-to-Video 14BWan-S2V is a video model that generates high-quality videos from static images and audio, with realistic facial expressions, body movements, and professional camera work for film and television applications
Wan-2.2 Text-to-Video A14BWan-2.2 text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts.
Wan-2.2 Text-to-Video A14B with LoRAsWan-2.2 text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts. This endpoint supports LoRAs made for Wan 2.2.
Wan 2.2 VACE Fun A14BVACE Fun for Wan 2.2 A14B from Alibaba-PAI
Wan 2.2 VACE Fun A14BVACE Fun for Wan 2.2 A14B from Alibaba-PAI
Wan 2.2 VACE Fun A14BVACE Fun for Wan 2.2 A14B from Alibaba-PAI
Wan 2.2 VACE Fun A14BVACE Fun for Wan 2.2 A14B from Alibaba-PAI
Wan 2.2 VACE Fun A14BVACE Fun for Wan 2.2 A14B from Alibaba-PAI
Wan 2.5 Image to VideoWan 2.5 image-to-video model.
Wan 2.5 Text to VideoWan 2.5 text-to-video model.
Wan AlphaGenerate videos with transparent backgrounds
Wan AtiWAN-ATI is a controllable video generation model that uses trajectory instructions to guide object, local, and camera motion, enabling precise and flexible image-to-video creation.
Wan EffectsWan Effects generates high-quality videos with popular effects from images
Wan MotionWan Motion is a streamlined character animation model that transfers motion from a driving video onto a reference character image. Based on Wan-Animate, which preserves the original character's proportions, this variant uses pose retargeting to adapt the driving video's skeleton to the reference character's body shape, producing more natural results when the two have different builds. It outputs at 720p with optimized defaults for fast, high-quality generation: just provide a video, an image, and an optional prompt.
Wan Move [480p]Use Wan-Move to generate videos with motion controlled by trajectories
Wan v2.2 5BWan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding
Wan v2.2 5BWan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding
Wan v2.2 A14BWan-2.2 A14B image-to-video endpoint: fal-ai/wan/v2.2-A14B/image-to-video
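As a hedged sketch of how an endpoint id like the one above is typically consumed: the snippet below assembles a request payload for an image-to-video call. The field names (`prompt`, `image_url`) and the `fal_client.subscribe` usage shown in the comment are assumptions for illustration; check the endpoint's published schema before relying on them.

```python
# Sketch: assembling an arguments dict for a fal image-to-video endpoint.
# Field names below are illustrative assumptions, not the confirmed schema.
ENDPOINT = "fal-ai/wan/v2.2-A14B/image-to-video"

def build_arguments(prompt: str, image_url: str) -> dict:
    """Build the JSON arguments payload sent alongside the endpoint id."""
    return {
        "prompt": prompt,        # text description of the desired motion/scene
        "image_url": image_url,  # publicly reachable URL of the source frame
    }

args = build_arguments("a sailboat drifting at sunset", "https://example.com/frame.png")

# A real call would then look roughly like (requires an API key):
#   import fal_client
#   result = fal_client.subscribe(ENDPOINT, arguments=args)
```

The same pattern (endpoint id plus a per-model arguments dict) applies to the other catalog entries; only the schema differs per model.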
Wan v2.2 A14B Image-to-Video A14B with LoRAsWan-2.2 image-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts and images. This endpoint supports LoRAs made for Wan 2.2
Wan v2.6 Image to VideoWan 2.6 image-to-video model.
Wan v2.6 Reference to VideoWan 2.6 reference-to-video model.
Wan v2.6 Text to VideoWan 2.6 text-to-video model.
Wan Vace 1 3bVACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
Wan VACE 14BVACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
Wan VACE 14BVACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
Wan VACE 14BVACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
Wan VACE 14BVACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
Wan VACE 14BVACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
Wan VACE 14BVACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
Wan VACE Video EditEdit videos using plain language and Wan VACE
Wan Vision EnhancerWan Vision Enhancer magnifies and enhances videos with high fidelity and creativity.
Workflow UtilitiesFFMPEG Utility to Trim Videos
Workflow UtilitiesAdd automatic subtitles to videos
Workflow UtilitiesFFMPEG Utility to Reverse Videos
Workflow UtilitiesFFMPEG Utility to Scale Videos
Workflow UtilitiesFFMPEG Utility to Blend Videos
SEO text
Welcome to the AI combine harvester!
Here we don't just plow with neural networks: we also thresh texts, press insights, and bale API integrations into sheaves.
Our multifunctional AI combine replaces:
— a copywriter,
— a marketer,
— a frontend developer,
— a therapist (well, almost).
You can:
— write SEO texts without yawning,
— generate code without swearing,
— create images without drawing,
— manage assistants without hiring.
One combine, thousands of tasks.
No filler, just pure synthesis of syntax and meaning.
