How does the image-to-video flow work on this site?

Upload one jpg, png, or webp image. Type a short prompt focused on motion and camera. Pick an aspect ratio and clip length. Hit Generate. The clip comes back in roughly 30 to 60 seconds as an MP4 with ambient audio muxed in.

What image formats and resolutions are accepted?

JPG (JPEG), PNG, and WebP at 256p, 480p, or 720p. RGB color only; grayscale is not supported. Upload the highest-resolution version of your image for the best motion stability.

What output resolution and length does this tool support?

Native output is up to 720p. Clip length in the consumer flow is 1 to 7 seconds. The underlying model supports up to 400 frames (about 16.7 seconds at 24 fps); the longer range is available on paid plans.

What aspect ratios are supported?

Five: 16:9, 4:3, 1:1, 3:4, and 9:16. Pick 16:9 for landscape and YouTube. Pick 9:16 for TikTok, Reels, and Shorts. Pick 1:1 for Instagram feed posts.

Is Cosmos 3 Super Image to Video free?

The first generation is free without an account. Saving, downloading, and history features open with a free account. Paid plans add 4K upscaling, longer clips, faster queues, and more generations per month.

Can the output be used commercially?

Yes. The OpenMDW 1.1 license permits commercial use. Clips made on this site can ship in paid ads, product videos, client work, short films, and content monetized on YouTube, Instagram, and TikTok.

How is Cosmos 3 Super Image to Video different from Runway, Kling, or Pika?

It is a model fine-tuned specifically for image-to-video, with open weights and a commercial-use license that does not depend on a subscription tier. Runway, Kling, and Pika are general video models with image input as one mode, all closed-weight, with tier-gated commercial rights.

What kind of motion does the model handle well, and where does it struggle?

It handles ambient and continuous motion well: wind, fabric, light shifts, slow camera moves, subtle character motion. It struggles with extreme physics: per the model card, contact dynamics are only approximated. Use the negative prompt to steer around artifacts.

Do I need an NVIDIA GPU to use this?

No. All rendering runs in the cloud. You only need a browser to use Cosmos 3 Super Image to Video here.

Cosmos 3 Super Image to Video
animate any photo in seconds

Drop in one image. Describe the motion you want. Get a short video back in under a minute. The engine is NVIDIA's open Cosmos 3 Super, the 64B world model that handles motion better than the usual AI guess.

What is Cosmos 3 Super Image to Video?

Cosmos 3 Super Image to Video is an online tool that turns a single still image into a short video clip. You give it a photo plus a short description of motion. It does the rest.

Under the hood, this site runs the official Cosmos3-Super-Image2Video model, published by NVIDIA on Hugging Face on June 1, 2026. The model is a dedicated 64-billion-parameter variant of Cosmos 3 Super, fine-tuned specifically for the image-to-video task. It is not a general video model with image input bolted on. It is a model whose entire fine-tune was built around "given one image and instructions, predict a coherent video."

The model is released under the OpenMDW 1.1 open license. Per the NVIDIA model card, output is ready for both commercial and non-commercial use, so what you make on this site can ship in paid ads, product pages, and client work.

What it accepts and produces:

Input: one image (jpg, png, webp) at 256p, 480p, or 720p, plus a text prompt up to 4,096 tokens, plus an optional negative prompt.
Output: an MP4 from 5 to 400 frames long (default 189 frames, about 7.9 seconds at 24 fps), with ambient audio muxed in at 48 kHz stereo.
Aspect ratios: 16:9, 4:3, 1:1, 3:4, and 9:16, matching how your video will actually ship (landscape, square, vertical).
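
The limits above can be checked before a request is sent. Here is a minimal sketch in Python, assuming only the numbers quoted in this list. The class name, field names, and error messages are hypothetical and not part of any NVIDIA or site API. The 4,096-token prompt limit is not checked because it depends on the model's tokenizer.

```python
from dataclasses import dataclass

# Limits taken from the list above. "jpeg" is treated as an alias of "jpg".
SUPPORTED_FORMATS = {"jpg", "jpeg", "png", "webp"}
SUPPORTED_RESOLUTIONS = {256, 480, 720}
SUPPORTED_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16"}
MIN_FRAMES, MAX_FRAMES, DEFAULT_FRAMES = 5, 400, 189


@dataclass
class GenerationRequest:
    """Hypothetical container for one image-to-video request."""
    image_format: str
    resolution: int              # 256, 480, or 720
    aspect_ratio: str            # e.g. "16:9"
    prompt: str
    negative_prompt: str = ""    # optional
    num_frames: int = DEFAULT_FRAMES

    def problems(self) -> list[str]:
        """Return readable problems; an empty list means the request fits the limits."""
        found = []
        if self.image_format.lower() not in SUPPORTED_FORMATS:
            found.append(f"unsupported image format: {self.image_format}")
        if self.resolution not in SUPPORTED_RESOLUTIONS:
            found.append(f"unsupported resolution: {self.resolution}p")
        if self.aspect_ratio not in SUPPORTED_RATIOS:
            found.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        if not self.prompt.strip():
            found.append("prompt is empty")
        if not MIN_FRAMES <= self.num_frames <= MAX_FRAMES:
            found.append(f"frame count out of range: {self.num_frames}")
        return found
```

A request such as `GenerationRequest("png", 720, "16:9", "slow dolly forward")` passes with the default 189 frames, while a GIF at 1080p would be flagged on both format and resolution.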

What this image-to-video tool can do

Six things the Cosmos 3 Super Image to Video model does that show up in every clip rendered here. Each one is documented in the official NVIDIA model card.

Subject and composition stay put

Drop in a photo. The model keeps the subject, color palette, label text, and framing intact while motion takes over. Faces stay coherent. Logos do not warp. Product packaging stays readable. That is the difference between photo-to-video AI and a video that merely starts from your image and then drifts away from it.

Motion you describe in plain English

Prompts focus on motion, camera, lighting, and scene change. You do not re-describe what is in the photo. Example: "slow dolly forward, soft golden hour key light, breeze through the leaves." The model already sees the still.

Five aspect ratios for where the video actually ships

Pick 16:9 for YouTube and landscape ads. Pick 9:16 for TikTok, Reels, and YouTube Shorts. Pick 1:1 for Instagram feed posts. 4:3 and 3:4 are also available. Match the input image ratio to the selected output to avoid cropping.
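
If you are unsure which output ratio matches your photo, you can compute the closest of the five from the image's pixel dimensions. This is a rough sketch, not the site's own logic. The function name is made up for illustration, and comparing on a log scale is one reasonable choice among several.

```python
import math

# The five output ratios listed above, as width / height.
SUPPORTED_RATIOS = {
    "16:9": 16 / 9,
    "4:3": 4 / 3,
    "1:1": 1.0,
    "3:4": 3 / 4,
    "9:16": 9 / 16,
}


def closest_ratio(width: int, height: int) -> str:
    """Return the supported ratio label nearest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height
    # Compare on a log scale so that 2:1 and 1:2 count as equally far from 1:1.
    return min(
        SUPPORTED_RATIOS,
        key=lambda label: abs(math.log(SUPPORTED_RATIOS[label] / target)),
    )
```

For example, a 1080x1920 phone photo maps to 9:16, and a 3:2 camera frame such as 3000x2000 lands on 4:3, which means some cropping is still to be expected.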

Length built for short-form delivery

Clip length runs from 1 to 7 seconds in the consumer flow. The underlying model supports up to 400 frames, or roughly 16.7 seconds at 24 fps; the longer range is available on paid plans.
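
Converting between seconds and frames is simple arithmetic at 24 fps. Here is a small sketch, assuming the 5-to-400-frame range from the model specs earlier on this page. The function names are illustrative only, and the model may have further rules on valid frame counts that are not covered here.

```python
FPS = 24                  # frame rate quoted on this page
MIN_MODEL_FRAMES = 5      # model lower bound
MAX_MODEL_FRAMES = 400    # model upper bound


def seconds_to_frames(seconds: float, fps: int = FPS) -> int:
    """Convert a clip length in seconds to a whole number of frames."""
    frames = round(seconds * fps)
    if not MIN_MODEL_FRAMES <= frames <= MAX_MODEL_FRAMES:
        raise ValueError(f"{frames} frames is outside the 5-400 range")
    return frames


def frames_to_seconds(frames: int, fps: int = FPS) -> float:
    """Convert a frame count back to a duration in seconds."""
    return frames / fps
```

The consumer range of 1 to 7 seconds works out to 24 to 168 frames. The default of 189 frames is 7.875 seconds, and the 400-frame ceiling is about 16.67 seconds.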

Ambient audio in the same render pass

Cosmos 3 is an omnimodal model. Audio comes out of the same render, muxed into the MP4 at 48 kHz stereo AAC. No second tool, no sync work.
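
At 48 kHz and 24 fps, every video frame lines up with exactly 2,000 audio samples per channel, so the audio length of a clip is easy to work out. This is a back-of-the-envelope sketch with a hypothetical function name. The encoded AAC stream in the actual file may differ slightly because of encoder padding.

```python
AUDIO_SAMPLE_RATE = 48_000   # Hz, as stated above
CHANNELS = 2                 # stereo
FPS = 24


def audio_samples_per_channel(num_frames: int,
                              fps: int = FPS,
                              sample_rate: int = AUDIO_SAMPLE_RATE) -> int:
    """Number of audio samples per channel that span num_frames of video."""
    if num_frames < 0:
        raise ValueError("num_frames must not be negative")
    return num_frames * sample_rate // fps


# Default clip: 189 frames -> 378,000 samples per channel,
# or 756,000 samples across both stereo channels.
default_total = audio_samples_per_channel(189) * CHANNELS
```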

Honest about motion limits

NVIDIA notes that the model lacks an explicit physics simulator and that contact dynamics and physical laws are only approximated. Cosmos 3 Super Image to Video looks more grounded than most AI video tools, but extreme physics can still produce artifacts. Use the negative prompt to steer around them.

Examples

See the Cosmos 3 Super AI Video Generator in action

Six clips, each generated on this site. No retouching, no splice cuts. Click any tile to see the exact prompt, the seed, and a one-click remix button.

Cosmos 3 Super

Shopping Cart POV

"Ultra-realistic cinematic shot, wide-angle lens, dynamic motion blur, playful and energetic tone, natural daylight. Single continuous POV shot - the camera is mounted at the front of a moving shopping cart, looking inward. A young woman sits inside the cart, laughing freely, legs up, arms raised. The cart is pushed quickly through an empty parking lot. Background streaks with strong motion blur. No cuts - continuous movement, spontaneous youthful cinematic moment."

Nature Documentary

"Serious faux 80s nature-documentary dating interview montage of animals in restrained retro outfits, each in documentary-style interview glimpses with authentic animal noises only. After the montage, stay on a poodle for a short interview moment. Off-camera interviewer with refined English voice. Photoreal, cinematic, sincere and observational."

Urban Skateboard Chase

"Ultra-realistic cinematic street shot, handheld tracking, natural daylight, cool urban tones, subtle film grain. Single continuous shot - camera follows closely from behind a skateboarder riding fast down a city street. Low framing on board and pushing foot. Red shoulder bag swings with each push. Asphalt rushes beneath - no cuts, immersive fast urban realism."

Bowling Alley Strike

"1980s New York City, gritty urban atmosphere, cinematic film grain. Street-level tracking shot, a man in a dark suit walks along a busy sidewalk, then enters a dimly lit bowling alley. Warm neon interior. He grabs a ball and throws - camera drops low tracking the rolling ball in slow motion. Perfect strike, pins exploding. Retro cinematic, smooth continuous motion."

Lunar Orbit Flag

"POV from inside a spacecraft in high orbit around the Moon. Handheld, human micro-jitters. At second 4: fast sharp digital zoom-in to an aged American flag on the lunar surface - faded, dusty, static, frozen folds. Early 2000s camcorder texture, heavy grain, harsh direct sunlight. Vertical 9:16, raw accidental footage feel."

Transformer Chase

"A narrow rural dirt road in Mediterranean vegetation. A parked Ford transforms into a heavy industrial robot - grounded metal physics, no morphing geometry. Two men panic and run as the Transformer smashes wall and vegetation, chasing with massive steps. Cinematic dramatic lighting, 50mm lens, Kodak film look, realistic dust and debris."

How to use Cosmos 3 Super Image to Video

Three steps in the browser. No install, no GPU, no account needed for the first generation. The whole flow takes about a minute.

Upload one image

Drag and drop a jpg, png, or webp file into the upload area, or click to browse. The model accepts 256p, 480p, and 720p input. Use a clear, well-lit, in-focus image: the cleaner the input, the better the motion stability. RGB only; grayscale is not supported.

Describe what should change

Type a short prompt focused on motion, camera, and scene change. Skip describing what is already in the photo because the model already sees it. Good: slow dolly forward, breeze in the leaves, warm afternoon light. Use the negative prompt field to steer away from artifacts you do not want.

Pick aspect ratio and length, then generate

Pick the aspect ratio that matches where the video is going: 16:9 for landscape, 9:16 for vertical social, 1:1 for square. Pick a length from 1 to 7 seconds. Match the input image aspect ratio to the output ratio to avoid cropping. Hit Generate.
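
To see how much of a photo would be lost when its ratio does not match the output, you can compute the largest centered crop for the target ratio. This is only an illustration. How the site actually crops is not documented here, and the function name is made up. The returned box uses the (left, top, right, bottom) convention that Pillow's `Image.crop` also uses.

```python
def center_crop_box(width: int, height: int,
                    ratio_w: int, ratio_h: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered box
    with aspect ratio ratio_w:ratio_h inside a width x height image."""
    if min(width, height, ratio_w, ratio_h) <= 0:
        raise ValueError("all arguments must be positive")
    # Integer cross-multiplication compares width/height with ratio_w/ratio_h
    # without floating-point rounding.
    if width * ratio_h > height * ratio_w:
        # Image is wider than the target: keep full height, trim the sides.
        new_w = height * ratio_w // ratio_h
        new_h = height
    else:
        # Image is taller than (or equal to) the target: keep full width.
        new_w = width
        new_h = width * ratio_h // ratio_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)
```

A 1280x720 image needs no trimming for 16:9, while a 1920x1080 image sent to 1:1 keeps only the central 1080x1080 square and loses 420 pixels on each side.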

What you get at the end: a downloadable MP4 you can drop into an editor, post to social, or hand off to a client.

Who uses Cosmos 3 Super Image to Video

Six audiences pushed the most generations through the image-to-video flow in week one. Each section names the photo type and the motion that converts best.

For e-commerce sellers

Turn one studio product photo into a rotating clip, a lifestyle shot, or a close-up reveal. The model keeps product color, label text, and proportions intact while adding camera motion and ambient light.

For social media creators

Animate a static feed post into a Reel or TikTok in seconds. The 9:16 ratio plus 1-to-7-second range fits Reels, TikTok, and YouTube Shorts. Best prompts focus on subtle motion.

For ad and marketing teams

Spin up image-to-video variants from a single hero shot. Test which motion treatment lifts CTR before committing to production. The OpenMDW 1.1 license covers commercial use.

For real estate and architecture

Animate a single interior or exterior photo into a slow camera move through the space. The model preserves room geometry and proportions while adding light shifts.

For portrait photographers

Add subtle motion to a portrait: a blink, a slight head turn, a breath. The model keeps the face coherent over short clips. Avoid extreme expression changes.

For travel and lifestyle content

Set a landscape photo or food still life in motion. Wind through grass, waves rolling on a beach, steam rising from a cup. This is where the Cosmos 3 Super Image to Video model looks most natural.

How Cosmos 3 Super Image to Video compares to other AI tools

Honest comparison against the image-to-video tools people use in mid-2026. Each tool below leads on a different dimension. Pick by the photo type and the use, not the brand.

Capability	This site (Cosmos 3 Super Image to Video)	Runway Gen-4.5	Kling 3.0	Pika 2.5
Image-to-video specialization	Dedicated 64B model fine-tuned only for image-to-video	General video model with image input	General video model with image input	General video model with image input
Open model weights	Yes (OpenMDW 1.1)	No	No	No
Commercial use of output	Yes, per model card	Tier-limited	Tier-limited	Tier-limited
Strongest at	Motion grounded in the input image; product/landscape stills	Directed camera moves and brand-controlled editor workflow	Human motion, multilingual lip sync	First/last frame control (Pikaframes)
Native aspect ratio range	16:9, 4:3, 1:1, 3:4, 9:16	16:9, 9:16 (4K on Gen-4.5)	16:9, 9:16, native 4K	16:9, 9:16, 1:1
Disclosed training data scale	1.3B data points, 393 datasets	Undisclosed	Undisclosed	Undisclosed

When to pick what. Pick this site when you want an image-to-video tool whose model was built specifically for image-to-video, with open weights and a license that permits commercial use. Pick Runway Gen-4.5 if you need brand-controlled editor workflows. Pick Kling 3.0 for realistic human motion and lip sync. Pick Pika 2.5 if you want first/last frame control for transitions.

Comparison reflects public documentation as of June 2026. Runway, Kling, and Pika are trademarks of their respective owners. This site is not affiliated with Runway, Kuaishou, or Pika Labs.

Why pick Cosmos 3 Super Image to Video

Four reasons creators stay after their first generation. Each one is something the closed image-to-video competitors above cannot fully match in mid-2026.

Built for image-to-video, not retrofitted for it.

Cosmos 3 Super Image to Video is a standalone 64B model fine-tuned specifically on the image-to-video task. The specialization shows up in how well the input image's subject, color, and composition stay locked.

Open license you can read.

The OpenMDW 1.1 license is public. The model card explicitly states the output is ready for commercial and non-commercial use. Closed competitors gate commercial rights behind subscription tiers.

Honest about what it can and cannot do.

The NVIDIA model card lists the model limits in plain language: no explicit physics simulator, approximated contact dynamics, possible artifacts on extreme motion. You can read what to expect before you generate.

Leads the open-weights image-to-video leaderboard.

As of May 28, 2026, Cosmos 3 Super Image to Video ranks first on Artificial Analysis's open-source image-to-video leaderboard. That is independent benchmark data, not a marketing claim.

Cosmos 3 Super Image to Video FAQ

Ten questions people ask most often. Each answer is written to be useful on its own, in case the section is surfaced through Google or an AI search engine.

What is Cosmos 3 Super Image to Video?

Cosmos 3 Super Image to Video is a standalone NVIDIA model that turns one input image plus a text prompt into a short MP4 video with audio. It is a 64-billion-parameter variant of Cosmos 3 Super, fine-tuned only for the image-to-video task, and released on June 1, 2026 under the OpenMDW 1.1 open license.

Make your first image-to-video with Cosmos 3 Super

Free first generation. No install. No GPU. The same Cosmos3-Super-Image2Video weights NVIDIA published on Hugging Face, rendered in the cloud and delivered to your browser in under a minute.

No credit card for the first generation. OpenMDW 1.1 license on output.
