The latest Kling model, with text-to-video and image-to-video capabilities. Supports native audio and voiceover generation, 5- and 10-second durations, and multiple aspect ratios.
Added Dec 3, 2025
Approx. Price
$0.350 per video
Model Type
Both (text-to-video and image-to-video)
Preview Examples
5
Generation controls available for this model.
Resolution/Aspect Options
3
Default Duration
5
2 duration options
Tunable Settings
3
Aspect Ratio (T2V)
Default
16:9
Options (3)
Landscape (16:9), Portrait (9:16), Square (1:1)
Applies to text-to-video
Audio
Default
Yes
Generate audio/voiceover for the video
Duration
Default
5
Options (2)
5 seconds, 10 seconds
5 or 10 seconds
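The three settings above can be captured in a small validated structure. This is a minimal sketch, not the provider's API: the class name `GenerationSettings` and its field names are hypothetical, and only the option values and defaults are taken from the listing.

```python
from dataclasses import dataclass

# Option lists copied from the settings listed above.
ASPECT_RATIOS = ("16:9", "9:16", "1:1")  # landscape, portrait, square
DURATIONS_SECONDS = (5, 10)


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container; field names are illustrative, not an API schema."""

    aspect_ratio: str = "16:9"   # listed default; applies to text-to-video
    audio: bool = True           # listed default: Yes
    duration_seconds: int = 5    # listed default

    def __post_init__(self):
        # Reject values that are not among the listed options.
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
        if self.duration_seconds not in DURATIONS_SECONDS:
            raise ValueError(f"duration_seconds must be one of {DURATIONS_SECONDS}")


defaults = GenerationSettings()
portrait_long = GenerationSettings(aspect_ratio="9:16", duration_seconds=10)
```

Validating up front keeps an unsupported value, such as a 7-second duration, from reaching a paid generation call.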
Human preference benchmarks sourced from Artificial Analysis.
Text to Video
#31 / 83
ELO
1189.0
Appearances
4,554
95% CI
-8/+8
Image to Video
#28 / 76
ELO
1245.0
Appearances
4,396
95% CI
-10/+10
Release Date 2025-12 · Matched as Kling 2.6 Pro (December)
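To read the figures above, the sketch below turns each ELO score and its 95% CI offsets into an interval, and estimates a head-to-head preference rate against another model. It assumes the standard Elo logistic formula with a 400-point scale, which this page does not confirm as the method Artificial Analysis uses. The opponent rating of 1089 is hypothetical, and the function names are illustrative.

```python
def ci_bounds(elo, lower_offset, upper_offset):
    """Return the (low, high) interval implied by the CI offsets."""
    return (elo + lower_offset, elo + upper_offset)


def expected_win_rate(rating_a, rating_b, scale=400.0):
    """Standard Elo expectation that A is preferred over B (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Values taken from the listing above.
t2v_interval = ci_bounds(1189.0, -8, 8)    # text-to-video
i2v_interval = ci_bounds(1245.0, -10, 10)  # image-to-video

# Hypothetical opponent rated 100 points lower on the text-to-video board.
rate_vs_lower = expected_win_rate(1189.0, 1089.0)

print(t2v_interval, i2v_interval, round(rate_vs_lower, 3))
```

The two scores come from separate leaderboards, so they should not be compared with each other; each is only meaningful against other models on the same board.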
Artificial Analysis API
Scene: On a beach with sunlight spilling across golden sand, waves crashing onto the shore and forming white foam. Subject: A young American male wearing a backwards baseball cap, holding a camera for a selfie, smiling naturally. Audio: The young American male with a bright, sunny voice speaks to the camera: "The weather is amazing today! All my worries feel totally gone. I've been needing a day like this—sun, breeze, just the sound of the waves." Background includes layered ocean wave sounds, filmed in a close-up vlog-style shot.
Image-to-Video | 10s | Audio enabled
Scene: A dimly lit casino VIP room, with a green felt poker table at the center and a haze of drifting cigarette smoke. Subject: A suited man leans forward with his elbow on the table and says: "Three rounds to decide. Win, and all the chips are yours. Lose, and tell me the real reason you're getting close to him." Across from him, a curly-haired woman gently slides her fingertips along the edge of the table, her red lips curling slightly as she replies: "I don't care about the chips." Atmosphere is tense, cinematic, with dramatic low-key lighting and noir-style mood.
Image-to-Video | 10s | Audio enabled
Scene: No visible people. Only a white robotic vacuum cleaner is shown along with its cleaning path on the floor. Audio: A soft female narrator speaks, accompanied by gentle vacuum-cleaning sound effects: "Still struggling with dust in the corners? This robotic vacuum cleans right up against the edges with no gaps, making your life easier and worry-free!" Camera: Follows the robot's cleaning path smoothly as it moves across the floor.
Image-to-Video | 10s | Audio enabled
Scene: A tabletop setup featuring ASMR trigger props such as a crystal glass, wooden block, and makeup brushes. Audio: Soft brushing sounds as a makeup brush gently sweeps across the crystal glass and wooden block. Camera: Focuses closely on the props and the precise hand movements, highlighting textures and subtle details. Atmosphere: Calm, soothing, and sensory-focused.
Image-to-Video | 5s | Audio enabled
On a rainy night street with neon lights flashing, the streetlights illuminate the wet ground as raindrops fall. A cellist stands under the streetlight, with raindrops dripping from their hair, playing the cello. The slow and affectionate solo melody of the cello with a cold color tone.
Image-to-Video | 10s | Audio enabled