SadTalker

Stylized Audio-Driven Single Image Talking Face Animation

A100 80GB
Fast Inference
REST API

Model Information

Response Time: ~44 sec
Status: Active
Version: 0.0.1
Updated: about 2 months ago

Cost is calculated based on execution time. The model is charged at $0.002 per second. With a $1 budget, you can run this model approximately 11 times, assuming an average execution time of 44 seconds per run.
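As a quick sanity check on the pricing math, here is a minimal sketch using only the figures quoted above:

```python
# Pricing figures quoted above; adjust if your average runtime differs.
price_per_second = 0.002   # USD per second of execution
avg_runtime_s = 44         # average execution time in seconds
budget = 1.00              # USD

cost_per_run = price_per_second * avg_runtime_s   # $0.088 per run
runs_per_budget = int(budget // cost_per_run)     # ~11 runs per $1
print(f"~${cost_per_run:.3f} per run, about {runs_per_budget} runs per ${budget:.2f}")
```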

Overview

SadTalker is a model designed to generate lifelike talking face animations from a single reference image and an audio file. It enables the creation of realistic facial movements, including lip-sync, expressions, and eye blinking, to match the provided speech.

Technical Specifications

  • Facial Motion Capture: Maps speech patterns to natural lip movements and expressions.
  • Pose Estimation: Allows control over head movement styles.
  • Eyeblink Control: Enables optional eye blinking for added realism.
  • Preprocessing Techniques: Provides multiple options to crop, resize, or extract facial regions for optimized processing.
  • Rendering Options: Different rendering methods influence animation quality and realism.

Key Considerations

  • Facial Expression Accuracy: SadTalker generates its best results when expressions are subtle and natural.
  • Pose Style Impact: Higher pose values introduce more movement but may cause unnatural shifts if not carefully balanced.
  • Image Resolution: Using 512x512 images produces better detail but requires more processing time.
  • Eyeblink Control: Disabling this feature may make animations look unnatural, particularly in longer sequences.
  • Still Mode: Recommended for generating subtle movements rather than exaggerated animations.

Tips & Tricks

  • Source Image: Use high-quality images with a clear face and neutral expression to achieve smoother animations.
  • Driven Audio: Ensure audio files are noise-free and have a natural speech rhythm to improve lip-sync accuracy.
  • Pose Style (pose_style), shown in the request sketch after this list:
    • Values between 0-10 create minor head movements.
    • 10-25 offers balanced movement for natural expressions.
    • 30-45 increases movement but may introduce artifacts.
  • Expression Scale (expression_scale):
    • Keep within 0.8-1.2 for realistic expressions.
    • Higher values may exaggerate facial movements unnaturally.
  • Size of Image (size_of_image):
    • 256: Faster processing with lower detail.
    • 512: Higher detail but requires more computation.
  • Preprocessing (preprocess):
    • crop: Focuses only on the face, best for close-ups.
    • resize: Adjusts image dimensions while keeping details.
    • full: Uses the full image, suitable for upper-body framing.
    • extcrop/extfull: Extended versions of crop/full for more background details.
  • Facerender Method (facerender):
    • facevid2vid: Best for smooth and natural transitions.
    • pirender: Suitable for artistic or stylized animations.
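
Taken together, these parameters form the request body. Below is a minimal sketch using Python's requests library; the endpoint URL, API key, authentication header, and the source_image/driven_audio field names are placeholders for illustration, while the tuning parameters are the ones documented above:

```python
import requests

# Placeholder endpoint and key -- substitute the values from your provider's
# dashboard. Only the tuning parameters below are documented on this page.
API_URL = "https://api.example.com/v1/sadtalker"  # assumed URL
API_KEY = "YOUR_API_KEY"                          # assumed auth scheme

payload = {
    # Input media: assumed to be accepted as URLs here; some providers
    # expect base64-encoded files instead.
    "source_image": "https://example.com/face.png",
    "driven_audio": "https://example.com/speech.wav",
    # Parameters described in Tips & Tricks above.
    "pose_style": 10,             # 0-10 subtle, 10-25 balanced, 30-45 more movement
    "expression_scale": 1.0,      # keep within 0.8-1.2 for realistic expressions
    "size_of_image": 512,         # 256 = faster, 512 = more detail
    "preprocess": "crop",         # crop | resize | full | extcrop | extfull
    "facerender": "facevid2vid",  # facevid2vid = smooth, pirender = stylized
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=300,  # average runtime is ~44 s; allow generous headroom
)
response.raise_for_status()
```

While iterating on pose_style and expression_scale, a 256 size_of_image and the crop preprocess keep each run fast and cheap; switch to 512 for the final render.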

Capabilities

  • SadTalker Generates Talking Face Animations: Converts still images into animated faces with synchronized lip movements.
  • Supports Different Poses and Expressions: Allows customization of facial dynamics.
  • Works with Various Image Resolutions: Supports 256x256 and 512x512 image sizes.
  • Realistic Eye Blinking and Facial Movements: Enhances authenticity with adjustable parameters.
  • Flexible Rendering and Preprocessing Options: Offers different techniques to optimize output.

What can I use it for?

  • Creating Digital Avatars: Generate animated avatars for virtual assistants or social media.
  • Enhancing Video Content with SadTalker: Add talking animations to static character images.
  • Educational and Training Materials: Produce realistic facial animations for tutorials or language learning.
  • Storytelling and Character Animation: Bring still characters to life in animated narratives.
  • AI-Powered Lip-Sync Applications: Improve synchronization in voice-driven animation projects.

Things to be aware of

  • Fine-tune Expression Scale: Experiment with values between 0.8 and 1.2 for natural expressions.
  • Adjust Pose Style for Different Effects: Low values for subtle movements, high values for dynamic expressions.
  • Test Different Preprocessing Modes: Compare results using crop, resize, and full to find the best framing.
  • Use High-Quality Source Images: The better the input, the more realistic the animation.
  • Enable Eyeblink for More Natural Output: Disabling it may make the animation feel static.

Limitations

  • SadTalker performs best with front-facing images; side angles may cause inconsistencies.
  • Rapid speech may result in slight desynchronization between lips and audio.
  • High pose values may lead to unnatural movements if not carefully adjusted.
  • Some audio accents or tones may affect lip-sync precision.

Output Format: MP4
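
Continuing the request sketch from Tips & Tricks, here is a minimal sketch for saving the MP4 result, assuming the response carries either the raw video bytes or a base64-encoded field (the actual response schema is provider-specific, and "video" is an assumed field name):

```python
import base64

# 'response' is the requests.Response returned by the sketch in Tips & Tricks.
content_type = response.headers.get("Content-Type", "")
if content_type.startswith("video/"):
    video_bytes = response.content                            # raw MP4 stream
else:
    video_bytes = base64.b64decode(response.json()["video"])  # assumed base64 field

with open("talking_face.mp4", "wb") as f:
    f.write(video_bytes)
```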