AniPortrait: AI Audio-Driven Portrait Animation Tool


Type: Open Source Projects
Last Updated: 2025/10/03
Description: AniPortrait is an open-source AI framework for generating photorealistic portrait animations driven by audio or video inputs. It supports self-driven, face reenactment, and audio-driven modes for high-quality video synthesis.
Tags: audio-driven portrait, animation synthesis, face reenactment, pose retargeting, video generation

Overview of AniPortrait

What is AniPortrait?

AniPortrait is an innovative open-source framework for the audio-driven synthesis of photorealistic portrait animations. Developed by Huawei Wei, Zejun Yang, and Zhisheng Wang of Tencent Games Zhiji, Tencent, it leverages advanced AI techniques to create high-quality animated portraits from a single reference image plus audio or video inputs. Whether you're animating a static portrait with speech audio or reenacting facial expressions from a source video, AniPortrait delivers lifelike results that capture subtle nuances like lip-sync and head movement. Ideal for content creators, game developers, and researchers in computer vision, it stands out among AI video generation tools by focusing specifically on portrait animation.

Released on GitHub under the Apache-2.0 license, AniPortrait has garnered over 5,000 stars, reflecting its popularity in the AI community. The project emphasizes accessibility, with pre-trained models, detailed installation guides, and even a Gradio web UI for easy testing.

How Does AniPortrait Work?

At its core, AniPortrait employs a multi-stage pipeline that integrates diffusion models, audio processing, and pose estimation to generate animations. The framework builds on established models like Stable Diffusion V1.5 and wav2vec2 for feature extraction, ensuring robust handling of audio-visual synchronization.
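
To make the audio front end concrete, the minimal sketch below pulls wav2vec2 features from a speech clip using the Hugging Face transformers library. AniPortrait's own preprocessing may differ in detail, so treat this as an approximation of the feature-extraction step rather than the project's exact code; the file name speech.wav is a placeholder.

    import torch
    import torchaudio
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    MODEL_ID = "facebook/wav2vec2-base-960h"  # same backbone named in this guide
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
    model = Wav2Vec2Model.from_pretrained(MODEL_ID).eval()

    # wav2vec2 expects 16 kHz mono audio; downmix and resample if needed.
    waveform, sr = torchaudio.load("speech.wav")  # placeholder input file
    mono = torchaudio.functional.resample(waveform.mean(dim=0), sr, 16000)

    inputs = extractor(mono.numpy(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        features = model(**inputs).last_hidden_state
    print(features.shape)  # (1, num_audio_frames, 768)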

Key Components and Workflow

  • Input Processing: Start with a reference portrait image. For audio-driven mode, audio inputs are processed using wav2vec2-base-960h to extract speech features. In video modes, source videos are converted to pose sequences via keypoint extraction.
  • Pose Generation: The audio2pose model generates head pose sequences (e.g., pose_temp.npy) from audio, enabling control over facial orientation. For face reenactment, a pose retargeting strategy maps movements from the source video onto the reference image, even across substantial pose differences.
  • Animation Synthesis: Utilizes denoising UNet, reference UNet, and motion modules to synthesize frames. The pose guider ensures alignment, while optional frame interpolation accelerates inference.
  • Output Refinement: Generates videos at resolutions like 512x512, with options for acceleration using film_net_fp16.pt to reduce processing time.

This modular approach allows for self-driven animations (using predefined poses), face reenactment (transferring expressions), and fully audio-driven synthesis, making it versatile for various AI portrait animation scenarios.
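
For intuition about how the stages hand data to each other, here is a toy Python sketch of that flow. Every function below is a placeholder standing in for the real components (the wav2vec2 encoder, the audio2pose model, and the diffusion UNets); the names and array shapes are illustrative, not the project's API.

    import numpy as np

    def extract_audio_features(num_frames: int) -> np.ndarray:
        # Placeholder for wav2vec2: one 768-dim feature vector per video frame.
        return np.random.randn(num_frames, 768)

    def audio2pose(features: np.ndarray) -> np.ndarray:
        # Placeholder for audio2pose: head pose (e.g., yaw/pitch/roll) per frame.
        # A real run could save this sequence as pose_temp.npy.
        return np.tanh(features[:, :3])

    def synthesize_frames(ref_image: np.ndarray, poses: np.ndarray) -> np.ndarray:
        # Placeholder for the diffusion stage (denoising UNet, reference UNet,
        # motion module, pose guider); just repeats the reference image here.
        return np.stack([ref_image for _ in poses])

    ref_image = np.zeros((512, 512, 3), dtype=np.uint8)
    poses = audio2pose(extract_audio_features(num_frames=30))
    video = synthesize_frames(ref_image, poses)
    print(video.shape)  # (30, 512, 512, 3)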

Core Features of AniPortrait

AniPortrait packs a range of powerful features tailored for realistic portrait animation:

  • Audio-Driven Portrait Animation: Syncs lip movements and expressions to audio inputs, perfect for dubbing or virtual avatars.
  • Face Reenactment: Transfers facial performances from a source video to a target portrait, suited to ethical, consent-based reenactment applications in media.
  • Pose Control and Retargeting: Updated strategies handle diverse head poses, including generation of custom pose files for precise control.
  • High-Resolution Output: Produces photorealistic videos and supports long sequences (300+ frames).
  • Acceleration Options: Frame interpolation and FP16 models speed up inference without sacrificing quality.
  • Gradio Web UI: A user-friendly interface for quick demos, also hosted on Hugging Face Spaces for online access.
  • Pre-Trained Models: Includes weights for audio2mesh, audio2pose, and diffusion components, downloadable from sources like Wisemodel.

These features make AniPortrait a go-to tool for AI-driven video synthesis, surpassing basic tools by focusing on portrait fidelity and audio-visual coherence.

Installation and Setup

Getting started is straightforward for users with Python >=3.10 and CUDA 11.7:

  1. Clone the repository: git clone https://github.com/Zejun-Yang/AniPortrait.
  2. Install dependencies: pip install -r requirements.txt.
  3. Download pre-trained weights to ./pretrained_weights/, including Stable Diffusion components, wav2vec2, and custom models like denoising_unet.pth and audio2pose.pt.
  4. Organize files as per the directory structure in the README.

For training, prepare datasets like VFHQ or CelebV-HQ by extracting keypoints and running preprocessing scripts. Training occurs in two stages using Accelerate for distributed processing.
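
Before the first run, it can save time to verify the checkpoint layout. The sketch below checks ./pretrained_weights/ for the filenames mentioned in this guide; the exact list depends on your README version, so adjust expected to match what you actually downloaded.

    from pathlib import Path

    WEIGHTS_DIR = Path("./pretrained_weights")
    # Filenames taken from this guide; extend to match the README's layout.
    expected = [
        "denoising_unet.pth",
        "reference_unet.pth",
        "pose_guider.pth",
        "motion_module.pth",
        "audio2mesh.pt",
        "audio2pose.pt",
        "film_net_fp16.pt",
    ]

    missing = [name for name in expected if not (WEIGHTS_DIR / name).exists()]
    if missing:
        raise SystemExit(f"Missing checkpoints in {WEIGHTS_DIR}: {missing}")
    print("All expected checkpoints are in place.")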

How to Use AniPortrait?

Inference Modes

AniPortrait supports three primary modes via command-line scripts:

  • Self-Driven Animation:

    python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512 -acc
    

    Customize with reference images or pose videos. Convert videos to poses using python -m scripts.vid2pose --video_path input.mp4.

  • Face Reenactment:

    python -m scripts.vid2vid --config ./configs/prompts/animation_facereenac.yaml -W 512 -H 512 -acc
    

    Edit the YAML to include source videos and references.

  • Audio-Driven Synthesis:

    python -m scripts.audio2vid --config ./configs/prompts/animation_audio.yaml -W 512 -H 512 -acc
    

    Add audio files and reference images to the config. To enable audio2pose, remove the pose_temp entry so head poses are generated automatically from the audio (see the config sketch after this list).

For head pose control, generate reference poses with python -m scripts.generate_ref_pose.
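
If you prefer to script the config edits described above, a small PyYAML helper works. Note that the key names ref_image_path and audio_path below are illustrative stand-ins; mirror whatever keys your copy of configs/prompts/animation_audio.yaml actually uses.

    import yaml  # pip install pyyaml

    CONFIG = "./configs/prompts/animation_audio.yaml"

    with open(CONFIG) as f:
        cfg = yaml.safe_load(f)

    # Illustrative key names -- mirror the schema in the shipped YAML.
    cfg["ref_image_path"] = "./inputs/my_portrait.png"
    cfg["audio_path"] = "./inputs/my_speech.wav"
    # Dropping pose_temp lets audio2pose generate head poses from the audio.
    cfg.pop("pose_temp", None)

    with open(CONFIG, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)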

Web Demo

Launch the Gradio UI with python -m scripts.app, or try the online version on Hugging Face Spaces.

Users can experiment with sample videos like 'cxk.mp4' or 'jijin.mp4' to see audio-sync in action, sourced from platforms like Bilibili.

Training AniPortrait from Scratch

Advanced users can train custom models:

  1. Data Prep: Download datasets, preprocess with python -m scripts.preprocess_dataset, and update JSON paths.
  2. Stage 1: accelerate launch train_stage_1.py --config ./configs/train/stage1.yaml.
  3. Stage 2: Download motion module weights, specify Stage 1 checkpoints, and run accelerate launch train_stage_2.py --config ./configs/train/stage2.yaml.

This process fine-tunes on portrait-specific data, enhancing generalization for AI animation tasks.
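
For orientation, the Accelerate pattern those training scripts build on looks like the minimal loop below. The model, data, and loss here are dummies; AniPortrait's train_stage_1.py and train_stage_2.py wrap the same prepare/backward calls around their diffusion-specific objectives, so this is a generic sketch rather than the project's actual code.

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from accelerate import Accelerator

    accelerator = Accelerator()  # handles device placement and multi-GPU setup

    model = torch.nn.Linear(768, 3)  # dummy stand-in for the real networks
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    dataset = TensorDataset(torch.randn(80, 768), torch.randn(80, 3))
    loader = DataLoader(dataset, batch_size=8)

    # prepare() wraps everything for the current launch config (CPU/GPU/DDP).
    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

    for features, target in loader:
        loss = torch.nn.functional.mse_loss(model(features), target)
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"final loss: {loss.item():.4f}")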

Why Choose AniPortrait?

In a crowded field of AI video generation tools, AniPortrait excels thanks to its specialized focus on photorealistic portraits. Unlike general-purpose models, it handles audio-to-lip sync and subtle expressions with precision, reducing artifacts in facial animations. Its open-source nature allows customization, and recent updates (such as the April 2024 audio2pose release and the acceleration modules) keep it current. The project's acknowledgments to EMO and AnimateAnyone reflect its collaborative roots.

Practical value includes faster prototyping for virtual influencers, educational videos, or game assets. The accompanying arXiv paper (arXiv:2403.17694) also makes it a useful reference for researchers exploring audio-visual synthesis in computer vision.

Who is AniPortrait For?

  • Content Creators and Filmmakers: For quick dubbing or expression transfers in short-form videos.
  • Game Developers: Integrating animated portraits into interactive media.
  • AI Researchers: Experimenting with diffusion-based animation and pose retargeting.
  • Hobbyists and Educators: Using the web UI to teach AI concepts without heavy setup.

If you're seeking the best way to create audio-driven portrait animations, AniPortrait's balance of quality, speed, and accessibility makes it a top choice.

Potential Applications and Use Cases

  • Virtual Avatars: Animate digital characters with synced speech for social media or metaverses.
  • Educational Tools: Generate talking head videos for lectures or tutorials.
  • Media Production: Ethical face reenactment for historical reenactments or ads.
  • Research Prototyping: Benchmark audio-to-video models in CV papers.

Demonstrations include self-driven clips like 'solo.mp4' and audio examples like 'kara.mp4', showcasing seamless integration.

For troubleshooting, check the open issues on GitHub (76 at the time of writing) or contribute via pull requests. Overall, AniPortrait empowers users to push the boundaries of AI portrait animation with reliable, high-fidelity results.

Best Alternative Tools to "AniPortrait"

  • BlitzVideo
  • Genie 3 AI
  • AiReelGenerator: Automate faceless video creation with AiReelGenerator. Choose a topic, and AI generates videos for YouTube, TikTok, Instagram, and Facebook daily. (AI video generator, faceless video)
  • ChatArt: An AI tool offering content creation, image editing, and AI chat features. Powered by GPT-5, Claude Sonnet, and DeepSeek, it delivers high-quality content, AI image generation/editing, and plagiarism/grammar detection. (AI content generator, AI image editor)
  • Alle-AI: An all-in-one AI platform that combines and compares outputs from ChatGPT, Gemini, Claude, DALL-E 2, Stable Diffusion, and Midjourney for text, image, audio, and video generation. (AI comparison, multi-AI, generative AI)
  • Slides to Videos
  • Vid.AI: An AI-powered video generator that creates faceless videos for YouTube Shorts, TikTok, Instagram Reels, and full-length YouTube videos. Perfect for content creators looking for YouTube automation. (AI video creation)
  • VideoPal.ai
  • AnimateDiff
  • Hypergro
  • ImagineAPP: An AI-powered platform for creating music videos and other video content from text or images. It supports AI models like Runway Gen3, Hailuo AI, Kling AI, Luma AI, and Google VEO. (AI video creation)
  • GenXi: An AI-powered platform that generates realistic images and videos from text. Easy to use with the DALL App, ScriptToVid Tool, Imagine AI Tool, and AI Logo Maker. (AI image generation)
  • GlobalGPT: An all-in-one AI platform providing access to ChatGPT, GPT-5, Claude, Unikorn (MJ-like), Veo, and 100+ AI tools for writing, research, and image & video creation. (AI platform, content creation)
  • SpikeX AI: Turn text into engaging videos with SpikeX AI, a text-to-video platform for automating YouTube growth. Create faceless videos for YouTube and social media with just one prompt. (text to video, AI video creation)
  • VO3 AI