Transferable Videorealistic Speech Animation

Authors

Yao-Jen Chang, Advanced Technology Center, CCL, ITRI
Tony Ezzat, Center for Biological and Computational Learning, MIT

Abstract

Image-based videorealistic speech animation achieves significant visual realism at the cost of collecting a large (5- to 10-minute) video corpus from the specific person to be animated. This requirement hinders broad application, since a large video corpus of a specific person recorded under a controlled setup may not be easy to obtain. In this paper, we propose a model transfer and adaptation algorithm that allows a novel person to be animated using only a small video corpus. The algorithm starts with a multidimensional morphable model (MMM) previously trained on a different speaker with a large corpus and transfers it to the novel speaker, who has a much smaller corpus. It consists of 1) a novel matching-by-synthesis algorithm that semi-automatically selects new MMM prototype images from the new video corpus, and 2) a novel gradient descent linear regression algorithm that adapts the MMM phoneme models to the data in the novel video corpus. Encouraging experimental results are presented in which a morphable model trained on a performer with a 10-minute corpus is transferred to a novel person using a 15-second movie clip of him as the adaptation video corpus.
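To make the idea of a gradient descent linear regression concrete, here is a minimal sketch in Python. It is a generic illustration, not the paper's formulation: it assumes, hypothetically, that each phoneme model is summarized by a mean MMM parameter vector, and that adaptation is an affine map y ≈ W x + b from the original speaker's means x to target vectors y estimated from the new corpus. The function names `fit_affine_gd` and `apply_affine`, the learning rate, and the step count are illustrative choices, not taken from the paper.

```python
def apply_affine(W, b, x):
    """Return W x + b for a d x d matrix W (list of lists) and vectors b, x."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) + b[i] for i in range(len(b))]


def fit_affine_gd(xs, ys, lr=0.05, steps=5000):
    """Fit y ~= W x + b by batch gradient descent on the mean squared error.

    Hypothetical setting: xs are the original speaker's phoneme-model mean
    vectors and ys are the corresponding targets from the new corpus.
    Both are lists of equal-length float vectors of dimension d.
    """
    d = len(xs[0])
    n = len(xs)
    # Start from the identity transform, i.e. the unadapted original models.
    W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for _ in range(steps):
        # Accumulate gradients of (1/n) * sum ||W x + b - y||^2.
        gW = [[0.0] * d for _ in range(d)]
        gb = [0.0] * d
        for x, y in zip(xs, ys):
            pred = apply_affine(W, b, x)
            for i in range(d):
                err = pred[i] - y[i]
                gb[i] += 2.0 * err / n
                for j in range(d):
                    gW[i][j] += 2.0 * err * x[j] / n
        # Take one gradient step on every parameter.
        for i in range(d):
            b[i] -= lr * gb[i]
            for j in range(d):
                W[i][j] -= lr * gW[i][j]
    return W, b


# Usage with toy 2-D data generated from a known affine map.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = [[1.2, 0.1], [-0.2, 0.9]]
c = [0.3, -0.1]
ys = [apply_affine(A, c, x) for x in xs]
W, b = fit_affine_gd(xs, ys)
adapted = apply_affine(W, b, [0.5, 0.5])  # adapt a previously unseen mean vector
```

Initializing W to the identity and b to zero starts the search from the unadapted original-speaker models. For a problem this small, a closed-form least-squares solve would give the same answer; gradient descent is shown here because it is the technique the abstract names.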

Paper & Demo

Yao-Jen Chang and Tony Ezzat, "Transferable Videorealistic Speech Animation," accepted to the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA'05), Los Angeles, CA, USA, July 29-31, 2005. [PDF]
Demo Video (36.5MB) [AVI]

(Last update: 2005/07/04)
