Speech-driven facial motion synthesis is a well-explored research topic. However, little has been done to model expressive visual behavior during speech. We address this issue using a machine learning approach that relies on a database of speech-related, high-fidelity facial motions. From this training set, we derive a generative model of expressive facial motion that incorporates emotion control while maintaining accurate lip-synching. The emotional content of the input speech can be specified manually by the user or extracted automatically from the audio signal using a Support Vector Machine classifier.
Yong Cao, Computer Science, College of Engineering