Physics-model-based 3D facial image reconstruction from frontal images using optical flow

Image synthesis is one of the most realistic approaches to realizing lifelike agents in computers. A facial-muscle model [1] is composed of facial tissue elements and muscles. In this model, the forces acting on the facial tissue elements are computed from the contraction of each muscle, so the combination of muscle parameters determines a specific facial expression. To generate a specific face image, each muscle parameter is currently specified in a trial-and-error procedure, comparing a sample photograph with the generated image using our Muscle-Editor.

In this sketch, we propose a strategy for automatically estimating facial muscle parameters from optical flow using a neural network. This corresponds to 3D facial-motion tracking from a 2D image, i.e., 3D-motion estimation from 2D tracking in a captured image, under the constraint of the physics-based face model.

We use the optical flow of the facial image to measure how the face deforms when an expression appears. The flow vectors are not used independently; they are aggregated into windows positioned according to each muscle. Optical flow is calculated by a block-matching method with a block size of 8 by 8 pixels on images of 720 by 486 pixels. The difference between a specific expression and the neutral face is then represented by a 96-dimensional vector: 48 windows, each with x and y components.

Each learning pattern is a data pair: an optical-flow vector for the input layer and a muscle-parameter vector for the output layer. The neural networks were trained with backpropagation. The training set contains 55 images (five keyframes for the transition from neutral to each of the six basic expressions and to the mouth shapes of the five vowels). The learning patterns are not presented all at once but are added gradually to escape local minima.

To confirm that the learning process completed successfully, we feed the learning data into the input layer and resynthesize the facial image from the muscle parameters produced at the output layer. Example results are shown in the figures. There are slight differences in fine detail between the original and regenerated images, but subjectively the overall facial features and expressions are almost the same as in the original image, so the mapping works well for the training data. The muscle parameters are determined from the 2D image alone, yet the 3D facial image is regenerated well. …
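The following sketch illustrates the flow-measurement step described above: exhaustive block matching on 8 by 8 blocks of a 720 by 486 image, followed by averaging of block flows inside 48 muscle-centered windows to form the 96-dimensional feature vector. The search radius and the window layout are assumptions for illustration; the sketch does not specify them.

```python
import numpy as np

BLOCK = 8                  # block size for matching (8x8 pixels, as in the sketch)
SEARCH = 4                 # search radius in pixels (assumed; not specified)
IMG_W, IMG_H = 720, 486    # image size used in the sketch

def block_matching_flow(neutral, expressive):
    """Estimate per-block optical flow by exhaustive block matching (SAD)."""
    rows, cols = IMG_H // BLOCK, IMG_W // BLOCK
    flow = np.zeros((rows, cols, 2), dtype=np.float32)   # (dy, dx) per block
    for by in range(rows):
        for bx in range(cols):
            y0, x0 = by * BLOCK, bx * BLOCK
            ref = neutral[y0:y0 + BLOCK, x0:x0 + BLOCK].astype(np.int32)
            best, best_d = None, (0, 0)
            for dy in range(-SEARCH, SEARCH + 1):
                for dx in range(-SEARCH, SEARCH + 1):
                    y1, x1 = y0 + dy, x0 + dx
                    if y1 < 0 or x1 < 0 or y1 + BLOCK > IMG_H or x1 + BLOCK > IMG_W:
                        continue
                    cand = expressive[y1:y1 + BLOCK, x1:x1 + BLOCK].astype(np.int32)
                    sad = np.abs(ref - cand).sum()
                    if best is None or sad < best:
                        best, best_d = sad, (dy, dx)
            flow[by, bx] = best_d
    return flow

def flow_to_feature(flow, windows):
    """Average block flows inside each muscle-centered window -> 96-dim vector.

    `windows` is a list of 48 (y0, y1, x0, x1) block-index ranges, one per
    muscle position (a hypothetical layout; the sketch gives no coordinates).
    """
    feats = []
    for (y0, y1, x0, x1) in windows:
        region = flow[y0:y1, x0:x1].reshape(-1, 2)
        feats.extend(region.mean(axis=0))     # mean x/y displacement per window
    return np.asarray(feats)                  # 48 windows x 2 = 96 dimensions
```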
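A minimal sketch of the learning stage might look like the following: a one-hidden-layer network trained with backpropagation to map the 96-dimensional flow vector to a muscle-parameter vector, with the training set enlarged stage by stage as described above. The hidden-layer size, the number of muscle parameters, the learning rate, and the staging schedule are all assumptions; none are stated in the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in=96, n_hidden=30, n_out=17):
    """One-hidden-layer network; hidden size and number of muscle
    parameters (n_out) are assumed values, not taken from the sketch."""
    return {
        "W1": rng.normal(0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(net, x):
    h = np.tanh(x @ net["W1"] + net["b1"])
    y = 1.0 / (1.0 + np.exp(-(h @ net["W2"] + net["b2"])))  # muscle params in [0, 1]
    return h, y

def train_incremental(net, flows, muscles, stages=5, epochs=2000, lr=0.05):
    """Backpropagation with a training set enlarged stage by stage,
    mirroring the gradual presentation of learning patterns."""
    n = len(flows)
    for stage in range(1, stages + 1):
        k = max(1, n * stage // stages)          # number of patterns used so far
        X, T = flows[:k], muscles[:k]
        for _ in range(epochs):
            h, y = forward(net, X)
            err = y - T                          # dE/dy for squared error
            dy = err * y * (1 - y)               # through the output sigmoid
            dh = (dy @ net["W2"].T) * (1 - h**2) # through the hidden tanh
            net["W2"] -= lr * h.T @ dy / k
            net["b2"] -= lr * dy.mean(axis=0)
            net["W1"] -= lr * X.T @ dh / k
            net["b1"] -= lr * dh.mean(axis=0)
    return net
```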
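The confirmation step can then be sketched as a round trip over the training data, reusing the forward pass above: the training flow vectors are fed into the network and the recovered muscle parameters are compared with the target vectors. Driving the muscle model (via the Muscle-Editor) with the recovered parameters to regenerate the 3D face is not reproduced here, and the error tolerance is an assumed value.

```python
def verify_on_training_data(net, flows, muscles, tol=0.05):
    """Feed training flows back through the trained network and report the
    RMS error between recovered and target muscle parameters; the face
    itself would be regenerated by passing the recovered parameters to
    the physics-based muscle model."""
    _, predicted = forward(net, flows)
    rms = np.sqrt(((predicted - muscles) ** 2).mean(axis=1))
    for i, e in enumerate(rms):
        status = "ok" if e < tol else "check"
        print(f"pattern {i:2d}: RMS muscle-parameter error = {e:.3f} ({status})")
    return predicted
```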