Globally Tuned Cascade Pose Regression via Back Propagation with Application in 2D Face Pose Estimation and Heart Segmentation in 3D CT Images

Recently, a successful pose estimation algorithm, called Cascade Pose Regression (CPR), was proposed in the literature. Trained over Pose Index Feature, CPR is a regressor ensemble that is similar to Boosting. In this paper we show how CPR can be represented as a Neural Network. Specifically, we adopt a Graph Transformer Network (GTN) representation and accordingly train CPR with Back Propagation (BP) that permits globally tuning. In contrast, previous CPR literature only took a layer wise training without any post fine tuning. We empirically show that global training with BP outperforms layer-wise (pre-)training. Our CPR-GTN adopts a Multi Layer Percetron as the regressor, which utilized sparse connection to learn local image feature representation. We tested the proposed CPR-GTN on 2D face pose estimation problem as in previous CPR literature. Besides, we also investigated the possibility of extending CPR-GTN to 3D pose estimation by doing experiments using 3D Computed Tomography dataset for heart segmentation.

[1]  Rafael C. González,et al.  Digital image processing using MATLAB , 2006 .

[2]  Kun Zhou,et al.  3D shape regression for real-time facial animation , 2013, ACM Trans. Graph..

[3]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Olivier Ecabert,et al.  Automatic Model-Based Segmentation of the Heart in CT Images , 2008, IEEE Transactions on Medical Imaging.

[5]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jian Sun,et al.  Face Alignment by Explicit Shape Regression , 2012, International Journal of Computer Vision.

[7]  Shiguang Shan,et al.  Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment , 2014, ECCV.

[8]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[9]  Wenyu Liu,et al.  Deep Regression for Face Alignment , 2014, ArXiv.

[10]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[11]  Stefanos Zafeiriou,et al.  A Semi-automatic Methodology for Facial Landmark Annotation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[12]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Pietro Perona,et al.  Cascaded pose regression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Jian Sun,et al.  Joint Cascade Face Detection and Alignment , 2014, ECCV.

[15]  Pietro Perona,et al.  Robust Face Landmark Estimation under Occlusion , 2013, 2013 IEEE International Conference on Computer Vision.

[16]  Dorin Comaniciu,et al.  Four-Chamber Heart Modeling and Automatic Segmentation for 3-D Cardiac CT Volumes Using Marginal Space Learning and Steerable Features , 2008, IEEE Transactions on Medical Imaging.

[17]  Jian Sun,et al.  Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[19]  Xiaogang Wang,et al.  Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Shiguang Shan,et al.  Topic-Aware Deep Auto-Encoders (TDA) for Face Alignment , 2014, ACCV.

[21]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[22]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  D. Geman,et al.  Stationary Features and Cat Detection , 2008 .

[25]  Luc Van Gool,et al.  Real-time facial feature detection using conditional regression forests , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[27]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[30]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[31]  Zhe L. Lin,et al.  Nonparametric Context Modeling of Local Appearance for Pose- and Expression-Robust Facial Landmark Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[33]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..