Deep Active Shape Model for Robust Object Fitting

Object recognition and localization remains a challenging problem despite recent advances in deep learning (DL), especially for objects with varying shapes and appearances. Statistical models, such as the Active Shape Model (ASM), rely on a parametric model of the object, which allows prior knowledge about shape and appearance to be incorporated in a principled way. To take advantage of these benefits, this paper proposes a new ASM framework that addresses two tasks: (i) comparing the performance of several image features used to extract observations from an input image; and (ii) improving the performance of the model fitting by relying on a probabilistic framework that allows the use of multiple observations and is robust to outliers. The goal in (i) is to maximize the quality of the observations by exploring a wide set of handcrafted features (HOG, SIFT, and texture templates) and more recent DL-based features. Regarding (ii), we use the Generalized Expectation-Maximization algorithm to deal with outliers and to extend the fitting process to multiple observations. The proposed framework is evaluated on facial landmark fitting and on the segmentation of the endocardium of the left ventricle in cardiac magnetic resonance volumes. We experimentally observe that the proposed approach is robust not only to outliers, but also to adverse initialization conditions and to large search regions (the image regions from which the observations are extracted). Furthermore, the proposed combination of the ASM with DL-based features is competitive with more recent DL approaches (e.g., FCN [1], U-Net [2], and CNN Cascade [3]), showing that it is possible to combine the benefits of statistical models and DL into a new deep ASM probabilistic framework.
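The idea of fitting with multiple observations per landmark while down-weighting outliers can be illustrated with a deliberately simplified sketch. It is not the paper's formulation: it estimates only a 2D translation rather than full ASM pose and shape parameters, it uses plain EM with a closed-form M-step rather than Generalized EM, and the function name, the uniform outlier density, and all parameter values are assumptions made for illustration.

```python
import math

def robust_translation_em(model_pts, candidates, sigma=1.0,
                          outlier_density=1e-3, prior_inlier=0.5, n_iter=20):
    """Estimate a 2D translation (tx, ty) mapping model_pts onto observations.

    candidates[i] is a list of candidate (x, y) observations for landmark i.
    Each candidate is either valid (Gaussian around the translated landmark)
    or an outlier (uniform density). Hypothetical toy model, not the paper's.
    """
    tx, ty = 0.0, 0.0
    for _ in range(n_iter):
        sum_w = sum_x = sum_y = 0.0
        for (mx, my), cands in zip(model_pts, candidates):
            for ox, oy in cands:
                # E-step: posterior probability that this candidate is valid.
                rx, ry = ox - (mx + tx), oy - (my + ty)
                gauss = math.exp(-(rx * rx + ry * ry) / (2 * sigma ** 2))
                gauss /= 2 * math.pi * sigma ** 2
                num = prior_inlier * gauss
                w = num / (num + (1 - prior_inlier) * outlier_density)
                # Accumulate weighted displacements for the M-step.
                sum_w += w
                sum_x += w * (ox - mx)
                sum_y += w * (oy - my)
        # M-step: weighted mean displacement (closed form for a translation).
        if sum_w > 0:
            tx, ty = sum_x / sum_w, sum_y / sum_w
    return tx, ty

# Toy usage: a unit square shifted by (2, -1), plus one gross outlier per landmark.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
obs = [[(mx + 2, my - 1), (mx + 15, my + 15)] for mx, my in square]
tx, ty = robust_translation_em(square, obs)
```

In this toy setting the far-away candidates receive weights close to zero, so the estimate is driven almost entirely by the consistent observations. A full ASM fit would replace the translation with pose and shape parameters.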

[1] Martin Abba Tanner, et al. Tools for Statistical Inference: Observed Data and Data Augmentation Methods, 1993.

[2] John K. Tsotsos, et al. Efficient and generalizable statistical models of shape and appearance for analysis of cardiac MRI, 2008, Medical Image Anal.

[3] Timothy F. Cootes, et al. Feature Detection and Tracking with Constrained Local Models, 2006, BMVC.

[4] Dorin Comaniciu, et al. An information fusion framework for robust shape tracking, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Carlos Santiago, et al. 2D Segmentation Using a Robust Active Shape Model With the EM Algorithm, 2015, IEEE Transactions on Image Processing.

[6] L. R. Dice. Measures of the Amount of Ecologic Association Between Species, 1945.

[7] Yi Yang, et al. Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8] Timothy F. Cootes, et al. Active Shape Models-Their Training and Application, 1995, Comput. Vis. Image Underst.

[9] Yan Wang, et al. A Fixed-Point Model for Pancreas Segmentation in Abdominal CT Scans, 2016, MICCAI.

[10] Pietro Perona, et al. Object detection and segmentation from joint embedding of parts and pixels, 2011, 2011 International Conference on Computer Vision.

[11] Stefanos Zafeiriou, et al. HOG active appearance models, 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[12] Milan Sonka, et al. Segmentation of intravascular ultrasound images: a knowledge-based approach, 1995, IEEE Trans. Medical Imaging.

[13] Trevor Darrell, et al. Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15] Fuji Ren, et al. Facial expression recognition based on AAM–SIFT and adaptive regional weighting, 2015.

[16] Rama Chellappa, et al. HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Daniel Cremers, et al. Kernel Density Estimation and Intrinsic Alignment for Shape Priors in Level Set Segmentation, 2006, International Journal of Computer Vision.

[18] Xavier Bresson, et al. Fast Global Minimization of the Active Contour/Snake Model, 2007, Journal of Mathematical Imaging and Vision.

[19] Takeo Kanade, et al. Comprehensive database for facial expression analysis, 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[20] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[21] Hans-Peter Meinzer, et al. Statistical shape models for 3D medical image segmentation: A review, 2009, Medical Image Anal.

[22] Timothy F. Cootes, et al. Active Appearance Models, 1998, ECCV.

[23] Edward A. Geiser, et al. An Effective Algorithm for Extracting Serial Endocardial Borders from 2-Dimensional Echocardiograms, 1984, IEEE Transactions on Biomedical Engineering.

[24] Paul A. Viola, et al. Rapid object detection using a boosted cascade of simple features, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[25] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[26] Maja Pantic, et al. Optimization Problems for Fast AAM Fitting in-the-Wild, 2013, 2013 IEEE International Conference on Computer Vision.

[27] Rachid Deriche, et al. Geodesic Active Regions and Level Set Methods for Supervised Texture Segmentation, 2002, International Journal of Computer Vision.

[28] Christopher M. Bishop, et al. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

[29] Dorin Comaniciu, et al. Database-guided segmentation of anatomical structures with complex appearance, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30] Xiaogang Wang, et al. Deep Convolutional Network Cascade for Facial Point Detection, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[31] Jacinto C. Nascimento, et al. Segmenting The Left Ventricle In Cardiac MRI: From Handcrafted To Deep Region Based Descriptors, 2019, 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019).

[32] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[33] Fred Nicolls, et al. Active shape models with SIFT descriptors and MARS, 2015, 2014 International Conference on Computer Vision Theory and Applications (VISAPP).

[34] Michael J. Swain, et al. Color indexing, 1991, International Journal of Computer Vision.

[35] Carlos Santiago, et al. Robust Feature Descriptors for Object Segmentation Using Active Shape Models, 2018, ACIVS.

[36] Takeo Kanade, et al. The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[37] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.

[38] Guang-Zhong Yang, et al. Outlier Detection and Handling for Robust 3-D Active Shape Models Search, 2007, IEEE Transactions on Medical Imaging.

[39] Thomas Brox, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.

[40] Jorge S. Marques, et al. Robust Shape Tracking With Multiple Models in Ultrasound Images, 2008, IEEE Transactions on Image Processing.

[41] Demetri Terzopoulos, et al. Snakes: Active contour models, 2004, International Journal of Computer Vision.

[42] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints, 2011.