Paper Information - Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings


Automatically learning the structure of object categories remains an important open problem in computer vision. In this paper, we propose a novel unsupervised approach that discovers and learns landmarks in object categories, thereby characterizing their structure. Our approach is based on factorizing image deformations, such as those induced by a viewpoint change or an object deformation, by learning a deep neural network that detects landmarks consistently with these visual effects. Furthermore, we show that the learned landmarks establish meaningful correspondences between different object instances in a category without this requirement being imposed explicitly. We assess the method qualitatively on a variety of natural and man-made object types. We also show that our unsupervised landmarks are highly predictive of manually annotated landmarks in face benchmark datasets and can be used to regress them with a high degree of accuracy.
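The idea of detecting landmarks "consistently with" an image deformation, and of regressing annotated landmarks from the discovered ones, can be illustrated with a small NumPy sketch. It rests on simplifications that are assumptions, not the paper's exact formulation: each landmark is read out of a score map with a spatial softmax (soft-argmax), the deformation g is a known affine warp, and the equivariance error measures how far g applied to the landmarks of the original image lies from the landmarks detected in the warped image. The regression step assumes an ordinary least-squares linear map with a bias term. All function names (`soft_argmax`, `make_heatmaps`, `apply_affine`, `equivariance_error`, `fit_linear_regressor`, `predict`) are hypothetical, and an ideal synthetic "detector" stands in for the trained network.

```python
import numpy as np


def soft_argmax(score_maps):
    """Turn K score maps of shape (K, H, W) into K (x, y) landmark positions.

    A spatial softmax makes each map a probability distribution over pixels;
    the landmark is the expected pixel position under that distribution.
    """
    k, height, width = score_maps.shape
    flat = score_maps.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    prob = np.exp(flat)
    prob /= prob.sum(axis=1, keepdims=True)
    prob = prob.reshape(k, height, width)
    ys, xs = np.mgrid[0:height, 0:width]
    x = (prob * xs).sum(axis=(1, 2))
    y = (prob * ys).sum(axis=(1, 2))
    return np.stack([x, y], axis=1)  # shape (K, 2)


def make_heatmaps(points, height, width, beta=10.0):
    """Hypothetical ideal 'detector': score maps peaked at the given (x, y) points."""
    ys, xs = np.mgrid[0:height, 0:width]
    d2 = (xs[None] - points[:, 0, None, None]) ** 2 + (ys[None] - points[:, 1, None, None]) ** 2
    return -beta * d2


def apply_affine(points, A, t):
    """Apply the assumed warp g(p) = A p + t to an array of (x, y) points."""
    return points @ A.T + t


def equivariance_error(landmarks, landmarks_of_warped, A, t):
    """Mean squared distance between g(landmarks of x) and landmarks of g(x).

    It is zero when the detected landmarks move exactly as the image was warped.
    """
    predicted = apply_affine(landmarks, A, t)
    return float(np.mean(np.sum((predicted - landmarks_of_warped) ** 2, axis=1)))


def _design(unsup):
    """Flatten (N, K, 2) landmarks to (N, 2K) and append a bias column."""
    n = unsup.shape[0]
    return np.hstack([unsup.reshape(n, -1), np.ones((n, 1))])


def fit_linear_regressor(unsup, annotated):
    """Least-squares map from (N, K, 2) discovered to (N, M, 2) annotated landmarks."""
    targets = annotated.reshape(len(annotated), -1)
    weights, *_ = np.linalg.lstsq(_design(unsup), targets, rcond=None)
    return weights  # shape (2K + 1, 2M)


def predict(weights, unsup):
    """Regress (N, M, 2) annotated-landmark estimates from discovered landmarks."""
    return (_design(unsup) @ weights).reshape(len(unsup), -1, 2)


# Equivariance check: translate the "object" by t and re-detect its landmarks.
pts = np.array([[10.0, 12.0], [20.0, 8.0], [15.0, 25.0]])
A, t = np.eye(2), np.array([3.0, -2.0])
est = soft_argmax(make_heatmaps(pts, 32, 32))
est_warped = soft_argmax(make_heatmaps(apply_affine(pts, A, t), 32, 32))
print("equivariance error:", equivariance_error(est, est_warped, A, t))

# Regression check: synthetic annotations that are an exact linear function.
rng = np.random.default_rng(0)
unsup = rng.uniform(0, 32, size=(50, 3, 2))
true_weights = rng.normal(size=(7, 4))
annotated = (_design(unsup) @ true_weights).reshape(50, 2, 2)
reg = fit_linear_regressor(unsup, annotated)
print("max regression error:", np.abs(predict(reg, unsup) - annotated).max())
```

Because the synthetic detector places its peaks exactly where the translation sends them, the printed equivariance error is essentially zero. During training, a loss of this kind would instead be minimized with respect to the network weights over random warps of real images. A pure translation is used here only because it lets the warped score maps be constructed exactly.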
