Computer Vision – ECCV 2014

Accurate face alignment is a vital prerequisite for most face perception tasks, such as face recognition, facial expression analysis, and non-realistic face re-rendering. It can be formulated as the nonlinear inference of facial landmarks from the detected face region. A deep network seems a good choice for modeling this nonlinearity, but applying one directly is nontrivial. In this paper, instead of a straightforward application of a deep network, we propose the Coarse-to-Fine Auto-encoder Networks (CFAN) approach, which cascades a few successive Stacked Auto-encoder Networks (SANs). Specifically, the first SAN takes a low-resolution version of the whole detected face as input and quickly predicts the landmarks, accurately enough to serve as a preliminary estimate. The following SANs then progressively refine the landmarks by taking as input local features extracted around the current landmarks (the output of the previous SAN) at increasingly higher resolution. Extensive experiments on three challenging datasets demonstrate that CFAN outperforms the state-of-the-art methods and runs in real time (40+ fps, excluding face detection, on a desktop).
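The coarse-to-fine data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the trained SANs are replaced by a hypothetical `StubRegressor` (a random linear map), raw pixel patches stand in for the local features, and the choices of normalized landmark coordinates, additive refinements, and the names `downsample`, `local_patches`, and `coarse_to_fine` are all assumptions made for illustration.

```python
import random


def clamp01(v):
    """Keep a normalized coordinate inside [0, 1]."""
    return min(max(v, 0.0), 1.0)


def downsample(image, size):
    """Nearest-neighbour resize of a square 2-D list to size x size."""
    n = len(image)
    return [[image[(r * n) // size][(c * n) // size] for c in range(size)]
            for r in range(size)]


def local_patches(image, landmarks, half):
    """Concatenate the (2*half+1)^2 pixels around each landmark (assumed feature)."""
    n = len(image)
    feats = []
    for x, y in landmarks:  # landmarks are normalized (x, y) pairs in [0, 1]
        cx, cy = round(x * (n - 1)), round(y * (n - 1))
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                r = min(max(cy + dy, 0), n - 1)  # clamp to the image border
                c = min(max(cx + dx, 0), n - 1)
                feats.append(image[r][c])
    return feats


class StubRegressor:
    """Hypothetical stand-in for a trained SAN: a fixed random linear map."""

    def __init__(self, n_in, n_out, scale, seed):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                  for _ in range(n_out)]

    def __call__(self, features):
        return [sum(w * f for w, f in zip(row, features)) for row in self.w]


def coarse_to_fine(face, global_stage, local_stages, resolutions,
                   coarse_size=8, half=1):
    """Run one global stage, then local refinement stages at rising resolution."""
    # Stage 1: holistic prediction from a low-resolution view of the whole face.
    low = downsample(face, coarse_size)
    out = global_stage([p for row in low for p in row])
    # Assumption: the output is read as offsets from the face centre (0.5, 0.5).
    landmarks = [(clamp01(0.5 + out[i]), clamp01(0.5 + out[i + 1]))
                 for i in range(0, len(out), 2)]

    # Later stages: local features around the current landmarks, finer each time.
    for stage, res in zip(local_stages, resolutions):
        feats = local_patches(downsample(face, res), landmarks, half)
        delta = stage(feats)  # assumption: each stage outputs an additive update
        landmarks = [(clamp01(x + delta[2 * i]), clamp01(y + delta[2 * i + 1]))
                     for i, (x, y) in enumerate(landmarks)]
    return landmarks
```

With a 32x32 face crop and three landmarks, for example, the global stage would take 64 inputs (an 8x8 view) and each local stage 27 inputs (three 3x3 patches), with `resolutions=[16, 32]` giving two refinement passes. With trained networks in place of the stubs, the same loop structure would apply; only the regressors and the feature extractor would change.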
