ConcatNet: A Deep Architecture of Concatenation-Assisted Network for Dense Facial Landmark Alignment

Facial landmarks are among the most basic cues for extracting facial information such as expression and emotion. However, detecting dense landmarks in an image is challenging because facial pose varies widely. In this paper, we propose ConcatNet, a deep architecture for dense facial landmark detection. Its core is a CNN-based dense landmark detector that operates on part regions of a face and extends a given set of sparse landmarks into a denser, more accurate set. By introducing interface layers for coordinate normalization and part region localization, we concatenate a sparse landmark detection network to ConcatNet in a global-to-local manner, so that the whole network operates end to end. Experimental results on the LFW and 300W datasets show that ConcatNet not only increases the number of landmarks beyond the sparse set but also markedly improves the accuracy of the landmark positions. In addition, compared with a 3D model-based detection method, ConcatNet detects dense landmarks with high accuracy while using a smaller dataset and no additional annotations such as 3D positions.
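
To make the role of the interface layers more concrete, the following is a minimal sketch of what coordinate normalization and part region localization could look like on plain coordinate lists. It is not the authors' implementation: the function names, the bounding-box convention, and the `margin` parameter are all assumptions introduced here for illustration, and the real layers would sit inside a CNN rather than operate on Python lists.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def normalize_landmarks(points: Sequence[Point], box: Box) -> List[Point]:
    """Map pixel coordinates into [0, 1] relative to a bounding box (hypothetical helper)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return [((x - x0) / w, (y - y0) / h) for x, y in points]


def part_region(points: Sequence[Point], indices: Sequence[int],
                margin: float = 0.25) -> Box:
    """Return a padded box around the sparse landmarks of one facial part.

    `margin` is an assumed padding ratio, not a value taken from the paper.
    """
    xs = [points[i][0] for i in indices]
    ys = [points[i][1] for i in indices]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    pad = margin * max(x1 - x0, y1 - y0)
    return (x0 - pad, y0 - pad, x1 + pad, y1 + pad)


def to_image_coords(local_points: Sequence[Point], region: Box) -> List[Point]:
    """Map region-normalized predictions back to image pixel coordinates."""
    x0, y0, x1, y1 = region
    w, h = x1 - x0, y1 - y0
    return [(x0 + u * w, y0 + v * h) for u, v in local_points]


# Global stage: three sparse landmarks (two eye corners, one mouth point).
sparse = [(40.0, 60.0), (80.0, 60.0), (60.0, 100.0)]
face_box = (20.0, 20.0, 120.0, 120.0)
normalized = normalize_landmarks(sparse, face_box)

# Local stage: crop a region around the eye landmarks for a dense detector.
eye_region = part_region(sparse, indices=[0, 1])

# A dense detector would predict points in region-normalized coordinates;
# here a single stand-in prediction at the region center is mapped back.
dense_in_image = to_image_coords([(0.5, 0.5)], eye_region)
```

In this sketch the global stage supplies sparse landmarks, the localization step turns a subset of them into a crop for one facial part, and predictions made inside that crop are mapped back to image coordinates, which mirrors the global-to-local flow described above.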
