MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes

Attribute recognition, particularly facial, extracts many labels for each image. While some multi-task vision problems can be decomposed into separate tasks and stages, e.g., training independent models for each task, for a growing set of problems joint optimization across all tasks has been shown to improve performance. We show that for deep convolutional neural network (DCNN) facial attribute extraction, multi-task optimization is better. Unfortunately, it can be difficult to apply joint optimization to DCNNs when training data is imbalanced, and re-balancing multi-label data directly is structurally infeasible, since adding/removing data to balance one label will change the sampling of the other labels. This paper addresses the multi-label imbalance problem by introducing a novel mixed objective optimization network (MOON) with a loss function that mixes multiple task objectives with domain adaptive re-weighting of propagated loss. Experiments demonstrate that not only does MOON advance the state of the art in facial attribute recognition, but it also outperforms independently trained DCNNs using the same data. When using facial attributes for the LFW face recognition task, we show that our balanced (domain adapted) network outperforms the unbalanced trained network.

[1]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[2]  Andrew Zisserman,et al.  Deep Structured Output Learning for Unconstrained Text Recognition , 2014, ICLR.

[3]  N. Ahuja,et al.  Robust Visual Tracking via MultiTask Sparse Learning , 2012 .

[4]  Yongxin Yang,et al.  A Unified Perspective on Multi-Domain and Multi-Task Learning , 2014, ICLR.

[5]  Rama Chellappa,et al.  Visual Domain Adaptation: A survey of recent advances , 2015, IEEE Signal Processing Magazine.

[6]  Du-Sik Park,et al.  Rotating your face using multi-task deep neural network , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Xiaogang Wang,et al.  Boosted multi-task learning for face verification with applications to web image and video search , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Shih-Fu Chang,et al.  Designing Category-Level Attributes for Discriminative Visual Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Yang Zhong,et al.  Leveraging mid-level deep representations for predicting face attributes in the wild , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[10]  Xiaoou Tang,et al.  Facial Landmark Detection by Deep Multi-task Learning , 2014, ECCV.

[11]  Donghoon Lee,et al.  Face attribute classification using attribute-aware correlation map and gated convolutional neural networks , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[12]  Thomas Vetter,et al.  Face Recognition Based on Fitting a 3D Morphable Model , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Abhishek Dutta,et al.  Impact of eye detection error on face recognition performance , 2015, IET Biom..

[14]  Subramanian Ramanathan,et al.  No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Xiaogang Wang,et al.  Multi-source Deep Learning for Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Liang Wang,et al.  Unconstrained Multimodal Multi-Label Learning , 2015, IEEE Transactions on Multimedia.

[17]  Kristen Grauman,et al.  Decorrelating Semantic Visual Attributes by Resisting the Urge to Share , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[20]  Trevor Darrell,et al.  Transfer learning for image classification with sparse prototype representations , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Bernard Ghanem,et al.  On the relationship between visual attributes and convolutional networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Zhengyou Zhang,et al.  Improving multiview face detection with multi-task deep convolutional neural networks , 2014, IEEE Winter Conference on Applications of Computer Vision.

[23]  Zhi-Hua Zhou,et al.  Multi-Instance Multi-Label Learning with Application to Scene Classification , 2006, NIPS.

[24]  Jing Wang,et al.  Walk and Learn: Facial Attribute Representation Learning from Egocentric Video and Contextual Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Terrance E. Boult,et al.  Multi-attribute spaces: Calibration for attribute fusion and similarity search , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Kristen Grauman,et al.  Interactively building a discriminative vocabulary of nameable attributes , 2011, CVPR 2011.

[27]  Trevor Darrell,et al.  Simultaneous Deep Transfer Across Domains and Tasks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Wei Wang,et al.  Multi-task deep neural network for multi-label learning , 2013, 2013 IEEE International Conference on Image Processing.

[30]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[31]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[32]  Shree K. Nayar,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Describable Visual Attributes for Face Verification and Image Search , 2022 .

[33]  Narendra Ahuja,et al.  Robust visual tracking via multi-task sparse learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Terrance E. Boult,et al.  Exemplar codes for facial attributes and tattoo recognition , 2014, IEEE Winter Conference on Applications of Computer Vision.

[35]  Volker Blanz,et al.  Face recognition based on a 3D morphable model , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[36]  Patrice Y. Simard,et al.  Best practices for convolutional neural networks applied to visual document analysis , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[37]  Jiebo Luo,et al.  Weakly Semi-Supervised Deep Learning for Multi-Label Image Annotation , 2015, IEEE Transactions on Big Data.

[38]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.

[39]  Xiaoou Tang,et al.  Learning Social Relation Traits from Face Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[40]  Mohamed R. Amer,et al.  Facial Attributes Classification Using Multi-task Representation Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[41]  Shree K. Nayar,et al.  FaceTracer: A Search Engine for Large Collections of Images with Faces , 2008, ECCV.