Leveraging multiple cues for recognizing family photos

Social relation analysis via images is a new research area that has attracted much interest recently. As social media usage increases, a wide variety of information can be extracted from the growing number of consumer photos shared online, such as the category of events captured or the relationships between individuals in a given picture. Family is one of the most important units in our society, so categorizing family photos constitutes an essential step toward image-based social analysis and content-based retrieval of consumer photos. We propose an approach that combines multiple distinct and complementary cues for recognizing family photos. The first cue analyzes the geometric arrangement of people in the photograph, which characterizes scene-level information in a way that is efficient to compute yet discriminative. The second cue models facial appearance similarities to capture and quantify relevant pairwise relations between individuals in a given photo. The last cue investigates the semantics of the context in which the photo was taken. Experiments on a dataset containing thousands of family and non-family pictures collected from social media indicate that each individual model produces good recognition results. Furthermore, a combined approach incorporating appearance, geometric and semantic features significantly outperforms the state of the art in this domain, achieving 96.7% classification accuracy.

A new geometry feature is proposed to capture people's standing pattern at the scene level. A deep convolutional neural network is incorporated into the appearance model to capture facial similarities in the group photo. Semantic information is fused with the other cues to discriminate between the two photo categories.
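The idea of combining the three cues can be illustrated with a minimal sketch. The abstract does not specify how the fusion is performed, so everything below is an assumption made for illustration: each cue is assumed to already yield a family-photo score in [0, 1], the scores are merged with a simple weighted average (score-level fusion), and the names `CueScores`, `fuse_scores`, and `is_family_photo`, the equal weights, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CueScores:
    """Hypothetical per-cue scores in [0, 1]; higher means more family-like."""
    geometry: float    # score from the geometric arrangement of people
    appearance: float  # score from pairwise facial appearance similarity
    semantics: float   # score from the semantic context of the scene


def fuse_scores(scores, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three cue scores (assumed fusion rule)."""
    w_geo, w_app, w_sem = weights
    total = w_geo + w_app + w_sem
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (
        w_geo * scores.geometry
        + w_app * scores.appearance
        + w_sem * scores.semantics
    )
    return weighted / total


def is_family_photo(scores, threshold=0.5, weights=(1.0, 1.0, 1.0)):
    """Label a photo as family when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


if __name__ == "__main__":
    photo = CueScores(geometry=0.9, appearance=0.8, semantics=0.7)
    print(is_family_photo(photo))
```

In practice the weights and threshold would be tuned on held-out data, and concatenating the cue features before a single classifier is an equally plausible alternative to averaging scores.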
