Describable Visual Attributes for Face Images

We introduce the use of describable visual attributes for face images. Describable visual attributes are labels that can be given to an image to describe its appearance. This thesis focuses mostly on images of faces and the attributes used to describe them, although the concepts also apply to other domains. Examples of face attributes include gender, age, jaw shape, nose size, etc. The advantages of an attribute-based representation for vision tasks are manifold: they can be composed to create descriptions at various levels of specificity; they are generalizable, as they can be learned once and then applied to recognize new objects or categories without any further training; and they are efficient, possibly requiring exponentially fewer attributes (and training data) than explicitly naming each category. We show how one can create and label large datasets of real-world images to train classifiers which measure the presence, absence, or degree to which an attribute is expressed in images. These classifiers can then automatically label new images. We demonstrate the current effectiveness and explore the future potential of using attributes for image search, automatic face replacement in images, and face verification, via both human and computational experiments. To aid other researchers in studying these problems, we introduce two new large face datasets, named FaceTracer and PubFig, with labeled attributes and identities, respectively. Finally, we also show the effectiveness of visual attributes in a completely different domain: plant species identification. To this end, we have developed and publicly released the Leafsnap system, which has been downloaded by almost half a million users. The mobile phone application is a flexible electronic field guide with high-quality images of the tree species in the Northeast US. It also gives users instant access to our automatic recognition system, greatly simplifying the identification process.

[1]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[2]  Tsachy Weissman,et al.  Universal discrete denoising: known channel , 2003, IEEE Transactions on Information Theory.

[3]  James Ze Wang,et al.  Content-based image retrieval: approaches and trends of the new age , 2005, MIR '05.

[4]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[5]  Anderson Rocha,et al.  Robust Fusion: Extreme Value Theory for Recognition Score Normalization , 2010, ECCV.

[6]  Paul Miller,et al.  Verification of face identities from images captured on video. , 1999 .

[7]  Zicheng Liu,et al.  Face relighting with radiance environment maps , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[8]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[9]  Peter Kontschieder,et al.  Beyond Pairwise Shape Similarity Analysis , 2009, ACCV.

[10]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[11]  Gang Wang,et al.  It's All About the Data , 2010, Proceedings of the IEEE.

[12]  Hang Li,et al.  Ranking refinement and its application to information retrieval , 2008, WWW.

[13]  Patrick J. Flynn,et al.  Preliminary Face Recognition Grand Challenge Results , 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).

[14]  Paul E. Debevec,et al.  Rendering synthetic objects into real scenes: bridging traditional and image-based graphics with global illumination and high dynamic range photography , 1998, SIGGRAPH '08.

[15]  Frédéric Jurie,et al.  Learning Visual Similarity Measures for Comparing Never Seen Objects , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Haibin Ling,et al.  Shape Classification Using the Inner-Distance , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Nicolas Pinto,et al.  How far can you get with a modern face recognition test set using only simple features? , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Carlos D. Castillo,et al.  Using Stereo Matching for 2-D Face Recognition Across Pose , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[20]  Christopher Edwards,et al.  The effects of filtered video on awareness and privacy , 2000, CSCW '00.

[21]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[24]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[25]  Dong Xu,et al.  Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction , 2006, TRECVID.

[26]  David W. Jacobs,et al.  In search of illumination invariants , 2001, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[27]  Shih-Fu Chang,et al.  VisualSEEk: a fully automated content-based image query system , 1997, MULTIMEDIA '96.

[28]  Yaniv Taigman,et al.  Descriptor Based Methods in the Wild , 2008 .

[29]  P. Jonathon Phillips,et al.  Empirical Evaluation Methods in Computer Vision , 2002 .

[30]  Jin Zhao,et al.  Video Retrieval Using High Level Features: Exploiting Query Matching and Confidence-Based Weighting , 2006, CIVR.

[31]  Ali Farhadi,et al.  Attribute-centric recognition for cross-category generalization , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  V. Kshirsagar,et al.  Face recognition using Eigenfaces , 2011, 2011 3rd International Conference on Computer Research and Development.

[33]  Andy Harter,et al.  Parameterisation of a stochastic model for human face identification , 1994, Proceedings of 1994 IEEE Workshop on Applications of Computer Vision.

[34]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Tao Qin,et al.  Learning to rank relational objects and its application to web search , 2008, WWW.

[36]  Paul A. Viola,et al.  A unified learning framework for real time face detection and classification , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[37]  Ali Farhadi,et al.  Recognition using visual phrases , 2011, CVPR 2011.

[38]  Ralph Gross,et al.  Model-Based Face De-Identification , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[39]  Erik G. Learned-Miller,et al.  Learning to Locate Informative Features for Visual Identification , 2008, International Journal of Computer Vision.

[40]  R. Fisher THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[41]  Geoffrey E. Hinton,et al.  Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[42]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[43]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[44]  David J. Kriegman,et al.  From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[45]  Azriel Rosenfeld,et al.  Face recognition: A literature survey , 2003, CSUR.

[46]  Hongyuan Zha,et al.  A General Boosting Method and its Application to Learning Ranking Functions for Web Search , 2007, NIPS.

[47]  Jonathan Baxter,et al.  Learning internal representations , 1995, COLT '95.

[48]  Shree K. Nayar,et al.  FaceTracer: A Search Engine for Large Collections of Images with Faces , 2008, ECCV.

[49]  John Hart,et al.  ACM Transactions on Graphics , 2004, SIGGRAPH 2004.

[50]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[51]  Alex Pentland,et al.  Photobook: Content-based manipulation of image databases , 1996, International Journal of Computer Vision.

[52]  Roberto Scopigno,et al.  Computer Graphics forum , 2003, Computer Graphics Forum.

[53]  Alex Pentland,et al.  View-based and modular eigenspaces for face recognition , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[55]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[56]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  M. L. Connolly Measurement of protein surface shape by solid angles , 1986 .

[58]  Zhuowen Tu,et al.  Improving Shape Retrieval by Learning Graph Transduction , 2008, ECCV.

[59]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[60]  R. Rosenfeld Nature , 2009, Otolaryngology--head and neck surgery : official journal of American Academy of Otolaryngology-Head and Neck Surgery.

[61]  Timothy F. Cootes,et al.  View-based active appearance models , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[62]  Pawan Sinha,et al.  Face Recognition by Humans: Nineteen Results All Computer Vision Researchers Should Know About , 2006, Proceedings of the IEEE.

[63]  Terrence J. Sejnowski,et al.  SEXNET: A Neural Network Identifies Sex From Human Faces , 1990, NIPS.

[64]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[65]  David J. Fleet,et al.  Human attributes from 3D pose tracking , 2010, Comput. Vis. Image Underst..

[66]  Erik G. Learned-Miller,et al.  Unsupervised Joint Alignment of Complex Images , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[67]  Stefano Soatto,et al.  A Study of Face Recognition as People Age , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[68]  Shree K. Nayar,et al.  Face swapping: automatically replacing faces in photographs , 2008, SIGGRAPH 2008.

[69]  Sean White,et al.  First steps toward an electronic field guide for plants , 2006 .

[70]  S. Lazebnik,et al.  An empirical Bayes approach to contextual region classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[71]  Fujio Yamaguchi,et al.  Computer-Aided Geometric Design , 2002, Springer Japan.

[72]  Henning Schulzrinne,et al.  Proceedings of the 12th annual ACM international conference on Multimedia , 2004, MM 2004.

[73]  Jitendra Malik,et al.  Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[74]  Shahzad Malik,et al.  Digital Face Replacement in Photographs , 2003 .

[75]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[76]  Longin Jan Latecki,et al.  Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[77]  Sean White,et al.  Searching the World's Herbaria: A System for Visual Identification of Plant Species , 2008, ECCV.

[78]  Andrea J. van Doorn,et al.  Surface shape and curvature scales , 1992, Image Vis. Comput..

[79]  Shih-Fu Chang,et al.  Columbia University’s Baseline Detectors for 374 LSCOM Semantic Visual Concepts , 2007 .

[80]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[81]  Tamara L. Berg,et al.  Baby Talk : Understanding and Generating Image Descriptions , 2011 .

[82]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[83]  Sami Romdhani,et al.  Face identification across different poses and illuminations with a 3D morphable model , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[84]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[85]  Francis R. Bach,et al.  Bolasso: model consistent Lasso estimation through the bootstrap , 2008, ICML '08.

[86]  Tal Hassner,et al.  Similarity Scores Based on Background Samples , 2009, ACCV.

[87]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[88]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[89]  Ronen Basri,et al.  Lambertian reflectance and linear subspaces , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[90]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[91]  Andrew Zisserman,et al.  Advances in Neural Information Processing Systems (NIPS) , 2007 .

[92]  Tal Hassner,et al.  Multiple One-Shots for Utilizing Class Label Information , 2009, BMVC.

[93]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[94]  Longin Jan Latecki,et al.  Balancing Deformability and Discriminability for Shape Matching , 2010, ECCV.

[95]  Luc Van Gool,et al.  European conference on computer vision (ECCV) , 2006, eccv 2006.

[96]  Jian Yang,et al.  KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[97]  Dragutin Petkovic,et al.  Query by Image and Video Content: The QBIC System , 1995, Computer.

[98]  Giulio Sandini,et al.  2nd European conference on computer vision , 1992, Image Vis. Comput..

[99]  Yuan Li,et al.  High-Performance Rotation Invariant Multiview Face Detection , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[100]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[101]  Tony Jebara,et al.  Multi-task feature and kernel selection for SVMs , 2004, ICML.