It's All About the Data

Modern computer vision research consumes labelled data in quantity, and building datasets has become an important activity. The Internet has become a tremendous resource for computer vision researchers. By seeing the Internet as a vast, slightly disorganized collection of visual data, we can build datasets. The key point is that visual data are surrounded by contextual information like text and HTML tags, which is a strong, if noisy, cue to what the visual data mean. In a series of case studies, we illustrate how useful this contextual information is. It can be used to build a large and challenging labelled face dataset with no manual intervention. With very small amounts of manual labor, contextual data can be used together with image data to identify pictures of animals. In fact, these contextual data are sufficiently reliable that a very large pool of noisily tagged images can be used as a resource to build image features, which reliably improve on conventional visual features. By seeing the Internet as a marketplace that can connect sellers of annotation services to researchers, we can obtain accurately annotated datasets quickly and cheaply. We describe methods to prepare data, check quality, and set prices for work in this annotation process. The problems posed by attempting to collect very big research datasets are fertile ground for researchers, because collecting datasets requires us to focus on two important questions: What makes a good picture? What is the meaning of a picture?
