It's All About the Data

This paper explains why training data are important for many computer vision algorithms and presents case studies of how the Internet can be used to obtain high-quality data.

Modern computer vision research consumes labelled data in quantity, and building datasets has become an important activity. The Internet has become a tremendous resource for computer vision researchers. By seeing the Internet as a vast, slightly disorganized collection of visual data, we can build datasets. The key point is that visual data are surrounded by contextual information, such as text and HTML tags, which is a strong, if noisy, cue to what the visual data mean. In a series of case studies, we illustrate how useful this contextual information is. It can be used to build a large and challenging labelled face dataset with no manual intervention. With very small amounts of manual labor, contextual data can be used together with image data to identify pictures of animals. In fact, these contextual data are sufficiently reliable that a very large pool of noisily tagged images can be used as a resource to build image features that reliably improve on conventional visual features. By seeing the Internet as a marketplace that can connect sellers of annotation services to researchers, we can obtain accurately annotated datasets quickly and cheaply. We describe methods to prepare data, check quality, and set prices for the work in this annotation process. The problems posed by attempting to collect very large research datasets are fertile ground for researchers because collecting datasets requires us to focus on two important questions: What makes a good picture? What is the meaning of a picture?
