Everyday concept detection in visual lifelogs: validation, relationships and trends

The Microsoft SenseCam is a small, lightweight wearable camera that passively captures photos and other sensor readings throughout a user’s day-to-day activities. It captures on average 3,000 images in a typical day, equating to almost 1 million images per year. It can be used to aid memory by creating a personal multimedia lifelog, or visual recording, of the wearer’s life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. In this work, we explore the applicability of semantic concept detection, a method often used in video retrieval, to the domain of visual lifelogs. Our concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning, and in doing so determines the probability of a concept’s presence. We apply detection of 27 everyday semantic concepts to a lifelog collection of 257,518 SenseCam images from 5 users. The results were evaluated on a subset of 95,907 images to determine the detection accuracy for each semantic concept. We further analysed the temporal consistency, co-occurrence, and inter-relationships of the detected concepts to investigate more extensively the robustness of the detectors in this domain.
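The core idea described above — a supervised model mapping low-level visual features to a probability of concept presence — can be sketched minimally as follows. This is an illustrative example only, not the paper's actual pipeline: the authors' detectors are SVM-based with calibrated probability outputs, whereas this sketch uses a simple logistic-regression classifier on toy two-dimensional "features", one binary detector per concept.

```python
import numpy as np

def train_concept_detector(X, y, lr=0.1, epochs=500):
    """Train one binary concept detector (logistic regression) that maps
    low-level feature vectors to a probability of concept presence.
    Sketch only: the paper uses SVMs, but the idea of learning a
    feature -> probability mapping per concept is the same."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid outputs in (0, 1)
        w -= lr * (X.T @ (p - y)) / len(y)        # gradient step on weights
        b -= lr * np.mean(p - y)                  # gradient step on bias
    return w, b

def concept_probability(x, w, b):
    """Probability that a concept (e.g. 'indoors') is present in image x."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Toy data: 2-D "visual features"; the concept is labelled present
# whenever the first feature dominates the second.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > X[:, 1]).astype(float)

w, b = train_concept_detector(X, y)
p = concept_probability(np.array([2.0, -2.0]), w, b)  # clearly "present"
```

In a real lifelog setting, one such detector would be trained per concept (27 here), and each image's feature vector scored by all of them, yielding a vector of concept-presence probabilities per image.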
