Everyday concept detection in visual lifelogs: validation, relationships and trends

The Microsoft SenseCam is a small, lightweight wearable camera that passively captures photos and other sensor readings throughout a user’s day-to-day activities. It captures on average 3,000 images in a typical day, equating to almost 1 million images per year. It can be used to aid memory by creating a personal multimedia lifelog, or visual recording, of the wearer’s life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. In this work, we explore the applicability of semantic concept detection, a method often used in video retrieval, to the domain of visual lifelogs. Our concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning, and in doing so determines the probability of a concept’s presence. We apply detection of 27 everyday semantic concepts to a lifelog collection of 257,518 SenseCam images from 5 users. The results were evaluated on a subset of 95,907 images to determine the detection accuracy for each semantic concept. We further analysed the temporal consistency, co-occurrence, and inter-relationships of the detected concepts to investigate more extensively the robustness of the detectors in this domain.
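The core idea described above — a supervised model mapping low-level visual features to a probability of concept presence — can be sketched minimally as follows. This is an illustrative example only, not the paper's actual pipeline: the authors' detectors are SVM-based with calibrated probability outputs, whereas this sketch uses a simple logistic-regression classifier on toy two-dimensional "features", one binary detector per concept.

```python
import numpy as np

def train_concept_detector(X, y, lr=0.1, epochs=500):
    """Train one binary concept detector (logistic regression) that maps
    low-level feature vectors to a probability of concept presence.
    Sketch only: the paper uses SVMs, but the idea of learning a
    feature -> probability mapping per concept is the same."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid outputs in (0, 1)
        w -= lr * (X.T @ (p - y)) / len(y)        # gradient step on weights
        b -= lr * np.mean(p - y)                  # gradient step on bias
    return w, b

def concept_probability(x, w, b):
    """Probability that a concept (e.g. 'indoors') is present in image x."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Toy data: 2-D "visual features"; the concept is labelled present
# whenever the first feature dominates the second.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > X[:, 1]).astype(float)

w, b = train_concept_detector(X, y)
p = concept_probability(np.array([2.0, -2.0]), w, b)  # clearly "present"
```

In a real lifelog setting, one such detector would be trained per concept (27 here), and each image's feature vector scored by all of them, yielding a vector of concept-presence probabilities per image.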
