Paper information - Validating the Detection of Everyday Concepts in Visual Lifelogs

Validating the Detection of Everyday Concepts in Visual Lifelogs

The Microsoft SenseCam is a small, lightweight wearable camera that passively captures photos and other sensor readings from a user's day-to-day activities. It can capture up to 3,000 images per day, equating to almost 1 million images per year. It is used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer's life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. In this work, we explore the applicability of semantic concept detection, a method often used in video retrieval, to the novel domain of visual lifelogs. A concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, and buildings) using supervised machine learning, and from this it estimates the probability that a concept is present in an image. We apply detection of 27 everyday semantic concepts to a lifelog collection of 257,518 SenseCam images from 5 users. The results were then evaluated on a subset of 95,907 images to determine the precision of detection for each semantic concept and to draw inferences about the lifestyles of those 5 users. We additionally present future applications of concept detection within the domain of lifelogging.
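The evaluation step described in the abstract, measuring precision for each detected concept against manual judgements, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `precision_per_concept`, the 0.5 decision threshold, the data layout, and the toy values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def precision_per_concept(detections, judgements, threshold=0.5):
    """Return {concept: precision} over judged detections.

    detections: iterable of (image_id, concept, probability) tuples,
                where probability is the detector's output (assumed layout).
    judgements: dict mapping (image_id, concept) -> bool, the human
                annotation of whether the concept is really present.
    threshold:  assumed cut-off above which the detector "asserts" a concept.
    """
    true_positives = defaultdict(int)
    asserted = defaultdict(int)
    for image_id, concept, probability in detections:
        if probability < threshold:
            continue  # the detector did not assert this concept
        key = (image_id, concept)
        if key not in judgements:
            continue  # images without a manual judgement are excluded
        asserted[concept] += 1
        if judgements[key]:
            true_positives[concept] += 1
    # Precision = correct assertions / all judged assertions, per concept.
    return {c: true_positives[c] / asserted[c] for c in asserted}


# Hypothetical toy data, not results from the paper.
detections = [
    ("img1", "indoors", 0.9),
    ("img2", "indoors", 0.8),
    ("img3", "indoors", 0.2),
    ("img1", "people", 0.7),
    ("img4", "people", 0.6),
]
judgements = {
    ("img1", "indoors"): True,
    ("img2", "indoors"): False,
    ("img3", "indoors"): True,
    ("img1", "people"): True,
    ("img4", "people"): True,
}
result = precision_per_concept(detections, judgements)
```

Concepts that the detector never asserts above the threshold are simply absent from the returned dictionary, which avoids a division by zero; a real evaluation might prefer to report them explicitly.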
