Investigating keyframe selection methods in the novel domain of passively captured visual lifelogs

The SenseCam is a passive-capture wearable camera which, when worn continuously, takes an average of 1,900 images per day. It can be used to create a personal lifelog, a visual recording of the wearer's life, which can serve as an aid to human memory. For such a large amount of visual information to be useful, it needs to be structured into "events", which can be achieved through automatic segmentation. An important component of this structuring process is the selection of keyframes to represent individual events. This work investigates a variety of techniques for selecting a single representative keyframe image from each event, in order to provide the user with an instant visual summary of that event. Our experiments use a large test set of 2,232 lifelog events collected by five users, each over a period of one month. We propose a novel keyframe selection technique which selects the image with the highest "quality" as the keyframe. Including "quality" approaches in keyframe selection is shown to be useful because visual quality varies widely within passively captured image collections.
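The idea of choosing the highest-"quality" image in each event can be illustrated with a minimal sketch. The abstract does not define the quality measure, so the `sharpness` score below (mean squared difference between horizontally adjacent pixels) is only an assumed stand-in, and the names `Image`, `sharpness`, and `select_keyframe` are hypothetical rather than taken from the paper.

```python
from typing import Callable, Sequence

# Assumption: an image is a grayscale pixel grid (rows of intensity values).
Image = Sequence[Sequence[float]]


def sharpness(image: Image) -> float:
    """Stand-in quality score (not the paper's measure):
    mean squared difference between horizontally adjacent pixels."""
    total, count = 0.0, 0
    for row in image:
        for left, right in zip(row, row[1:]):
            total += (right - left) ** 2
            count += 1
    return total / count if count else 0.0


def select_keyframe(
    event: Sequence[Image],
    quality: Callable[[Image], float] = sharpness,
) -> int:
    """Return the index of the highest-scoring image in one event.
    Ties resolve to the earliest image."""
    if not event:
        raise ValueError("an event must contain at least one image")
    return max(range(len(event)), key=lambda i: quality(event[i]))


# Three toy "images" standing in for the frames of a single event.
blurry = [[10, 10, 10], [10, 10, 10]]   # flat, no detail
sharp = [[0, 255, 0], [255, 0, 255]]    # strong local contrast
soft = [[10, 20, 30], [10, 20, 30]]     # gentle gradient

event = [blurry, sharp, soft]
keyframe_index = select_keyframe(event)
```

Passing the quality function as a parameter keeps the selection step independent of any particular measure, so a different score could be substituted without changing `select_keyframe`.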
