Toward a conceptual framework of key-frame extraction and storyboard display for video summarization

Two key problems in developing a storyboard are (a) the extraction of video key frames and (b) the display of the storyboard. Drawing on the findings of a preliminary study and on previous research into the computerized extraction of key frames and human recognition of images and videos, we propose an algorithm for key-frame extraction and a structural display of the storyboard. To evaluate the proposed algorithm, we conducted an experiment whose results suggest that participants produce better summaries of the given videos when they view storyboards composed of key frames extracted with the proposed algorithmic method, regardless of whether the display pattern is sequential or structural. In contrast, when key frames are extracted with a mechanical method, a structural display pattern leads to better summarization performance. We conclude by discussing the practical implications of these findings for video summarization and retrieval. © 2010 Wiley Periodicals, Inc.
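
To make the contrast between a "mechanical" and a content-based approach to key-frame extraction concrete, the sketch below compares uniform frame sampling with a simple color-histogram-difference heuristic. This is a generic illustration only, not the authors' proposed algorithm; the histogram bin count, the L1 distance measure, and the threshold value are assumptions chosen for the example.

```python
# Minimal sketch: "mechanical" (uniform sampling) vs. content-based
# (histogram-difference) key-frame selection. Illustrative only; not the
# algorithm proposed in the paper.
import numpy as np

def mechanical_keyframes(frames, step=30):
    """Pick every `step`-th frame, ignoring content (uniform sampling)."""
    return list(range(0, len(frames), step))

def histogram(frame, bins=16):
    """Concatenated per-channel color histogram, normalized to sum to 1."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 255))[0]
             for c in range(frame.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def content_keyframes(frames, threshold=0.4):
    """Start a new key frame whenever the L1 histogram distance from the
    previous key frame exceeds `threshold` (an assumed, illustrative value)."""
    if not frames:
        return []
    keyframes = [0]
    prev_hist = histogram(frames[0])
    for i in range(1, len(frames)):
        h = histogram(frames[i])
        if np.abs(h - prev_hist).sum() > threshold:
            keyframes.append(i)
            prev_hist = h
    return keyframes

# Example with synthetic frames: 90 "dark" frames followed by 90 "bright" ones.
if __name__ == "__main__":
    dark = np.full((90, 48, 64, 3), 40, dtype=np.uint8)
    bright = np.full((90, 48, 64, 3), 200, dtype=np.uint8)
    frames = list(np.concatenate([dark, bright]))
    print("mechanical:", mechanical_keyframes(frames))    # [0, 30, 60, 90, 120, 150]
    print("content-based:", content_keyframes(frames))    # [0, 90]
```

On the synthetic clip, uniform sampling returns frames at fixed positions regardless of content, while the histogram-based rule keeps only one frame per visually distinct segment, which is the kind of distinction the experiment's "algorithmic" versus "mechanical" conditions are getting at.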
