A scalable summary generation method based on cross-modal consensus clustering and OLAP cube modeling

Video summarization is a core problem in managing the growing amount of content in multimedia databases. An efficient video summary should give an overview of the video content, and most existing approaches fulfill this goal. However, such an overview does not let the user reach all details of interest selectively and progressively. This paper proposes a novel scalable summary generation approach based on the On-Line Analytical Processing (OLAP) data cube. This structure provides tools such as the drill-down operation, which allows a user to browse multiple descriptions of a dataset efficiently at increasing levels of detail. We adapt this model to video summary generation by expressing a video within a cross-media feature space and by performing clusterings in particular subspaces. Consensus clustering guides the subspace selection strategy at small dimensions, since the novelty brought by the least consensual subspaces is useful for refining a summary. Our approach is designed for weakly structured content such as cultural documentaries. We evaluate it on a corpus of cultural archives provided by the French National Audiovisual Institute (INA), using information retrieval metrics that handle single and multiple reference annotations. Overall, the approach improves on two baseline systems that perform random and arbitrary segmentations, showing a better balance between Precision and Recall.
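The idea of ranking subspaces by how much their clustering agrees with the others can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: agreement is measured here with the plain Rand index, the subspace names (`color`, `audio`, `motion`) and their label vectors are hypothetical, and the clusterings are given as input rather than computed. The paper's actual consensus measure and selection strategy may differ.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs that both clusterings treat the same way
    (both group the pair together, or both keep it apart)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def rank_by_consensus(clusterings):
    """Return (subspace names ordered from least to most consensual, scores).

    The score of a subspace is its mean Rand index against every other
    subspace clustering (an assumed stand-in for a consensus measure).
    """
    if len(clusterings) < 2:
        raise ValueError("need at least two subspace clusterings")
    scores = {}
    for name, labels in clusterings.items():
        others = [
            rand_index(labels, other_labels)
            for other_name, other_labels in clusterings.items()
            if other_name != name
        ]
        scores[name] = sum(others) / len(others)
    return sorted(scores, key=scores.get), scores


# Hypothetical cluster labels for six video segments in three subspaces.
clusterings = {
    "color": [0, 0, 0, 1, 1, 1],
    "audio": [0, 0, 0, 1, 1, 1],
    "motion": [0, 1, 0, 1, 0, 1],
}
order, scores = rank_by_consensus(clusterings)
print(order[0])  # the least consensual subspace
```

In this toy input the `motion` clustering disagrees most with the other two, so under the stated assumptions it would be the first candidate for a drill-down refinement of the summary.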
