Segmentation, categorization, and identification of commercial clips from TV streams using multimodal analysis

TV advertising is ubiquitous, persistent, and economically vital. TV commercials affect the living and working habits of millions of people. In this paper, we present a multimodal ("visual + audio + text") commercial video digest scheme that segments individual commercials in TV streams and carries out semantic content analysis within each detected commercial segment. Two challenging issues are addressed. First, we propose a multimodal approach to robustly detect the boundaries of individual commercials. Second, we attempt to classify a commercial with respect to the advertised product or service. For the first issue, boundary detection of individual commercials is reduced to binary classification of shot boundaries, using mid-level features derived from two concepts: Image Frames Marked with Product Information (FMPI) and Audio Scene Change Indicator (ASCI). The resulting accurate boundaries also enable commercial identification by clip matching via a spatial-temporal signature. For the second issue, commercial classification is formulated as a text categorization task in which the sparse texts obtained from ASR/OCR are expanded with external knowledge. Our boundary detection achieves F1 = 93.7% on a dataset of 499 individual commercials from the TRECVID'05 video corpus. Commercial classification obtains a promising accuracy of 80.9% on 141 distinct commercials. Based on these results, applications such as an intelligent digital TV set-top box can be built to help TV viewers monitor and manage commercials in TV streams.
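The reduction of boundary detection to binary classification of shot boundaries can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the `ShotBoundary` record, the idea that FMPI and ASCI are each summarized as a score in [0, 1], the equal weights, and the threshold. The weighted-sum rule is a stand-in for a trained classifier and is not the method used in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ShotBoundary:
    """A candidate cut point between two shots (hypothetical record)."""
    time_s: float  # timestamp of the shot cut, in seconds
    fmpi: float    # assumed FMPI-derived score in [0, 1] near the cut
    asci: float    # assumed ASCI-derived score in [0, 1] at the cut


def is_commercial_boundary(b: ShotBoundary,
                           w_fmpi: float = 0.5,
                           w_asci: float = 0.5,
                           threshold: float = 0.6) -> bool:
    """Binary decision per shot boundary.

    A fixed weighted sum stands in for a learned classifier; the weights
    and threshold are illustrative, not values from the paper.
    """
    return w_fmpi * b.fmpi + w_asci * b.asci >= threshold


def segment_commercials(boundaries: List[ShotBoundary],
                        start_s: float,
                        end_s: float) -> List[Tuple[float, float]]:
    """Split a detected commercial block [start_s, end_s] into individual
    commercials at the shot boundaries classified as positive."""
    cuts = sorted(
        b.time_s for b in boundaries
        if start_s < b.time_s < end_s and is_commercial_boundary(b)
    )
    edges = [start_s] + cuts + [end_s]
    return list(zip(edges[:-1], edges[1:]))


# Example: four shot cuts inside a 45-second commercial block.
example = [
    ShotBoundary(5.0, 0.1, 0.2),
    ShotBoundary(15.0, 0.9, 0.8),
    ShotBoundary(22.0, 0.2, 0.3),
    ShotBoundary(30.0, 0.7, 0.9),
]
segments = segment_commercials(example, 0.0, 45.0)
```

In this toy input only the cuts at 15 s and 30 s pass the threshold, so the block is split into three segments. Each segment is then the unit on which clip matching or text categorization would operate.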
