Efficient Advertisement Discovery for Audio Podcast Content Using Candidate Segmentation

Audio podcasting is now widely used by online sites such as newspapers, web portals, and journals to deliver audio content to users through download or subscription. A single podcast story, typically 1 to 30 minutes long, often contains multiple audio advertisements (ads), each 5 to 30 seconds long, inserted and repeated at different locations. Automatically detecting these inserted ads is challenging because of the computational complexity of the search algorithms involved. Based on knowledge of the typical structure of podcast content, this paper proposes a novel, efficient advertisement discovery approach for large audio podcast collections. The proposed approach significantly improves search speed while maintaining sufficient accuracy. The acceleration comes from candidate segmentation and a sampling technique, which together reduce both the search area and the number of frames that must be matched. The approach has been tested on a variety of podcast content collected from the MIT Technology Review, Scientific American, and Singapore Podcast websites. Experimental results show that the proposed algorithm achieves a detection rate of 97.5% with significant computational savings compared to existing state-of-the-art methods.
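The abstract does not give the algorithm's details, so the following is only a minimal sketch of the two ideas it names: restricting the search to candidate segments, and comparing only a sample of frames. It assumes each frame is already reduced to a single scalar feature and that candidate segments are given as half-open frame ranges. The function names, the mean-absolute-difference measure, and the `step` and `threshold` parameters are all hypothetical, chosen for illustration rather than taken from the paper.

```python
from typing import List, Sequence, Tuple


def subsampled_distance(stream: Sequence[float], query: Sequence[float],
                        start: int, step: int) -> float:
    """Mean absolute difference, computed on every `step`-th frame only."""
    idx = range(0, len(query), step)
    return sum(abs(stream[start + i] - query[i]) for i in idx) / len(idx)


def find_ad(stream: Sequence[float], query: Sequence[float],
            candidates: Sequence[Tuple[int, int]],
            step: int = 4, threshold: float = 0.05) -> List[int]:
    """Return start offsets, inside candidate segments, where `query` matches.

    `candidates` holds half-open (start, end) frame ranges; frames outside
    these ranges are never examined (assumed stand-in for candidate
    segmentation).
    """
    if not query:
        return []
    hits: List[int] = []
    for seg_start, seg_end in candidates:
        # Last offset at which the whole query still fits in this segment.
        last = min(seg_end, len(stream)) - len(query)
        for start in range(max(seg_start, 0), last + 1):
            if subsampled_distance(stream, query, start, step) <= threshold:
                hits.append(start)
    return hits


if __name__ == "__main__":
    ad = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    podcast = [0.0] * 50
    podcast[10:18] = ad   # first insertion of the ad
    podcast[30:38] = ad   # repeated insertion
    print(find_ad(podcast, ad, candidates=[(5, 25), (28, 45)]))
```

In this sketch the cost per tested offset drops by roughly a factor of `step`, and the number of tested offsets drops to whatever the candidate segments cover. A coarser `step` trades accuracy for speed, so a practical system would likely verify sampled hits with a full comparison.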
