Efficient discovery of unknown ads for audio podcast content

Audio podcasting is widely used by online sites such as newspapers, web portals, and journals to deliver audio content to users through download or subscription. A single podcast story typically runs 1 to 30 minutes, and multiple audio advertisements (ads), each 5 to 30 seconds long, are often inserted and repeated at different locations within it. Based on knowledge of the typical structure of podcast content, this paper proposes a novel, efficient advertisement discovery approach to identify and locate unknown ads in a large collection of audio podcasts. Two techniques, candidate region segmentation and sampling, are employed to speed up the search. The approach has been tested on a variety of podcast content collected from the MIT Technology Review, Scientific American, and Singapore Podcast websites. Experimental results show that the proposed approach achieves a detection rate of 97.5% with significant computational savings compared with existing state-of-the-art methods.
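To make the idea of sampled repeat discovery concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes each audio frame has already been reduced to an integer fingerprint and that repeated ads match exactly, which real audio would not guarantee. The function name `find_repeats` and the parameters `win`, `step`, and `min_len` are hypothetical, and candidate region segmentation is not modelled here.

```python
from collections import defaultdict

def find_repeats(frames, win=3, step=2, min_len=3):
    """Return sorted (start_a, start_b, length) triples for repeated runs.

    frames : list of per-frame fingerprints (assumed exact-match integers)
    win    : number of consecutive frames in one lookup window
    step   : sampling stride; only every step-th position is used as a query
    """
    # Index the start position of every window of `win` consecutive frames.
    index = defaultdict(list)
    for i in range(len(frames) - win + 1):
        index[tuple(frames[i:i + win])].append(i)

    repeats = set()
    # Sampling: probing every step-th position cuts the number of lookups.
    for q in range(0, len(frames) - win + 1, step):
        for m in index[tuple(frames[q:q + win])]:
            if m < q + win:
                continue  # skip self-matches, earlier copies, and overlaps
            a, b, n = q, m, win
            # Grow the match to the left while the two copies stay disjoint.
            while a > 0 and a + n < b and frames[a - 1] == frames[b - 1]:
                a, b, n = a - 1, b - 1, n + 1
            # Grow the match to the right under the same constraint.
            while b + n < len(frames) and a + n < b and frames[a + n] == frames[b + n]:
                n += 1
            if n >= min_len:
                repeats.add((a, b, n))
    return sorted(repeats)

# Toy stream: the run 90..94 stands in for an ad that appears twice.
frames = [1, 2, 3, 90, 91, 92, 93, 94, 4, 5, 6, 7, 90, 91, 92, 93, 94, 8, 9]
print(find_repeats(frames, win=3, step=2))
```

In this sketch a repeated run is guaranteed to be hit by at least one sampled query only if it is at least `win + step - 1` frames long, so a larger `step` trades fewer lookups for a longer minimum detectable repeat.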