Comparing retrieval effectiveness of alternative content segmentation methods for Internet video search

We present an exploratory study of the retrieval of semiprofessional user-generated Internet video. The study is based on the MediaEval 2011 Rich Speech Retrieval (RSR) task, for which the dataset was taken from the Internet sharing platform blip.tv and the search queries were associated with specific speech acts occurring in the video. We compare results from three participant groups that used automatic speech recognition (ASR) transcripts, metadata manually assigned to each video by the user who uploaded it, and the combination of the two. RSR 2011 was a known-item search task: for each query there is a single manually identified ideal jump-in point in the video where playback should begin. Retrieval effectiveness is measured using the mean reciprocal rank (MRR) and mean generalized average precision (mGAP) metrics. Using different transcript segmentation methods, the participants tried to rank the relevant item as highly as possible and to locate the nearest match to the ideal jump-in point. The results indicate that the best overall performance is obtained for topically homogeneous segments that overlap strongly with the relevant region associated with the jump-in point, and that metadata can be beneficial when segments are unfocused or cover more than one topic.
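As an illustration of the MRR metric named above, the following is a minimal sketch of how mean reciprocal rank could be computed for a known-item jump-in search. It is not the official RSR 2011 scoring code. The result representation (a video identifier paired with a start time in seconds), the function names, and the tolerance `window` that decides whether a returned segment counts as a hit are all assumptions made for this example. mGAP is not sketched here.

```python
from typing import List, Sequence, Tuple

# A retrieved segment or an ideal jump-in point: (video_id, start_time_in_seconds).
# This representation is an assumption made for illustration.
Point = Tuple[str, float]


def reciprocal_rank(ranked: Sequence[Point], ideal: Point, window: float) -> float:
    """Return 1/rank of the first result that hits the known item, else 0.0.

    A result counts as a hit when it comes from the same video as the ideal
    jump-in point and starts within `window` seconds of it (hypothetical rule).
    """
    ideal_video, ideal_start = ideal
    for rank, (video, start) in enumerate(ranked, start=1):
        if video == ideal_video and abs(start - ideal_start) <= window:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: List[Tuple[Sequence[Point], Point]], window: float = 60.0
) -> float:
    """Average the reciprocal rank over all queries; 0.0 for an empty run list."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, ideal, window) for r, ideal in runs) / len(runs)


if __name__ == "__main__":
    example_runs = [
        # Query 1: correct video found at rank 2, 20 s from the jump-in point.
        ([("a", 10.0), ("b", 100.0)], ("b", 120.0)),
        # Query 2: correct video and position at rank 1.
        ([("c", 5.0)], ("c", 0.0)),
        # Query 3: known item not retrieved.
        ([("d", 0.0)], ("e", 0.0)),
    ]
    print(mean_reciprocal_rank(example_runs, window=60.0))
```

With a fixed window, a segment that starts far from the jump-in point scores nothing even when it comes from the right video, which is one way the choice of segmentation method can affect the measured rank.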
