Paper information - A Robust Passage Retrieval Algorithm for Video Question Answering

In this paper, we present a robust passage retrieval algorithm that extends conventional text question answering (Q/A) to videos. Users interact with our video Q/A system through natural language queries, and the top-ranked passage fragments, with their associated video clips, are returned as answers. We compare our method with five high-performance ranking algorithms that are portable to different languages and domains. The evaluation used 75.3 hours of Chinese videos and 253 questions. The experimental results show that our method outperformed the second-best retrieval model (language models) by a relative 1.43% in mean reciprocal rank (MRR) score, and by 11.36% when a Chinese word segmentation tool was employed. When it starts from the initial retrieval results of the retrieval models, our method improves the MRR score by at least 5.94%. This makes it very attractive for Asian languages, since a well-developed word tokenizer is unnecessary.
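The MRR figures quoted above can be read with a small worked sketch. It assumes the standard definition of mean reciprocal rank (the average of 1/rank of the first correct answer per question, with unanswered questions scoring 0) and no rank cutoff; the abstract does not state the exact variant the paper used. The function names and sample ranks are hypothetical and are not data from the paper.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all questions.

    Each entry is the 1-based rank of the first correct passage for one
    question, or None when no correct passage was returned (scores 0).
    """
    if not first_correct_ranks:
        raise ValueError("need at least one question")
    total = sum(1.0 / rank for rank in first_correct_ranks if rank is not None)
    return total / len(first_correct_ranks)


def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Relative gain of new_score over baseline_score, in percent."""
    return (new_score - baseline_score) / baseline_score * 100.0


# Hypothetical ranks for four questions (illustration only).
ranks = [1, 2, None, 4]
mrr = mean_reciprocal_rank(ranks)  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
print(f"MRR = {mrr:.4f}")
```

A "relative" improvement, as in the abstract, is the difference between two such MRR scores divided by the baseline score, which is what `relative_improvement` computes.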

Yu-Chieh Wu | Jie-Chi Yang
