A Robust Passage Retrieval Algorithm for Video Question Answering

In this paper, we present a robust passage retrieval algorithm that extends conventional text question answering (Q/A) to videos. Users interact with our video Q/A system through natural language queries, and the top-ranked passage fragments, together with their associated video clips, are returned as answers. We compare our method with five high-performance ranking algorithms that are portable across languages and domains. The experiments were conducted on 75.3 hours of Chinese videos with 253 questions. The results show that our method outperformed the second-best retrieval model (language models) by a relative 1.43% in mean reciprocal rank (MRR), and by 11.36% when a Chinese word segmentation tool was employed. When it takes the initial retrieval results of these retrieval models as input, our method improves the MRR score by at least 5.94%. This makes it attractive for Asian languages, since it does not require a well-developed word tokenizer.
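The gains above are stated as relative improvements in MRR. The following minimal sketch shows how these two quantities are usually computed. It uses the standard definition of MRR (the mean, over all questions, of 1/rank of the first correct passage, counting 0 when no correct passage is returned) and relative improvement as (new − baseline) / baseline. It is not the authors' evaluation script. The function names and the toy data are hypothetical, and the sketch applies no rank cutoff, which the original evaluation may have used.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], correct: set[str]) -> float:
    """Return 1/rank of the first correct passage, or 0.0 if none is retrieved."""
    for rank, passage_id in enumerate(ranked_ids, start=1):
        if passage_id in correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[tuple[Sequence[str], set[str]]]) -> float:
    """Average the reciprocal rank over all (ranked list, correct set) pairs.

    Hypothetical helper: no top-k cutoff is applied here.
    """
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, expressed in percent."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Toy data: correct passage at rank 1, at rank 2, and not retrieved at all.
    runs = [
        (["p1", "p2"], {"p1"}),
        (["p3", "p4"], {"p4"}),
        (["p5"], {"p9"}),
    ]
    mrr = mean_reciprocal_rank(runs)  # (1 + 0.5 + 0) / 3
    print(round(mrr, 2))  # 0.5
    print(round(relative_improvement(0.5, 0.4), 2))  # 25.0
```

A relative figure such as 1.43% therefore describes the gap as a fraction of the baseline's MRR, not as an absolute difference in MRR points.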