A New Passage Ranking Algorithm for Video Question Answering

Developing a question answering (Q/A) system involves integrating abundant linguistic resources, such as syntactic parsers and named entity recognizers, which not only impose a time cost but are also unavailable in other languages. Ranking-based approaches offer both efficiency and multilingual portability, but most of them are biased toward high-frequency words. In this paper, we propose a new passage ranking algorithm that extends text Q/A to video Q/A by searching the lexical information in videos. The method takes both N-gram matches and word density into account and finds the optimal match sequence using dynamic programming. It is also efficient enough to handle real-time online video question answering. We evaluated our method with 150 actual user questions on a 45 GB video collection. For comparison, we also ran four well-known, multilingually portable ranking approaches. Experimental results show that our method outperforms the second-best approach by a relative 25.64% in MRR score.
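For readers unfamiliar with the metric, the following is a minimal sketch of how a mean reciprocal rank (MRR) score and a relative improvement between two systems are commonly computed. It assumes the standard definition of MRR (the reciprocal of the rank of the first correct answer, averaged over all questions, with unanswered questions contributing zero). The function names and the toy ranks are hypothetical and are not taken from the paper, which does not state its rank cutoff or its raw scores here.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all questions.

    Each entry is the 1-based rank of the first correct answer for one
    question, or None if no correct answer was returned (scores 0).
    """
    if not first_correct_ranks:
        raise ValueError("at least one question is required")
    total = sum(1.0 / rank for rank in first_correct_ranks if rank is not None)
    return total / len(first_correct_ranks)


def relative_improvement(ours: float, baseline: float) -> float:
    """Relative gain of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# Hypothetical toy data: four questions, one of them unanswered.
toy_ranks = [1, 2, None, 4]
toy_mrr = mean_reciprocal_rank(toy_ranks)  # (1 + 0.5 + 0 + 0.25) / 4
```

A relative improvement is measured against the baseline score, so a gain reported this way is not an absolute difference in MRR points.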
