Improving graph-based random walks for complex question answering using syntactic, shallow semantic and extended string subsequence kernels

Answering complex questions requires inference and synthesis of information from multiple documents, and can therefore be seen as a kind of topic-oriented, informative multi-document summarization. In generic summarization, the stochastic, graph-based random walk method for computing the relative importance of textual units (i.e., sentences) has proved very successful. However, the major limitation of the TF*IDF approach used to measure sentence similarity in this method is that it retains only word frequencies and does not take into account sequence, syntactic, or semantic information. This paper studies the impact of syntactic and semantic information on the graph-based random walk method for answering complex questions. First, we apply tree kernel functions to measure the similarity between sentences in the random walk framework. We then extend this work by incorporating the Extended String Subsequence Kernel (ESSK) to perform the same task in a similar manner. Experimental results show the effectiveness of using kernels to include syntactic and semantic information for this task.
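To make the random walk framework concrete, the following is a minimal sketch, not the authors' implementation. It assumes a LexRank-style formulation: sentences are graph nodes, edge weights come from a pluggable similarity function, and sentence importance is the stationary distribution of a damped random walk. The names `bow_cosine` and `random_walk_scores`, the damping value of 0.85, and the bag-of-words cosine are all illustrative assumptions. The cosine is only a stand-in for the slot where a tree kernel or ESSK would be plugged in.

```python
import math
from collections import Counter


def bow_cosine(a, b):
    """Placeholder similarity: cosine over raw word counts.

    This is NOT a tree kernel or ESSK; it only marks where such a
    kernel would be supplied.
    """
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def random_walk_scores(sentences, similarity=bow_cosine, damping=0.85,
                       iters=100, tol=1e-8):
    """Score sentences by the stationary distribution of a damped random walk."""
    n = len(sentences)
    if n == 0:
        return []
    # Pairwise similarity graph (self-similarity included).
    sim = [[similarity(si, sj) for sj in sentences] for si in sentences]
    # Row-normalize similarities into transition probabilities;
    # a row with no similarity mass falls back to a uniform jump.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total for v in row] if total > 0 else [1.0 / n] * n)
    # Power iteration starting from the uniform distribution.
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [
            (1 - damping) / n
            + damping * sum(scores[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores
```

Calling `random_walk_scores(["cat mat", "cat dog", "dog rug"])` ranks the middle sentence highest, since it overlaps with both of the others. Passing a different function as `similarity` changes only how edges are weighted, which is the point at which syntactic or semantic kernels would enter.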
