Exploiting syntactic relations for question answering

Recently there has been a resurgence of interest in syntax-based approaches to information access as a means of overcoming the limitations of keyword-based approaches. So far, attempts to use syntax have been ad hoc: they use some syntactic information but still ignore most of the tree structure. This thesis describes the design and implementation of SMARTQA, a proof-of-concept question answering system that compares syntactic trees in a principled manner. Specifically, SMARTQA uses a tree edit-distance algorithm to calculate the similarity between unordered, unrooted syntactic trees. The general case of this problem is NP-complete; in practice, however, SMARTQA demonstrates that an optimized implementation of the algorithm is feasible for question answering applications.

Thesis Supervisor: Boris Katz
Title: Principal Research Scientist
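To make the notion of tree edit distance concrete, the following is a minimal sketch of the simpler ordered, rooted variant with unit costs for insertion, deletion, and relabeling. It is not SMARTQA's algorithm, which works on unordered, unrooted trees; the ordered case is shown only because it has a short polynomial-time recursion. The names `tree`, `forest_distance`, and `tree_edit_distance`, the tuple representation, and the toy trees are all assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical representation: a tree is (label, children),
# where children is a tuple of trees; a forest is a tuple of trees.
def tree(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    if not f:
        # Insert the rightmost root of g (symmetric to deletion).
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + v_kids, g) + 1
    insert = forest_distance(f, g[:-1] + w_kids) + 1
    # Map the two rightmost roots onto each other (relabel if they differ),
    # then solve their child forests and the remaining forests separately.
    match = (forest_distance(v_kids, w_kids)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

# Toy dependency-style trees (assumed, not taken from the thesis).
question = tree("wrote", tree("who"), tree("Hamlet"))
sentence = tree("wrote", tree("Shakespeare"), tree("Hamlet"))
distance = tree_edit_distance(question, sentence)
```

Here `distance` is 1, because the two trees differ only in the label of one leaf. In the unordered setting described in the abstract, sibling order would additionally be ignored, which is what makes the general problem hard.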
