Parse Thicket Representation for Multi-sentence Search

We develop a graph representation and learning technique for the parse structures of sentences and paragraphs of text. The technique is used to improve search relevance when answering complex questions whose answers are spread over multiple sentences. We introduce the Parse Thicket as a combination of syntactic parse trees augmented by arcs for inter-sentence word-word relations such as coreference and taxonomic relations. These arcs are also derived from other sources, including Rhetorical Structure Theory, and we introduce the respective indexing rules, which identify inter-sentence relations and join the phrases connected by these relations in the search index. Generalization of syntactic parse trees, used as a similarity measure between sentences, is defined as the set of maximal common sub-trees of two parse trees. Generalization of a pair of parse thickets, used to measure the relevance of a question to an answer distributed over multiple sentences, is defined as the set of maximal common sub-parse thickets. The proposed approach is evaluated in the product search domain of eBay.com, where user queries include product names, features and expressions of user needs, and the query keywords occur in different sentences of text. We demonstrate that search relevance is improved by single sentence-level generalization, and further increased by parse thicket generalization.
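
To make the notion of generalization concrete, the following is a minimal sketch under strong simplifying assumptions, not the authors' algorithm. It treats each phrase as a flat sequence of (lemma, POS tag) pairs rather than a parse tree, and returns a single best alignment rather than the full set of maximal common sub-trees. Tokens are aligned when their POS tags agree; the lemma is kept when it is shared and replaced by the wildcard `*` otherwise. The function name `generalize`, the `Token` type and the example phrases are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (lemma, POS tag); representation assumed for illustration


def generalize(a: List[Token], b: List[Token]) -> List[Token]:
    """Align two phrases by POS tag (longest common subsequence).

    Among alignments with the most POS matches, prefer the one that
    shares the most lemmas. Shared lemmas are kept, others become '*'.
    """
    n, m = len(a), len(b)
    # score[i][j] = best (POS matches, lemma matches) for a[i:] and b[j:]
    score = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(score[i + 1][j], score[i][j + 1])
            if a[i][1] == b[j][1]:
                pos_hits, lemma_hits = score[i + 1][j + 1]
                same_lemma = int(a[i][0] == b[j][0])
                best = max(best, (pos_hits + 1, lemma_hits + same_lemma))
            score[i][j] = best

    # Trace back through the table to recover one optimal alignment.
    result: List[Token] = []
    i = j = 0
    while i < n and j < m:
        if a[i][1] == b[j][1]:
            pos_hits, lemma_hits = score[i + 1][j + 1]
            same_lemma = int(a[i][0] == b[j][0])
            if (pos_hits + 1, lemma_hits + same_lemma) == score[i][j]:
                result.append((a[i][0] if same_lemma else "*", a[i][1]))
                i += 1
                j += 1
                continue
        if score[i + 1][j] == score[i][j]:
            i += 1
        else:
            j += 1
    return result


q = [("digital", "JJ"), ("camera", "NN"), ("with", "IN"), ("good", "JJ"), ("zoom", "NN")]
s = [("cheap", "JJ"), ("camera", "NN"), ("with", "IN"), ("optical", "JJ"), ("zoom", "NN")]
print(generalize(q, s))
```

In this toy setting, the number of retained lemmas could serve as a crude similarity score between a query and a candidate sentence. Extending the idea to parse thickets would additionally require aligning phrases that are linked across sentences by arcs such as coreference, which this sketch does not attempt.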
