Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries

Much recent work focuses on formal interpretation of natural question utterances, with the goal of executing the resulting structured queries on knowledge graphs (KGs) such as Freebase. Here we address two limitations of this approach when applied to open-domain, entity-oriented Web queries. First, Web queries are rarely well-formed questions. They are "telegraphic", with missing verbs, prepositions, clauses, case and phrase clues. Second, the KG is always incomplete and cannot directly answer many queries. We propose a novel technique to segment a telegraphic query and assign a coarse-grained purpose to each segment: a base entity e1, a relation type r, a target entity type t2, and contextual words s. The query seeks an entity e2 ∈ t2 for which r(e1, e2) holds, further evidenced by the schema-agnostic words s. Query segmentation is integrated with the KG and with an unstructured corpus in which mentions of entities have been linked to the KG. We do not trust the best, or any single, query segmentation. Instead, evidence in favor of each candidate e2 is aggregated across several segmentations. Extensive experiments using the ClueWeb corpus, parts of Freebase as our KG, and over a thousand telegraphic queries adapted from TREC, INEX, and WebQuestions show the efficacy of our approach. For one benchmark, MAP improves from 0.2–0.29 (competitive baselines) to 0.42 (our system), and NDCG@10 improves from 0.29–0.36 to 0.54.
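The idea of aggregating evidence across segmentations can be illustrated with a minimal sketch. It is not the system described above: the names `Interpretation` and `rank_candidates`, the toy KG, the corpus support values, the segmentation weights, and the additive scoring rule are all assumptions made for illustration. The sketch scores the telegraphic query "washington first governor" under two candidate segmentations and sums the evidence for each candidate e2.

```python
from collections import defaultdict
from typing import NamedTuple


class Interpretation(NamedTuple):
    """One candidate segmentation of the query (hypothetical structure)."""
    e1: str        # base entity
    r: str         # relation type
    t2: str        # target entity type
    s: tuple       # contextual, schema-agnostic words
    weight: float  # assumed plausibility of this segmentation


# Toy KG: (e1, r) -> set of e2. Deliberately incomplete.
KG = {("Washington_(state)", "governor"): {"Elisha_Ferry", "Jay_Inslee"}}

# Toy type catalog: entity -> set of types.
TYPES = {"Elisha_Ferry": {"politician"}, "Jay_Inslee": {"politician"}}

# Toy corpus evidence: (entity, word) -> made-up support value in [0, 1].
CORPUS = {("Elisha_Ferry", "first"): 0.9, ("Jay_Inslee", "first"): 0.1}


def rank_candidates(interpretations):
    """Sum KG and corpus evidence for each candidate e2 over all segmentations."""
    totals = defaultdict(float)
    for it in interpretations:
        for e2, types in TYPES.items():
            if it.t2 not in types:
                continue  # candidate must belong to the target type t2
            kg_evidence = 1.0 if e2 in KG.get((it.e1, it.r), set()) else 0.0
            corpus_evidence = sum(CORPUS.get((e2, w), 0.0) for w in it.s)
            totals[e2] += it.weight * (kg_evidence + corpus_evidence)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Two segmentations of "washington first governor"; weights are assumptions.
INTERPRETATIONS = [
    Interpretation("Washington_(state)", "governor", "politician", ("first",), 0.7),
    Interpretation("George_Washington", "governor", "politician", ("first",), 0.3),
]

ranked = rank_candidates(INTERPRETATIONS)
for entity, score in ranked:
    print(f"{entity}\t{score:.2f}")
```

In this toy setup, the second segmentation finds no KG edge but still contributes corpus evidence, so no single segmentation has to be trusted for the ranking to come out sensibly.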
