Using a knowledge graph and query click logs for unsupervised learning of relation detection

In this paper, we introduce a novel statistical language understanding paradigm inspired by the emerging semantic web: instead of building models for the target application, we propose relying on the semantic space already defined and populated in the knowledge graph for the target domain. As a first step in this direction, we present unsupervised methods for training relation detection models that exploit the semantic knowledge graphs of the semantic web. The detected relations are used to mine natural language queries against a back-end knowledge base. For each relation, we take the complete set of entity pairs that the graph connects with that relation and search for these pairs on the web. We use the snippets that the search engine returns to create natural language examples that serve as training data for each relation. We further refine the annotations of these examples using the knowledge graph itself and iterate using a bootstrap approach. Furthermore, we exploit the URLs returned for these pairs by the search engine to mine additional examples from the search engine query click logs. In our experiments, we show that we can obtain relation detection models that achieve about 60% macro F-measure on the relations in the knowledge graph without any manual labeling, which is comparable to the performance of supervised training.
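The snippet-mining step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy triples, the relation names, the in-memory snippet list standing in for a search engine, and the helpers `search_snippets` and `mine_examples` are hypothetical and are not taken from the paper. The sketch also shows one plausible way the graph could refine annotations: a snippet is labeled with every relation whose entity pair it contains.

```python
from collections import defaultdict

# Toy knowledge graph as (subject, relation, object) triples (hypothetical data).
TRIPLES = [
    ("Titanic", "directed_by", "James Cameron"),
    ("Avatar", "directed_by", "James Cameron"),
    ("Titanic", "release_year", "1997"),
]

# Stand-in for snippets a web search engine would return (hypothetical data).
SNIPPETS = [
    "Titanic is a 1997 film directed by James Cameron.",
    "James Cameron directed Avatar, released in 2009.",
    "Titanic premiered in 1997 to wide acclaim.",
    "Avatar broke box office records.",
]


def search_snippets(subject, obj, snippets):
    """Return snippets that mention both entities (placeholder for a web search)."""
    s, o = subject.lower(), obj.lower()
    return [text for text in snippets if s in text.lower() and o in text.lower()]


def mine_examples(triples, snippets):
    """Label each snippet with every relation whose entity pair it contains."""
    labels = defaultdict(set)
    for subject, relation, obj in triples:
        for text in search_snippets(subject, obj, snippets):
            labels[text].add(relation)
    return dict(labels)


examples = mine_examples(TRIPLES, SNIPPETS)
for text, relations in examples.items():
    print(sorted(relations), "<-", text)
```

In this toy setting the first snippet receives two labels because it contains both the director pair and the release-year pair, while the last snippet is dropped because it matches no entity pair. A real pipeline would replace the substring match with actual search results and would train a classifier on the labeled snippets.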
