A First Experimental Demonstration of Massive Knowledge Infusion

A central goal of Artificial Intelligence is to create systems that embody commonsense knowledge in a form reliable enough to be used for reasoning in novel situations. Knowledge Infusion is an approach to this problem in which the commonsense knowledge is acquired by learning. In this paper we report on experiments on a corpus of half a million sentences of natural language text that test whether commonsense knowledge can be usefully acquired through this approach. We examine the task of predicting a deleted word from the remainder of a sentence for some 268 target words. As a baseline we consider how well this task can be performed using learned rules based on the words within a fixed distance of the target word and their parts of speech. This baseline captures an approach that has previously been demonstrated to be highly successful for a variety of natural language tasks. We then go on to learn from the corpus rules that embody commonsense knowledge in addition to the knowledge used in the baseline case. We show that chaining learned commonsense rules together leads to measurable improvements in prediction performance on our task as compared with the baseline. This is apparently the first experimental demonstration that commonsense knowledge can be learned from natural inputs on a massive scale reliably enough that chaining the learned rules is efficacious for reasoning.
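
To make the baseline setting concrete, the following is a minimal sketch of how features for a deleted target word might be built from the words within a fixed distance and their parts of speech. The function name `window_features`, the `radius` parameter, the feature string format, and the example tags are all illustrative assumptions; they are not taken from the paper, which does not specify its feature encoding here.

```python
from typing import List, Set


def window_features(tokens: List[str], tags: List[str],
                    target: int, radius: int = 2) -> Set[str]:
    """Collect word and part-of-speech features around a deleted target word.

    Hypothetical helper: the target word itself is never used as a feature,
    only its neighbours within `radius` positions on either side.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    if not 0 <= target < len(tokens):
        raise IndexError("target index out of range")

    features: Set[str] = set()
    for offset in range(-radius, radius + 1):
        if offset == 0:
            continue  # skip the deleted word
        pos = target + offset
        if 0 <= pos < len(tokens):
            # Encode the relative position so "cat before" differs from "cat after".
            features.add(f"word[{offset:+d}]={tokens[pos].lower()}")
            features.add(f"tag[{offset:+d}]={tags[pos]}")
    return features


if __name__ == "__main__":
    sentence = ["The", "cat", "sat", "on", "the", "mat"]
    pos_tags = ["DT", "NN", "VBD", "IN", "DT", "NN"]  # assumed tags for illustration
    # Features for predicting the deleted word at index 2 ("sat").
    print(sorted(window_features(sentence, pos_tags, target=2, radius=1)))
```

A feature set of this kind could then be passed to any rule or linear-threshold learner, with one predictor trained per target word. The commonsense rules described above would supply further features beyond this fixed window.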
