ASKNet: Creating and Evaluating Large Scale Integrated Semantic Networks

Extracting semantic information from multiple natural language sources and combining it into a single unified resource is a fundamental goal of natural language processing. Large-scale resources of this kind are useful for a wide variety of tasks, including question answering, word sense disambiguation and knowledge discovery. A single resource representing the information in multiple documents can provide significantly more semantic information than is available from the documents considered independently. In this paper we describe the ASKNet system, which extracts semantic information from a large number of English texts and combines it into a large-scale semantic network using techniques based on spreading activation. Evaluating large-scale semantic networks is a difficult problem. To evaluate ASKNet, we developed a novel evaluation metric and applied it to networks created from randomly chosen DUC articles. The results are highly promising: almost 80% precision for the semantic core of the networks.
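
As a rough illustration of the general idea of spreading activation over a semantic network, the following minimal sketch propagates activation from a set of source nodes along weighted edges, attenuating it at each hop. It is not ASKNet's algorithm: the function name spread_activation, the parameters decay, threshold and max_steps, and the toy graph are all assumptions made for illustration only.

```python
from collections import defaultdict


def spread_activation(graph, sources, decay=0.5, threshold=0.05, max_steps=10):
    """Generic spreading activation (illustrative; not the ASKNet method).

    graph:   dict mapping node -> dict of neighbour -> edge weight in (0, 1]
    sources: dict mapping node -> initial activation
    Returns a dict mapping each reached node to its accumulated activation.
    """
    activation = defaultdict(float)
    frontier = dict(sources)
    for node, value in frontier.items():
        activation[node] += value

    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            for neighbour, weight in graph.get(node, {}).items():
                # Activation weakens with each hop (hypothetical decay rule).
                passed = value * weight * decay
                # Signals below the threshold are dropped, not propagated.
                if passed >= threshold:
                    next_frontier[neighbour] += passed
        if not next_frontier:
            break
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier

    return dict(activation)


# Toy network; node names and weights are invented for this example.
toy_graph = {
    "Einstein": {"physicist": 1.0, "Princeton": 0.5},
    "physicist": {"relativity": 0.8},
    "Princeton": {},
    "relativity": {},
}
toy_result = spread_activation(toy_graph, {"Einstein": 1.0})
```

In a network-integration setting, activation scores of this kind could serve as a relatedness signal, for example when deciding whether two nodes from different documents refer to the same entity. How ASKNet actually uses activation is described in the paper itself, not in this sketch.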
