Which Noun Phrases Denote Which Concepts?

Resolving polysemy and synonymy is required for high-quality information extraction. We present ConceptResolver, a component for the Never-Ending Language Learner (NELL) (Carlson et al., 2010) that handles both phenomena by identifying the latent concepts that noun phrases refer to. ConceptResolver performs both word sense induction and synonym resolution on relations extracted from text using an ontology and a small amount of labeled data. Domain knowledge (the ontology) guides concept creation by defining a set of possible semantic types for concepts. Word sense induction is performed by inferring a set of semantic types for each noun phrase. Synonym detection exploits redundant information to train several domain-specific synonym classifiers in a semi-supervised fashion. When ConceptResolver is run on NELL's knowledge base, 87% of the word senses it creates correspond to real-world concepts, and 85% of noun phrases that it suggests refer to the same concept are indeed synonyms.
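
The two steps named in the abstract can be illustrated with a toy sketch. This is not ConceptResolver's implementation: it assumes that each (noun phrase, semantic type) pair yields one candidate sense, and it uses plain string similarity as a stand-in for the trained, domain-specific synonym classifiers. The function names, the threshold, and the example facts are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def induce_senses(category_facts):
    """Create one candidate sense per distinct (noun phrase, semantic type) pair."""
    return sorted(set(category_facts))


def similarity(a, b):
    """Toy stand-in (assumption) for a trained synonym classifier's score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_synonyms(senses, threshold=0.6):
    """Greedy single-link clustering, run separately within each semantic type."""
    by_type = defaultdict(list)
    for noun_phrase, semantic_type in senses:
        by_type[semantic_type].append(noun_phrase)

    concepts = []
    for semantic_type, phrases in by_type.items():
        clusters = []
        for phrase in phrases:
            for cluster in clusters:
                # Join the first cluster containing a sufficiently similar phrase.
                if any(similarity(phrase, other) >= threshold for other in cluster):
                    cluster.append(phrase)
                    break
            else:
                # No similar cluster found: start a new concept.
                clusters.append([phrase])
        concepts.extend((semantic_type, cluster) for cluster in clusters)
    return concepts


# Hypothetical extracted category facts: (noun phrase, semantic type).
facts = [
    ("apple", "company"),
    ("apple", "fruit"),
    ("apple inc", "company"),
    ("microsoft", "company"),
]
senses = induce_senses(facts)
concepts = cluster_synonyms(senses)
for semantic_type, cluster in concepts:
    print(semantic_type, cluster)
```

In this sketch the polysemous phrase "apple" receives two senses, one per semantic type, and only senses of the same type are ever compared, so the fruit sense is never merged with the company sense.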

[1] Patrick Pantel, et al. Concept Discovery from Text, 2002, COLING.

[2] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.

[3] Daniel Jurafsky, et al. Semantic Taxonomy Induction from Heterogenous Evidence, 2006, ACL.

[4] Pedro M. Domingos, et al. Joint Inference in Information Extraction, 2007, AAAI.

[5] Praveen Paritosh, et al. Freebase: a collaboratively created graph database for structuring human knowledge, 2008, SIGMOD Conference.

[6] Pradeep Ravikumar, et al. A Comparison of String Distance Metrics for Name-Matching Tasks, 2003, IIWeb.

[7] Andrew McCallum, et al. Efficient clustering of high-dimensional data sets with application to reference matching, 2000, KDD '00.

[8] Dan Klein, et al. From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering, 2002, ICML.

[9] Andrew McCallum, et al. Conditional Models of Identity Uncertainty with Application to Noun Coreference, 2004, NIPS.

[10] William W. Cohen, et al. Language-Independent Set Expansion of Named Entities Using the Web, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[12] Lise Getoor, et al. Link mining: a survey, 2005, SKDD.

[13] Ivan P. Fellegi, et al. A Theory for Record Linkage, 1969.

[14] Marti A. Hearst, et al. A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text, 2002, Pacific Symposium on Biocomputing.

[15] Pradeep Ravikumar, et al. A Hierarchical Graphical Model for Record Linkage, 2004, UAI.

[16] Charles Elkan, et al. The Field Matching Problem: Algorithms and Applications, 1996, KDD.

[17] Vincent Ng, et al. Unsupervised Models for Coreference Resolution, 2008, EMNLP.

[18] Lise Getoor, et al. A Latent Dirichlet Model for Unsupervised Entity Resolution, 2005, SDM.

[19] Michael I. Jordan, et al. Distance Metric Learning with Application to Clustering with Side-Information, 2002, NIPS.

[20] Pedro M. Domingos, et al. Joint Unsupervised Coreference Resolution with Markov Logic, 2008, EMNLP.

[21] Eneko Agirre, et al. SemEval-2007 Task 2: Evaluating Word Sense Induction and Discrimination, 2007.

[22] Abraham Kaplan, et al. An experimental study of ambiguity and context, 1955, Mech. Transl. Comput. Linguistics.

[23] Raymond J. Mooney, et al. A probabilistic framework for semi-supervised clustering, 2004, KDD.

[24] Oren Etzioni, et al. Unsupervised Resolution of Objects and Relations on the Web, 2007, NAACL.

[25] Dan Klein, et al. Coreference Resolution in a Modular, Entity-Centered Model, 2010, NAACL.

[26] Estevam R. Hruschka, et al. Toward an Architecture for Never-Ending Language Learning, 2010, AAAI.

[27] Patrick Pantel, et al. Discovering word senses from text, 2002, KDD.

[28] William E. Winkler, et al. The State of Record Linkage and Current Research Problems, 1999.

[29] Oren Etzioni, et al. Open Information Extraction from the Web, 2007, CACM.

[30] Avrim Blum, et al. The Bottleneck, 2021, Monopsony Capitalism.

[31] Suresh Manandhar, et al. SemEval-2010 Task 14: Word Sense Induction & Disambiguation, 2010, SemEval@ACL.

[32] Lise Getoor, et al. Collective entity resolution in relational data, 2007, TKDD.

[33] Daniel Jurafsky, et al. Learning to Merge Word Senses, 2007, EMNLP.

[34] Pedro M. Domingos, et al. Entity Resolution with Markov Logic, 2006, Sixth International Conference on Data Mining (ICDM'06).