Acquiring Human-like Feature-Based Conceptual Representations from Corpora

The automatic acquisition of feature-based conceptual representations from text corpora is challenging, given the unconstrained nature of human-generated features. We examine large-scale extraction of concept-relation-feature triples and the utility of syntactic, semantic, and encyclopedic information in guiding this complex task. Traditional methods do not cover the full range of triples found in human-generated norms (e.g. flute produce sound); instead, they target concept-feature pairs (e.g. flute - sound) or triples involving specific relations (e.g. is-a, part-of). We introduce a novel method that extracts candidate triples (e.g. deer have antlers, flute produce sound) from parsed data and re-ranks them using semantic information. We apply this technique to Wikipedia and the British National Corpus and assess its accuracy in a variety of ways. Our work demonstrates the utility of external knowledge in guiding feature extraction and suggests a number of avenues for future work.
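The extract-then-re-rank idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy parses, the dependency labels (`nsubj`, `dobj`), the class lookups, and the prior weights are all hypothetical, and the scoring rule (frequency times a semantic prior) is only one simple way such re-ranking might be done.

```python
from collections import Counter

# Hypothetical toy input: each sentence is a list of
# (head lemma, dependency label, dependent lemma) tuples.
PARSED = [
    [("have", "nsubj", "deer"), ("have", "dobj", "antler")],
    [("produce", "nsubj", "flute"), ("produce", "dobj", "sound")],
    [("have", "nsubj", "deer"), ("have", "dobj", "antler")],
    [("have", "nsubj", "deer"), ("have", "dobj", "idea")],
]

# Hypothetical semantic information: coarse classes for concepts and
# features, and a prior on how plausible each class pairing is.
CONCEPT_CLASS = {"deer": "animal", "flute": "instrument"}
FEATURE_CLASS = {"antler": "body_part", "sound": "percept", "idea": "abstract"}
PRIOR = {
    ("animal", "body_part"): 0.9,
    ("instrument", "percept"): 0.8,
    ("animal", "abstract"): 0.1,
}


def extract_triples(sentence):
    """Return (concept, relation, feature) for each verb with a subject and an object."""
    subjects = {head: dep for head, rel, dep in sentence if rel == "nsubj"}
    objects = {head: dep for head, rel, dep in sentence if rel == "dobj"}
    return [(subjects[v], v, objects[v]) for v in subjects if v in objects]


def rerank(parsed, default_prior=0.05):
    """Score each candidate triple by corpus frequency times its semantic prior."""
    counts = Counter(t for sentence in parsed for t in extract_triples(sentence))
    scored = []
    for (concept, relation, feature), n in counts.items():
        key = (CONCEPT_CLASS.get(concept), FEATURE_CLASS.get(feature))
        scored.append(((concept, relation, feature), n * PRIOR.get(key, default_prior)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rerank(PARSED)
for triple, score in ranked:
    print(triple, round(score, 2))
```

In this toy setting the frequent and semantically plausible triple (deer have antler) rises to the top, while the implausible candidate (deer have idea) falls to the bottom.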
