Extracting concept descriptions from the Web: the importance of attributes and values

When extracting information about concepts from the Web, the main problem is not recall but precision: identifying which properties of a concept are genuinely distinctive. We discuss a series of experiments in empirical ontology, using both unsupervised and supervised methods, which show that not all semantic relations that can be extracted from text are equally useful, and which suggest that identifying concept attributes (parts, qualities, and the like) and their values yields better concept descriptions than less selective extraction.
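The precision argument can be illustrated with a toy sketch. It assumes that extracted relations have already been labelled as `attribute`, `value`, or generic `cooccur` evidence, and it compares concepts by cosine similarity over sparse feature vectors. The concepts, labels, counts and helper names (`describe`, `cosine`) are hypothetical and chosen for illustration only. They are not the paper's data, extraction patterns, or method.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy extraction output: (concept, relation_type, feature) -> count.
# These numbers are invented for illustration and are not experimental results.
TRIPLES = {
    ("car", "attribute", "engine"): 12,
    ("car", "value", "red"): 5,
    ("car", "cooccur", "road"): 30,
    ("truck", "attribute", "engine"): 9,
    ("truck", "value", "heavy"): 4,
    ("truck", "cooccur", "road"): 25,
    ("dog", "attribute", "tail"): 8,
    ("dog", "value", "brown"): 6,
    ("dog", "cooccur", "road"): 20,
}

def describe(concept, keep=None):
    """Build a sparse vector for `concept`; if `keep` is given, use only those relation types."""
    vec = Counter()
    for (c, rel, feat), n in TRIPLES.items():
        if c == concept and (keep is None or rel in keep):
            vec[(rel, feat)] += n
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

SELECTIVE = {"attribute", "value"}
print(cosine(describe("car"), describe("dog")))                        # all relations
print(cosine(describe("car", SELECTIVE), describe("dog", SELECTIVE)))  # attributes and values only
```

On this toy data, the unfiltered vectors make `car` and `dog` look similar because both frequently co-occur with `road`. When only attributes and values are kept, `car` and `dog` share no features, but `car` and `truck` still share `engine`.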
