An Analysis of Knowledge Collected from Volunteer Contributors

A new generation of intelligent applications can be enabled by broad-coverage repositories of knowledge. One emerging approach to constructing such repositories is proactive knowledge collection from large numbers of volunteer contributors. In this paper, we study the coverage and quality of a representative collection of part-of information contributed by volunteers. We analyze the growth of coverage over time, the redundancy of the collected knowledge, and the effect of coverage and redundancy on the quality of the collection. We also present initial comparisons with collections created by ontology engineering and text extraction approaches. Our analysis reveals that redundant contributions help identify high-quality statements, but that some statements are contributed far more often than necessary, drawing contributor effort away from areas where it is needed more. We suggest possible ways to address these issues in future collection efforts.
