Building large knowledge bases by mass collaboration

Acquiring knowledge has long been the major bottleneck preventing the rapid spread of AI systems. Manual approaches are slow and costly. Machine-learning approaches have limitations in the depth and breadth of knowledge they can acquire. The spread of the Internet has made possible a third solution: building knowledge bases by mass collaboration, with thousands of volunteers contributing simultaneously. While this approach promises large improvements in the speed and cost of knowledge base development, it can only succeed if the problem of ensuring the quality, relevance and consistency of the knowledge is addressed, if contributors are properly motivated, and if the underlying algorithms scale. In this paper we propose an architecture that meets all these desiderata. It uses first-order probabilistic reasoning techniques to combine potentially inconsistent knowledge sources of varying quality, and it uses machine-learning techniques to estimate the quality of knowledge. We evaluate the approach using a series of synthetic knowledge bases and a pilot study in the domain of printer troubleshooting.
