Automatic Knowledge Base Construction using Probabilistic Extraction, Deductive Reasoning, and Human Feedback

We envision an automatic knowledge base construction system consisting of three interrelated components. MADden is a knowledge extraction system that applies statistical text analysis methods on top of database management systems (DBMS) and massively parallel processing (MPP) frameworks; ProbKB performs probabilistic reasoning over the extracted knowledge to derive additional facts that are not stated in the original text corpus; CAMeL leverages human intelligence to reduce the uncertainty introduced by both the information extraction and the probabilistic reasoning processes.
