Automatic Knowledge Base Construction using Probabilistic Extraction, Deductive Reasoning, and Human Feedback

We envision an automatic knowledge base construction system consisting of three interrelated components. MADden is a knowledge extraction system that applies statistical text analysis methods in database management systems (DBMSs) and massively parallel processing (MPP) frameworks; ProbKB performs probabilistic reasoning over the extracted knowledge to derive additional facts that do not appear in the original text corpus; and CAMeL leverages human intelligence to reduce the uncertainty introduced by both the information extraction and the probabilistic reasoning processes.
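The following is a minimal sketch of how the three stages could be chained, assuming that facts are (subject, relation, object) triples carrying a confidence score. Everything in it is a hypothetical simplification: the regex extractor with fixed confidences stands in for MADden's statistical text analysis, the single weighted rule scored as weight times the product of body confidences (which assumes independence) stands in for ProbKB's probabilistic reasoning, and the `oracle` dictionary with a 0.3 to 0.7 uncertainty band stands in for CAMeL's human feedback. None of the names below come from the described systems.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A (subject, relation, object) triple; confidences are kept in a dict."""
    subj: str
    rel: str
    obj: str


# Stage 1 (hypothetical stand-in for MADden): a toy pattern-based extractor
# that assigns each pattern a fixed confidence instead of a learned model.
PATTERNS = [
    (re.compile(r"(\w+) was born in (\w+)"), "born_in", 0.9),
    (re.compile(r"(\w+) is located in (\w+)"), "located_in", 0.8),
]


def extract(sentences):
    kb = {}
    for sentence in sentences:
        for pattern, rel, conf in PATTERNS:
            for m in pattern.finditer(sentence):
                fact = Fact(m.group(1), rel, m.group(2))
                kb[fact] = max(kb.get(fact, 0.0), conf)
    return kb


# Stage 2 (hypothetical stand-in for ProbKB): one weighted Horn rule,
#   born_in(x, c) AND located_in(c, n) => citizen_of(x, n),
# scored as weight * product of body confidences (assumes independence).
RULE_WEIGHT = 0.7


def reason(kb):
    derived = {}
    for f1, p1 in kb.items():
        if f1.rel != "born_in":
            continue
        for f2, p2 in kb.items():
            if f2.rel == "located_in" and f2.subj == f1.obj:
                head = Fact(f1.subj, "citizen_of", f2.obj)
                derived[head] = max(derived.get(head, 0.0), RULE_WEIGHT * p1 * p2)
    # Merge derived facts into the KB, keeping the higher confidence.
    merged = dict(kb)
    for fact, p in derived.items():
        merged[fact] = max(merged.get(fact, 0.0), p)
    return merged


# Stage 3 (hypothetical stand-in for CAMeL): route only uncertain facts
# to humans; `oracle` simulates their yes/no answers.
def ask_humans(kb, oracle, low=0.3, high=0.7):
    resolved = dict(kb)
    for fact, p in kb.items():
        if low <= p <= high and fact in oracle:
            resolved[fact] = 1.0 if oracle[fact] else 0.0
    return resolved


if __name__ == "__main__":
    kb = extract(["Alice was born in Paris", "Paris is located in France"])
    kb = reason(kb)
    kb = ask_humans(kb, {Fact("Alice", "citizen_of", "France"): True})
    for fact, p in sorted(kb.items(), key=lambda item: item[0].rel):
        print(fact, round(p, 3))
```

In this toy run the derived fact `citizen_of(Alice, France)` receives a mid-range confidence, so it is the only one sent to the simulated human oracle; the high-confidence extracted facts are left alone. Sending only uncertain facts to people is one plausible way to keep human effort focused where the extraction and reasoning stages are least reliable.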