Semantic Bootstrapping: A Theoretical Perspective

Knowledge acquisition is an iterative process. Most prior work used syntactic bootstrapping approaches, while semantic bootstrapping was proposed recently. Unlike syntactic bootstrapping, semantic bootstrapping bootstraps directly on knowledge rather than on syntactic patterns, that is, it uses existing knowledge to understand the text and acquire more knowledge. It has been shown that semantic bootstrapping can achieve superb precision while retaining good recall on extracting isA relation. Nonetheless, the working mechanism of semantc bootstrapping remains elusive. In this extended abstract, we present a theoretical analysis as well as an experimental study to provide deeper insights into semantic bootstrapping.

[1]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[2]  Siddharth Patwardhan,et al.  Effective Information Extraction with Semantic Affinity Patterns and Relevant Regions , 2007, EMNLP.

[3]  Rajeev Motwani,et al.  Towards estimation error guarantees for distinct values , 2000, PODS.

[4]  Jeffrey P. Bigham,et al.  Organizing and Searching the World Wide Web of Facts - Step One: The One-Million Fact Extraction Challenge , 2006, AAAI.

[5]  Ashwin Machanavajjhala,et al.  An Analysis of Structured Data on the Web , 2012, Proc. VLDB Endow..

[6]  Andrew McCallum,et al.  Efficiently Inducing Features of Conditional Random Fields , 2002, UAI.

[7]  Marja-Riitta Koivunen,et al.  Annotea: an open RDF infrastructure for shared Web annotations , 2001, WWW '01.

[8]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[9]  Haixun Wang,et al.  Probase: a probabilistic taxonomy for text understanding , 2012, SIGMOD Conference.

[10]  Oren Etzioni,et al.  Identifying Relations for Open Information Extraction , 2011, EMNLP.

[11]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[12]  Luis Gravano,et al.  Snowball: extracting relations from large plain-text collections , 2000, DL '00.

[13]  J. Bunge,et al.  Estimating the Number of Species: A Review , 1993 .

[14]  Stephen Soderland,et al.  Learning Information Extraction Rules for Semi-Structured and Free Text , 1999, Machine Learning.

[15]  Simone Paolo Ponzetto,et al.  Deriving a Large-Scale Taxonomy from Wikipedia , 2007, AAAI.

[16]  Sergey Brin,et al.  Extracting Patterns and Relations from the World Wide Web , 1998, WebDB.

[17]  Erik Cambria,et al.  Jumping NLP Curves: A Review of Natural Language Processing Research [Review Article] , 2014, IEEE Computational Intelligence Magazine.

[18]  Haixun Wang,et al.  Understanding Tables on the Web , 2012, ER.

[19]  Philipp Cimiano,et al.  Harvesting Relations from the Web - Quantifiying the Impact of Filtering Functions , 2007, AAAI.

[20]  Doug Downey,et al.  Web-scale information extraction in knowitall: (preliminary results) , 2004, WWW '04.

[21]  Ellen Riloff,et al.  Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping , 1999, AAAI/IAAI.

[22]  Oren Etzioni,et al.  Open Information Extraction from the Web , 2007, CACM.

[23]  Erik Cambria,et al.  Sentic Computing: Techniques, Tools, and Applications , 2012 .

[24]  Andreas Harth,et al.  Towards Semantically-Interlinked Online Communities , 2005, ESWC.

[25]  Yue Wang,et al.  Concept-Based Web Search , 2012, ER.

[26]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[27]  Andrew McCallum,et al.  Information Extraction with HMM Structures Learned by Stochastic Optimization , 2000, AAAI/IAAI.

[28]  Daniel Jurafsky,et al.  Learning Syntactic Patterns for Automatic Hypernym Discovery , 2004, NIPS.

[29]  Haixun Wang,et al.  Short Text Conceptualization Using a Probabilistic Knowledgebase , 2011, IJCAI.