Domain Specific Facts Extraction Using Weakly Supervised Active Learning Approach

An ontology is defined by concepts and the relationships between those concepts, so building one involves two problems: identifying concepts and extracting relations. In this paper, we focus on the second problem: relation extraction from plain text. Generic knowledge bases such as YAGO, Freebase, and DBpedia have made huge collections of facts and their properties accessible across various domains. However, acquiring and maintaining facts and their relations from a domain-specific corpus remains an important and challenging task because little annotated data is available. We propose a semi-supervised approach to relation extraction based on label propagation, which selects the most informative instances for annotation. We also propose a weakly supervised approach to data annotation that uses generic ontologies such as Freebase, which further reduces the cost of manual annotation. We evaluate the efficiency of our approach through experiments on several domain-specific corpora.
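The abstract does not give algorithmic details, so the following is only a minimal sketch of how label propagation and the selection of informative instances could fit together. It is not the authors' implementation. The graph, the similarity weights, the function names (`propagate_labels`, `most_informative`), and the entropy criterion are all illustrative assumptions. The seed labels stand in for instances that a weakly supervised step (for example, matching entity pairs against a generic knowledge base) might have labeled automatically.

```python
import math


def propagate_labels(weights, seeds, num_labels, iterations=50):
    """Iterative label propagation on a weighted similarity graph.

    weights: dict mapping node -> {neighbor: similarity} (assumed symmetric)
    seeds:   dict mapping node -> label index; seed nodes stay clamped
    Returns a dict mapping node -> list of label scores summing to 1.
    """
    dist = {}
    for node in weights:
        if node in seeds:
            # Seed nodes get a one-hot distribution that never changes.
            dist[node] = [1.0 if k == seeds[node] else 0.0
                          for k in range(num_labels)]
        else:
            # Unlabeled nodes start from a uniform distribution.
            dist[node] = [1.0 / num_labels] * num_labels

    for _ in range(iterations):
        updated = {}
        for node, neighbors in weights.items():
            total = sum(neighbors.values())
            if node in seeds or total == 0:
                updated[node] = dist[node]
                continue
            # Each unlabeled node takes the weighted average of its neighbors.
            updated[node] = [
                sum(w * dist[m][k] for m, w in neighbors.items()) / total
                for k in range(num_labels)
            ]
        dist = updated
    return dist


def entropy(scores):
    """Shannon entropy of a label distribution (natural log)."""
    return -sum(p * math.log(p) for p in scores if p > 0)


def most_informative(dist, seeds):
    """Pick the unlabeled node whose label distribution is most uncertain."""
    unlabeled = [n for n in dist if n not in seeds]
    return max(unlabeled, key=lambda n: entropy(dist[n]))


# Hypothetical chain of five candidate relation instances, a-b-c-d-e.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0},
    "e": {"d": 1.0},
}
# Assumed seeds: "a" expresses relation 0, "e" expresses relation 1.
seed_labels = {"a": 0, "e": 1}

scores = propagate_labels(graph, seed_labels, num_labels=2)
query = most_informative(scores, seed_labels)
print(query, scores[query])
```

In this toy graph the middle node sits equally far from both seeds, so it ends up with the most uncertain label distribution and would be the instance sent to a human annotator. Entropy is only one possible informativeness criterion; the paper may use a different one.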
