RICH-CPL: Fact Extraction from Wikipedia-sized Corpora for Morphologically Rich Languages

This work deals with a never-ending learning approach to fact extraction from unstructured Russian text. It continues research on pattern learning techniques for morphologically rich, free-word-order languages. We introduce improvements to the CPL-RUS algorithm and choose the best initial parameters. We conducted experiments with the extended version, the RICH-CPL algorithm, on a corpus containing over 1.3 million pages. This paper is a shortened version of our paper [7] that also includes new modifications of the proposed methods.

[1]  Artem Lukanin, et al.  Automatic Extraction of Hypernyms and Hyponyms from Russian Texts, 2014, AIST.

[2]  Estevam R. Hruschka, et al.  How to read the web in portuguese using the never-ending language learner's principles, 2014, 2014 14th International Conference on Intelligent Systems Design and Applications.

[3]  Gerhard Weikum, et al.  SOFIE: a self-organizing framework for information extraction, 2009, WWW '09.

[4]  Thomas R. Gruber.  Toward principles for the design of ontologies used for knowledge sharing, 1995.

[5]  Andrey Filchenkov, et al.  Towards Never Ending Language Learning for Morphologically Rich Languages, 2017, BSNLP@EACL.

[6]  Oren Etzioni, et al.  Open Language Learning for Information Extraction, 2012, EMNLP.

[7]  Heng Ji, et al.  Incremental Joint Extraction of Entity Mentions and Relations, 2014, ACL.

[8]  Daniel Jurafsky, et al.  Distant supervision for relation extraction without labeled data, 2009, ACL.

[9]  Estevam R. Hruschka, et al.  Coupled semi-supervised learning for information extraction, 2010, WSDM '10.

[10]  Dmitry Ustalov, et al.  YARN: Spinning-in-Progress, 2016, GWC.

[11]  Eva Blomqvist, et al.  Describing Ontology Applications, 2007, ESWC.

[12]  Elizabeth Chang, et al.  Semi-Automatic Ontology Extension Using Spreading Activation, 2005.

[13]  Artem Kuznetsov, et al.  Family Matters: Company Relations Extraction from Wikipedia, 2016, KESW.

[14]  Svetlana Alexeeva, et al.  FactRuEval 2016: Evaluation of Named Entity Recognition and Fact Extraction Systems for Russian, 2016.

[15]  Thomas R. Gruber, et al.  Toward principles for the design of ontologies used for knowledge sharing?, 1995, Int. J. Hum. Comput. Stud.

[16]  Carola Eschenbach, et al.  Formal Ontology in Information Systems, 2008.

[17]  Estevam R. Hruschka, et al.  Toward an Architecture for Never-Ending Language Learning, 2010, AAAI.

[18]  N. Guarino, et al.  Formal Ontology in Information Systems: Proceedings of the First International Conference (FOIS'98), June 6-8, Trento, Italy, 1998.

[19]  Andrew McCallum, et al.  Relation Extraction with Matrix Factorization and Universal Schemas, 2013, NAACL.

[20]  Eric P. Xing, et al.  Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014, ACL 2014.

[21]  Natalia V. Loukachevitch, et al.  RuThes Linguistic Ontology vs. Russian Wordnets, 2014, GWC.