Ensemble Learning of Named Entity Recognition Algorithms using Multilayer Perceptron for the Multilingual Web of Data

Implementing the multilingual Semantic Web vision requires transforming unstructured data in multiple languages from the Document Web into structured data for the multilingual Web of Data. We present the multilingual version of FOX, a knowledge extraction suite that supports this migration by providing named entity recognition based on ensemble learning for five languages. Our evaluation results show that our approach outperforms existing named entity recognition systems on all five languages. In our best run, we outperform the state of the art by 32.38% F1-score points on a Dutch dataset. More information, a demo, and an extended version of the paper describing the evaluation in detail can be found at http://fox.aksw.org.
