Named entity recognition for extracting concept in ontology building on Indonesian language using end-to-end bidirectional long short term memory

Abstract
Information Extraction has been widely used to extract information from text. Named Entity Recognition (NER) is one of the primary tasks of Information Extraction; it extracts entities such as persons, locations, and organizations. Extraction from text collections is essential for obtaining information from unstructured text. Moreover, Named Entity Recognition is part of ontology building, which is the main objective of this research. An ontology can be built from a collection of concepts and the relations between them. Concepts in an ontology usually consist of groups of entities and are obtained using Noun Phrase Extraction or Named Entity Recognition. Our main focus in this research is to extract concepts for Ontology Building automatically using Named Entity Recognition. We chose Named Entity Recognition because of a limitation of previous Noun Phrase Extraction work: not all of the extracted nouns are entities. Our proposed methodology for Named Entity Recognition applies an end-to-end model based on Bidirectional Long Short-Term Memory (Bi-LSTM). A Bi-LSTM can perform sequence classification by using the context on both sides of each input token. Previous Named Entity Recognition approaches perform Part-of-Speech (POS) Tagging in the preprocessing phase with separate tools or models, and they also use the resulting POS tags as features to improve Named Entity Recognition performance. Our proposed methodology provides an end-to-end system that performs both POS Tagging and Named Entity Recognition, so no additional tool is needed for POS Tagging. This is the advantage of our model over other models. Experiments were conducted on news documents labeled with four entity classes and 35 part-of-speech tags. The target entities extracted in this study are person, location, organization, and miscellaneous.
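One plausible reading of the end-to-end design described above is a single Bi-LSTM encoder shared by a POS head and an NER head, so that one network produces both tag sequences. The sketch below is a minimal, untrained NumPy forward pass under that assumption, not the authors' implementation: the names `init_lstm`, `run_lstm`, `JointBiLSTMTagger`, and `joint_loss`, the layer sizes, the plain softmax output heads, and the summed cross-entropy objective are all hypothetical. Only the 35 POS tags and the four entity classes come from the text; the 9 NER labels used in the example assume a BIO scheme over those four classes plus `O`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_lstm(rng, d_in, d_h):
    # One stacked weight matrix for the input, forget, candidate, and output gates.
    return {"W": rng.normal(0.0, 0.1, (4 * d_h, d_in + d_h)), "b": np.zeros(4 * d_h)}


def run_lstm(params, xs, d_h):
    """Run a single-direction LSTM over xs of shape (T, d_in); return hidden states (T, d_h)."""
    h, c = np.zeros(d_h), np.zeros(d_h)
    states = []
    for x in xs:
        z = params["W"] @ np.concatenate([x, h]) + params["b"]
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)  # update the cell state
        h = o * np.tanh(c)          # expose a gated view of it as the hidden state
        states.append(h)
    return np.stack(states)


class JointBiLSTMTagger:
    """Hypothetical joint tagger: shared embeddings + Bi-LSTM, one softmax head per task."""

    def __init__(self, vocab_size, n_pos, n_ner, d_emb=16, d_h=8, seed=0):
        rng = np.random.default_rng(seed)
        self.d_h = d_h
        self.emb = rng.normal(0.0, 0.1, (vocab_size, d_emb))
        self.fwd = init_lstm(rng, d_emb, d_h)  # reads the sentence left to right
        self.bwd = init_lstm(rng, d_emb, d_h)  # reads the sentence right to left
        self.W_pos = rng.normal(0.0, 0.1, (2 * d_h, n_pos))
        self.W_ner = rng.normal(0.0, 0.1, (2 * d_h, n_ner))

    def forward(self, token_ids):
        xs = self.emb[token_ids]                            # (T, d_emb)
        h_f = run_lstm(self.fwd, xs, self.d_h)
        h_b = run_lstm(self.bwd, xs[::-1], self.d_h)[::-1]  # re-align to the original order
        h = np.concatenate([h_f, h_b], axis=-1)             # (T, 2 * d_h): context from both sides
        return softmax(h @ self.W_pos), softmax(h @ self.W_ner)

    def predict(self, token_ids):
        p_pos, p_ner = self.forward(token_ids)
        return p_pos.argmax(axis=-1).tolist(), p_ner.argmax(axis=-1).tolist()

    def joint_loss(self, token_ids, pos_gold, ner_gold):
        """Summed per-token cross-entropy of both tasks (the quantity training would minimize)."""
        p_pos, p_ner = self.forward(token_ids)
        t = np.arange(len(token_ids))
        return float(-np.log(p_pos[t, pos_gold]).sum() - np.log(p_ner[t, ner_gold]).sum())
```

For example, `JointBiLSTMTagger(vocab_size=10, n_pos=35, n_ner=9).predict([1, 4, 7])` returns two lists of three label indices, one per task. Sharing the encoder is what removes the need for an external POS tagger at inference time; the paper's model may instead feed its own POS predictions into the NER layer, which this sketch does not attempt to reproduce.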
We evaluated the performance of our model using the F1-Score. Our best F1-Scores are 91.79% for Part-of-Speech Tagging and 83.18% for Named Entity Recognition.
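The abstract reports F1-Scores without stating whether NER is scored per token or per entity. The sketch below assumes entity-level scoring over BIO tags, where a predicted entity counts only if both its boundaries and its type match a gold entity; the label names such as `B-PER` and the helpers `bio_spans`, `extract_concepts`, and `entity_f1` are hypothetical, not taken from the paper. The same span decoder also illustrates how NER output could be turned into concept candidates for ontology building.

```python
def bio_spans(tags):
    """Decode BIO tags into a set of (start, end_exclusive, entity_type) spans."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        # A span ends at "O", at a new "B-", or at an "I-" whose type differs from the
        # open span; a stray "I-" with no open span is treated as the start of a new entity.
        if tag == "O" or tag.startswith("B-") or tag[2:] != etype:
            if start is not None:
                spans.add((start, i, etype))
            start, etype = (None, None) if tag == "O" else (i, tag[2:])
        # Otherwise the tag is "I-<etype>" and simply extends the open span.
    return spans


def extract_concepts(tokens, tags):
    """Turn one tagged sentence into (surface form, entity type) concept candidates, in order."""
    return [(" ".join(tokens[s:e]), t) for s, e, t in sorted(bio_spans(tags))]


def entity_f1(gold_sents, pred_sents):
    """Micro-averaged entity-level precision, recall, and F1 over a list of tagged sentences."""
    gold, pred = set(), set()
    for k, (g, p) in enumerate(zip(gold_sents, pred_sents)):
        # Prefix each span with its sentence index so spans from different sentences never collide.
        gold |= {(k, *span) for span in bio_spans(g)}
        pred |= {(k, *span) for span in bio_spans(p)}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, with gold tags `B-PER I-PER O B-LOC` and predicted tags `B-PER O O B-LOC` for the sentence "Joko Widodo mengunjungi Surabaya", only the location span matches exactly, so precision, recall, and F1 are each 0.5; `extract_concepts` on the gold tags yields `("Joko Widodo", "PER")` and `("Surabaya", "LOC")`. Token-level scoring would give a different, typically higher, number for the same predictions.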
