An integrated pipeline model for biomedical entity alignment

Biomedical entity alignment comprises two sub-tasks: entity identification and entity-concept mapping. It is of great research value in biomedical text mining, where these techniques are widely used for named entity standardization, information retrieval, knowledge acquisition, and ontology construction. Previous work invested heavily in feature engineering to build feature-based models for entity identification and alignment. However, models that depend on subjective feature selection may suffer from error propagation and cannot exploit hidden information. With the rapid development of health-related research, researchers need an effective method to explore the large amount of available biomedical literature. We therefore propose a two-stage entity alignment process, the biomedical entity exploring model, which identifies biomedical entities and aligns them to the knowledge base interactively. The model aims to obtain semantic information automatically, both for extracting biomedical entities and for mining semantic relations through a standard biomedical knowledge base. Experiments show that the proposed method improves F1 scores by about 4.5% in entity identification and 2.5% in entity-concept mapping.
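To make the two sub-tasks concrete, the following is a minimal Python sketch of a two-stage alignment pipeline. It is not the proposed model: it substitutes a toy dictionary lookup for the learned components, and the knowledge base `TOY_KB`, its placeholder concept IDs, and the function names `identify_entities`, `map_to_concepts`, and `align` are all hypothetical, introduced only for illustration.

```python
import re

# Hypothetical toy knowledge base: lower-cased surface form -> placeholder concept ID.
# The IDs are invented for this sketch and do not come from any real resource.
TOY_KB = {
    "breast cancer": "CONCEPT:0001",
    "brca1": "CONCEPT:0002",
    "tamoxifen": "CONCEPT:0003",
}


def identify_entities(sentence, lexicon):
    """Stage 1 (entity identification): return (start, end, text) spans.

    Longer lexicon terms are matched first so that "breast cancer" wins
    over any shorter overlapping term. Matching is case-insensitive.
    """
    taken = [False] * len(sentence)
    spans = []
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            if not any(taken[m.start():m.end()]):
                spans.append((m.start(), m.end(), m.group(0)))
                for i in range(m.start(), m.end()):
                    taken[i] = True
    return sorted(spans)


def map_to_concepts(spans, kb):
    """Stage 2 (entity-concept mapping): look up each mention in the KB."""
    return [(text, kb.get(text.lower())) for _, _, text in spans]


def align(sentence, kb=TOY_KB):
    """Run both stages in sequence and return (mention, concept ID) pairs."""
    return map_to_concepts(identify_entities(sentence, kb), kb)


if __name__ == "__main__":
    print(align("BRCA1 mutations raise breast cancer risk."))
```

In a strict pipeline like this one, any mention missed in the first stage can never be mapped in the second. This is the error propagation the abstract attributes to earlier approaches.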
