Paper information - Dataset-aware multi-task learning approaches for biomedical named entity recognition

MOTIVATION: Named entity recognition (NER) is a critical and fundamental task in biomedical text mining. Recently, researchers have focused on exploiting deep neural networks for biomedical named entity recognition (Bio-NER). The performance of a deep neural network on a single dataset depends largely on data quality and quantity, yet high-quality data tends to be limited in size. To alleviate this task-specific data limitation, some studies have explored multi-task learning for Bio-NER and achieved state-of-the-art performance. However, these multi-task learning methods did not make full use of the information available across the various Bio-NER datasets, and the performance of the state-of-the-art multi-task learning method was significantly limited by the number of training datasets.

RESULTS: We propose two dataset-aware multi-task learning (MTL) approaches for Bio-NER that jointly train the models for numerous Bio-NER datasets, so that each model can discriminatively exploit information from all related training datasets. Both approaches achieve substantially better performance than the state-of-the-art multi-task learning method on 14 of 15 Bio-NER datasets. Furthermore, we implemented our approaches on a combination of Bio-NER and biomedical part-of-speech (POS) tagging datasets. The results verify that Bio-NER and POS tagging can significantly enhance one another.

AVAILABILITY: Our source code is available at https://github.com/zmmzGitHub/MTL-BC-LBC-BioNER and all datasets are publicly available at https://github.com/cambridgeltl/MTL-Bioinformatics-2016.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Yang Zhang | Mei Zuo
