Knowledge Distillation Techniques for Biomedical Named Entity Recognition

The limited amount of annotated biomedical literature and its peculiar characteristics make biomedical named entity recognition more challenging than standard named entity recognition. Multi-task learning mitigates these limitations by training several related tasks simultaneously: by sharing some layers of the neural network architecture, it learns features common to the different tasks. For this reason, a multi-task model generalizes better than a single-task model, and this generalization can in turn be used to improve other models. Knowledge distillation makes this possible: one model supervises another during training through the generalization it has learned. This research analyzes the knowledge distillation approach and shows that the performance of a simple deep learning model can be improved by distilling the multi-task model's generalization into it. Results show that our approach outperforms both the multi-task model and the single-task model, demonstrating that the distilled model learns more diverse features. We also found the improvement of our approach over the multi-task and single-task models to be statistically significant.
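
To make the teacher-student supervision concrete, the sketch below shows a Hinton-style distillation loss for token classification, assuming a PyTorch setup in which the multi-task model acts as the teacher and the simple model as the student. The function name, the temperature T, the mixing weight alpha, and the tensor shapes are illustrative assumptions, not the exact formulation used in this work.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_labels,
                      T=2.0, alpha=0.5, ignore_index=-100):
    """Illustrative distillation objective for NER tagging.

    student_logits, teacher_logits: (batch, seq_len, num_tags)
    gold_labels: (batch, seq_len), with ignore_index marking padding tokens.
    """
    # Soft targets: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 as in Hinton et al. (2015).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the gold NER tags.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        gold_labels.view(-1),
        ignore_index=ignore_index,
    )
    # The student is trained on a weighted mix of both supervision signals.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

In practice the teacher's logits would be computed under torch.no_grad(), so that only the student's parameters are updated by this loss.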
