NCUEE-NLP at SemEval-2023 Task 7: Ensemble Biomedical LinkBERT Transformers in Multi-evidence Natural Language Inference for Clinical Trial Data

This study describes the model design of the NCUEE-NLP system for the SemEval-2023 NLI4CT task, which focuses on multi-evidence natural language inference for clinical trial data. We use the LinkBERT transformer pretrained in the biomedical domain (denoted BioLinkBERT) as our main system architecture. First, a set of sentences is extracted from the clinical trial reports as evidence for premise-statement inference. The identified evidence is then used to determine the inference relation (i.e., entailment or contradiction). Finally, a soft-voting ensemble mechanism is applied to enhance system performance. For Subtask 1 on textual entailment, our best submission achieved an F1-score of 0.7091, ranking sixth among the 30 participating teams. For Subtask 2 on evidence retrieval, our best result achieved an F1-score of 0.7940, ranking ninth of 19 submissions.
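The final ensemble step can be sketched as follows. This is a minimal illustration of soft voting, not the authors' released code: class probabilities from several fine-tuned checkpoints are averaged, and the label with the highest mean probability is chosen. The model count and probability values are hypothetical.

```python
# Minimal sketch of a soft-voting ensemble over the two NLI4CT labels.
# Checkpoint names and probabilities below are illustrative assumptions.

def soft_vote(prob_lists, labels=("entailment", "contradiction")):
    """Average class probabilities across models and pick the argmax label."""
    n_models = len(prob_lists)
    n_classes = len(labels)
    mean_probs = [
        sum(probs[c] for probs in prob_lists) / n_models
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: mean_probs[c])
    return labels[best], mean_probs

# Example: three fine-tuned BioLinkBERT checkpoints scoring one
# premise-statement pair (hypothetical values).
model_probs = [
    [0.62, 0.38],  # checkpoint 1: P(entailment), P(contradiction)
    [0.55, 0.45],  # checkpoint 2
    [0.48, 0.52],  # checkpoint 3
]
label, probs = soft_vote(model_probs)
print(label)  # entailment: mean P(entailment) = 0.55 > 0.45
```

Soft voting lets a checkpoint that disagrees (here, checkpoint 3) be outweighed when the others are more confident, which hard majority voting would handle less smoothly.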
