An Analysis of a BERT Deep Learning Strategy on a Technology Assisted Review Task

Document screening is a central task within EBM (Evidence-based Medicine), a clinical discipline that supplies scientific evidence to support medical decisions. Given the recent advances in DL (Deep Learning) methods applied to IR (Information Retrieval) tasks, I propose a DL document classification approach using BERT (Bidirectional Encoder Representations from Transformers) or PubMedBERT embeddings, together with a DL similarity search path using SBERT (Sentence-BERT) embeddings, to reduce the burden on physicians of screening and classifying immense numbers of documents to answer clinical queries. I test and evaluate the retrieval effectiveness of my DL strategy on the 2017 and 2018 CLEF eHealth collections. I find that the proposed DL strategy works, compare it to the recently successful BM25+RM3 IR model, and conclude that the suggested method achieves advanced retrieval performance in the initial ranking of the articles on these datasets for the CLEF eHealth Technologically Assisted Reviews in Empirical Medicine Task.
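As a rough illustration of what an embedding-based similarity search path could look like, the sketch below ranks documents by cosine similarity between a query vector and document vectors. It is only a minimal sketch under stated assumptions: the abstract does not specify the similarity measure or the ranking procedure, so cosine similarity is an assumption, and the three-dimensional vectors are toy placeholders standing in for real SBERT embeddings. The names `cosine`, `rank_by_similarity`, `query`, and `docs` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy placeholder vectors (assumption): real sentence embeddings would have
# hundreds of dimensions and come from an encoder such as SBERT.
query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [0.9, 0.1, 0.8],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.5],
}

ranking = rank_by_similarity(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}\t{score:.4f}")
```

In a screening setting, the resulting order would serve as the initial ranking of candidate articles for a clinical query, with the most similar documents shown to the reviewer first.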
