CCS Explorer: Relevance Prediction, Extractive Summarization, and Named Entity Recognition from Clinical Cohort Studies

Clinical Cohort Studies (CCS), such as randomized clinical trials, are a rich source of documented clinical research. Ideally, a clinical expert inspects these articles for exploratory analyses ranging from drug discovery and evaluating the efficacy of existing drugs against emerging diseases to the first tests of newly developed drugs. However, more than 100 articles on a single prevalent disease such as COVID-19 are published daily in PubMed, so it can take a physician days to find articles and extract the relevant information. Can we develop a system that sifts through these articles faster and documents the crucial takeaways from each one? In this work, we propose CCS Explorer, an end-to-end system for sentence relevance prediction, extractive summarization, and detection of patient, intervention, and outcome entities in CCS. CCS Explorer is packaged in a web-based graphical user interface in which the user can enter any disease name. It then automatically generates a query on the back-end, runs it against PubMed, and extracts and aggregates all relevant information from the returned articles. For each task, CCS Explorer fine-tunes pre-trained transformer-based language representation models with additional layers. The models are evaluated on two publicly available datasets. CCS Explorer obtains a recall of 80.2%, an AUC-ROC of 0.843, and an accuracy of 88.3% on sentence relevance prediction using BioBERT, and achieves an average micro F1-score of 77.8% on Patient, Intervention, and Outcome (PIO) detection using PubMedBERT. Thus, CCS Explorer can reliably extract relevant information to summarize articles, reducing the time required by a factor of roughly 660 (~660×).
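To make the reported PIO metric concrete, the following is a minimal sketch of a micro-averaged F1-score, which pools true positives, false positives, and false negatives across the three entity classes before computing precision and recall. It assumes token-level labels and a hypothetical `"N"` tag for tokens outside any entity; the abstract does not state whether scoring is done per token or per span, or how the average is taken, so this illustrates the metric rather than the authors' evaluation code. The tag sequences are made up for illustration.

```python
from collections import Counter


def micro_f1(gold, pred, labels=("P", "I", "O")):
    """Micro-averaged F1 over token labels, pooling counts across classes.

    Here "P", "I", "O" stand for Patient, Intervention, Outcome.
    Any other tag (e.g. the hypothetical "N") is treated as "no entity".
    """
    counts = Counter()
    for g, p in zip(gold, pred):
        if p in labels and p == g:
            counts["tp"] += 1          # correct entity label
        else:
            if p in labels:
                counts["fp"] += 1      # predicted an entity that is wrong
            if g in labels:
                counts["fn"] += 1      # missed (or mislabeled) a gold entity
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative (made-up) gold and predicted tags for eight tokens.
gold = ["P", "P", "N", "I", "N", "O", "O", "N"]
pred = ["P", "N", "N", "I", "I", "O", "N", "N"]
print(f"micro F1 = {micro_f1(gold, pred):.3f}")
```

In this toy example there are 3 true positives, 1 false positive, and 2 false negatives, giving a precision of 0.75, a recall of 0.60, and a micro F1 of about 0.667.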
