A Neural Text Ranking Approach for Automatic MeSH Indexing

The U.S. National Library of Medicine (NLM) has been indexing the biomedical literature with MeSH terms since the mid-1960s, and in recent years the library has increasingly relied on AI assistance and automation to curate the biomedical literature more efficiently. Since 2002, the NLM has used natural language processing algorithms to assist indexers by providing MeSH term recommendations, and we are continually working to improve the quality of these recommendations. This work presents a new neural text ranking approach for automatic MeSH indexing. A domain-specific pretrained transformer model, PubMedBERT, was fine-tuned on MEDLINE data and used to rank candidate main headings produced by a Convolutional Neural Network (CNN). Pointwise, listwise, and multi-stage ranking approaches are demonstrated, and algorithm performance was evaluated by participating in the BioASQ challenge task 9a on semantic indexing. The neural text ranking approach achieved highly competitive performance in the final batch of the challenge, and the multi-stage ranking method typically boosted the CNN model's performance by about 5 percentage points in micro F1-score.
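The multi-stage pipeline the abstract describes can be sketched as follows: a first-stage model (the CNN in the paper) proposes candidate MeSH main headings with scores, and a second-stage cross-encoder (fine-tuned PubMedBERT in the paper) rescores each (article, heading) pair pointwise before the final ranking. This is a minimal illustrative sketch, not the authors' implementation; the `toy_scorer` below is a stand-in so the example runs without downloading model weights — with a real model it would be, e.g., the relevance logit of a sequence classifier run on the concatenated article text and heading.

```python
from typing import Callable, List, Tuple

def rerank_pointwise(
    article_text: str,
    candidates: List[Tuple[str, float]],      # (heading, first-stage score)
    score_pair: Callable[[str, str], float],  # second-stage relevance scorer
    top_k: int = 10,
) -> List[str]:
    """Rescore first-stage candidate headings and return the top_k."""
    rescored = [(heading, score_pair(article_text, heading))
                for heading, _ in candidates]
    # Final ranking uses only the second-stage scores (pointwise reranking).
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [heading for heading, _ in rescored[:top_k]]

# Toy stand-in scorer: counts how many words of the heading appear in the
# article text. A real pipeline would call a fine-tuned cross-encoder here.
def toy_scorer(article: str, heading: str) -> float:
    return float(sum(1 for w in heading.lower().split()
                     if w in article.lower()))

candidates = [("Neoplasms", 0.4), ("Humans", 0.9), ("Lung Neoplasms", 0.2)]
article = "We study lung neoplasms in adult humans ..."
print(rerank_pointwise(article, candidates, toy_scorer, top_k=2))
```

A listwise variant would instead score all candidates jointly (e.g., with a loss over the whole permutation), and a multi-stage setup chains these rerankers so that each stage refines a shrinking candidate list.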
