GraphPrompt: Biomedical Entity Normalization Using Graph-based Prompt Templates

Biomedical entity normalization unifies the language used across biomedical experiments and studies, and in turn enables a holistic view of the life sciences. Current approaches mainly study the normalization of more standardized entities such as diseases and drugs, while disregarding more ambiguous but crucial entities such as pathways, functions and cell types, which hinders their real-world application. To achieve biomedical entity normalization on these under-explored entities, we first introduce an expert-curated dataset, OBO-syn, encompassing 70 different types of entities and 2 million curated entity-synonym pairs. To utilize the unique graph structure in this dataset, we propose GraphPrompt, a prompt-based learning approach that creates prompt templates according to the graphs. GraphPrompt obtained improvements of 41.0% and 29.9% in the zero-shot and few-shot settings, respectively, indicating the effectiveness of these graph-based prompt templates. We envision that our method, GraphPrompt, and the OBO-syn dataset can be broadly applied to graph-based NLP tasks and serve as the basis for analyzing diverse and accumulating biomedical data.
