VIST - a Variant-Information Search Tool for precision oncology

Background: Diagnosis and treatment decisions in cancer increasingly depend on a detailed analysis of the mutational status of a patient’s genome. This analysis relies on previously published information about the association of variants with disease progression and possible interventions. Clinicians rely heavily on biomedical search engines to obtain such information; however, the vast majority of scientific publications focus on basic science and have no direct clinical impact. We develop the Variant-Information Search Tool (VIST), a search engine designed for the targeted search of clinically relevant publications given an oncological mutation profile.

Results: VIST indexes all PubMed abstracts and content from ClinicalTrials.gov. It applies advanced text mining to identify mentions of genes, variants and drugs, and uses machine-learning-based scoring to judge the clinical relevance of indexed abstracts. Its functionality is available through a fast and intuitive web interface. We perform several evaluations, showing that VIST’s ranking is superior to that of PubMed or a pure vector space model with regard to the clinical relevance of a document’s content.

Conclusion: Different user groups search repositories of scientific publications with different intentions. This diversity is not adequately reflected in standard search engines, often leading to poor performance in specialized settings. We develop a search engine for the specific case of finding documents that are clinically relevant in the course of cancer treatment. We believe that the architecture of our engine, which relies heavily on machine learning algorithms, can also act as a blueprint for search engines in other, equally specific domains. VIST is freely available at https://vist.informatik.hu-berlin.de/
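The contrast drawn in the abstract between a pure vector space model and a ranking that also accounts for clinical relevance can be illustrated with a minimal sketch. This is not VIST's implementation: the cue-word list, the `clinical_score` function, the `alpha` weight and the example documents are all hypothetical stand-ins, and a simple keyword ratio takes the place of the learned relevance model the abstract mentions.

```python
import math
from collections import Counter

# Hypothetical cue words standing in for a trained clinical-relevance classifier.
CLINICAL_CUES = {"patients", "trial", "response", "survival", "treatment"}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def clinical_score(text):
    """Toy relevance score: fraction of tokens that are clinical cue words."""
    tokens = tokenize(text)
    return sum(t in CLINICAL_CUES for t in tokens) / len(tokens) if tokens else 0.0


def rank(query, docs, alpha=0.5):
    """Rank document indices by alpha * cosine + (1 - alpha) * clinical score.

    alpha=1.0 is the pure vector space baseline; lower values give more
    weight to the (here: toy) clinical-relevance score.
    """
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = [
        (alpha * cosine(q, v) + (1 - alpha) * clinical_score(d), i)
        for i, (v, d) in enumerate(zip(vectors, docs))
    ]
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    # Made-up example documents, for illustration only.
    docs = [
        "BRAF V600E activates MAPK signaling in yeast models.",
        "Patients with BRAF V600E melanoma showed improved survival "
        "under vemurafenib treatment in a phase 3 trial.",
        "Weather patterns over the Atlantic.",
    ]
    print("vector space only:", rank("BRAF V600E", docs, alpha=1.0))
    print("combined score:   ", rank("BRAF V600E", docs))
```

In this toy setup the short basic-science sentence wins on cosine similarity alone, while adding the relevance term moves the clinically oriented sentence ahead of it. A real system would replace `clinical_score` with a trained model and would match recognized gene, variant and drug entities rather than raw tokens.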
