Automatic classification of registered clinical trials towards the Global Burden of Diseases taxonomy of diseases and injuries

Background: Clinical trial registries may allow for producing a global mapping of health research. However, health conditions are not described with standardized taxonomies in registries. Previous work analyzed clinical trial registries to improve the retrieval of relevant clinical trials for patients. However, no previous work has classified clinical trials across diseases using a standardized taxonomy allowing a comparison between global health research and global burden across diseases. We developed a knowledge-based classifier of health conditions studied in registered clinical trials towards categories of diseases and injuries from the Global Burden of Diseases (GBD) 2010 study.

Methods: The classifier relies on the UMLS® knowledge source (Unified Medical Language System®) and on heuristic algorithms for parsing data. It maps trial records to a 28-class grouping of the GBD categories by automatically extracting UMLS concepts from text fields and by projecting concepts between medical terminologies. The classifier allows deriving pathways between the clinical trial record and candidate GBD categories using natural language processing and links between knowledge sources, and selects the relevant GBD classification based on rules of prioritization across the pathways found. We compared automatic and manual classifications for an external test set of 2,763 trials. We automatically classified 109,603 interventional trials registered before February 2014 at WHO ICTRP.

Results: In the external test set, the classifier identified the exact GBD categories for 78 % of the trials. It had very good performance for most of the 28 categories, especially “Neoplasms” (sensitivity 97.4 %, specificity 97.5 %). The sensitivity was moderate for trials not relevant to any GBD category (53 %) and low for trials of injuries (16 %). For the 109,603 trials registered at WHO ICTRP, the classifier did not assign any GBD category to 20.5 % of trials while the most common GBD categories were “Neoplasms” (22.8 %) and “Diabetes” (8.9 %).

Conclusions: We developed and validated a knowledge-based classifier allowing for automatically identifying the diseases studied in registered trials by using the taxonomy from the GBD 2010 study. This tool is freely available to the research community and can be used for large-scale public health studies.
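
To make the classification idea concrete, here is a minimal Python sketch of the general approach described in the abstract: extract concepts from a trial's text fields, project them to GBD categories, and choose among the resulting pathways with a prioritization rule. It is not the authors' implementation. The term lexicon, the concept identifiers, the concept-to-category table, the field names, and the rule "prefer the condition field over the title" are all hypothetical stand-ins chosen for illustration; the real classifier relies on the UMLS and on its own prioritization rules. The last function shows how per-category sensitivity and specificity, as reported in the Results, can be computed against a manual classification.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon standing in for UMLS concept extraction.
TERM_TO_CONCEPT: Dict[str, str] = {
    "breast cancer": "C_BREAST_CA",
    "type 2 diabetes": "C_T2DM",
    "hip fracture": "C_HIP_FX",
}

# Hypothetical projection from concepts to GBD categories.
CONCEPT_TO_GBD: Dict[str, str] = {
    "C_BREAST_CA": "Neoplasms",
    "C_T2DM": "Diabetes",
    "C_HIP_FX": "Injuries",
}

# Assumed prioritization rule: try fields in this order, stop at the first hit.
FIELD_PRIORITY: List[str] = ["condition", "title"]


def extract_concepts(text: str) -> Set[str]:
    """Return the toy concept IDs whose terms occur as whole words in text."""
    lowered = text.lower()
    return {
        concept
        for term, concept in TERM_TO_CONCEPT.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def classify_trial(record: Dict[str, str]) -> Set[str]:
    """Map a trial record to GBD categories; an empty set means no category."""
    for field in FIELD_PRIORITY:
        concepts = extract_concepts(record.get(field, ""))
        categories = {CONCEPT_TO_GBD[c] for c in concepts if c in CONCEPT_TO_GBD}
        if categories:
            return categories
    return set()


def sensitivity_specificity(
    gold: List[Set[str]], predicted: List[Set[str]], category: str
) -> Tuple[float, float]:
    """Per-category sensitivity and specificity against a manual classification."""
    tp = fn = tn = fp = 0
    for g, p in zip(gold, predicted):
        if category in g:
            if category in p:
                tp += 1
            else:
                fn += 1
        elif category in p:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

In this sketch a trial whose condition field already yields a category is never reclassified from its title, which is one simple way to express a rule of prioritization across pathways; other rules would slot into `classify_trial` without changing the rest.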
