Guiding Supervised Learning by Bio-Ontologies in Medical Data Analysis

Ontologies are a popular way of representing knowledge and the semantics of data in the medical and health fields. Surprisingly, few machine learning methods allow the semantics of data to be encoded, and even fewer allow ontologies to guide the learning process. This paper discusses the use of data semantics and ontologies in health and medical applications of supervised learning, and in particular describes how the Unified Medical Language System (UMLS) is used within the AQ21 rule learning software. The presented concepts are illustrated using two applications based on distinctly different types of data and methodological issues.
