Application of Machine Learning for Multicenter Learning

Advances in radiation oncology are enabling more specific, and thus improved, treatment options. This makes assessing those options more challenging, because more information is needed to make an informed decision. One approach is to use machine-learning techniques to develop prediction models. Although prediction models embedded in clinical decision support systems (CDSSs) are the foreseen solution, training such models requires large amounts of detailed patient information to reach sufficient statistical power. The number of patients needed to train a reliable prediction model rapidly outgrows the number available in a single institution, hence the need for multicenter machine learning. To learn across multiple centers, several infrastructural prerequisites need to be addressed. First, data need to be extracted from multiple source systems and represented using standardized terminologies, preferably including the semantics (the actual description) of the represented data. For research and model training purposes, this means that value representations (e.g. “m” or “f” indicating gender) need to be converted into standardized terms (the NCI Thesaurus codes C20197 or C16576, respectively), and that patient-identifiable information (e.g. name, institutional ID, address) needs to be removed or altered so that it no longer identifies the patient. Once datasets from different institutions use the same standardized terminology and data structure, they can be merged. Finally, after merging, prediction models can be learned on the complete dataset; in this chapter, this is referred to as centralized learning.
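The preparation steps described above (code conversion, de-identification, and merging) can be illustrated with a minimal Python sketch. It is not the infrastructure used in this chapter. The field names, the institution labels, the example records, and the salted-hash pseudonym are all assumptions made for illustration; only the two NCI Thesaurus codes come from the text.

```python
import hashlib

# NCI Thesaurus codes named in the text: C20197 for "m", C16576 for "f".
GENDER_TO_NCIT = {"m": "C20197", "f": "C16576"}

# Hypothetical field names holding patient-identifiable information.
IDENTIFYING_FIELDS = {"name", "institutional_id", "address"}


def pseudonym(institution, local_id, salt):
    """Derive a non-reversible record key (an assumed scheme, not the chapter's)."""
    digest = hashlib.sha256(f"{salt}|{institution}|{local_id}".encode("utf-8"))
    return digest.hexdigest()[:16]


def standardize(record, institution, salt):
    """Return a copy of one record with identifiers removed and gender coded."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    raw_gender = str(record["gender"]).strip().lower()
    cleaned["gender"] = GENDER_TO_NCIT[raw_gender]  # KeyError on unmapped values
    cleaned["pseudo_id"] = pseudonym(institution, record["institutional_id"], salt)
    return cleaned


def merge_centers(datasets, salt):
    """Standardize every record and concatenate all centers into one dataset."""
    merged = []
    for institution, records in datasets.items():
        merged.extend(standardize(r, institution, salt) for r in records)
    return merged


# Made-up example records from two hypothetical institutions.
datasets = {
    "center_a": [
        {"name": "A. Example", "institutional_id": "001", "address": "Street 1",
         "gender": "m", "age": 64},
        {"name": "B. Example", "institutional_id": "002", "address": "Street 2",
         "gender": "F", "age": 71},
    ],
    "center_b": [
        {"name": "C. Example", "institutional_id": "001", "address": "Street 3",
         "gender": "f", "age": 58},
    ],
}

merged = merge_centers(datasets, salt="illustrative-salt")
for row in merged:
    print(row)
```

Because every center maps onto the same codes and the same record structure, the merged list can be passed directly to a single training routine, which corresponds to the centralized learning setting. Raising an error on an unmapped gender value is a deliberate choice in this sketch: a silent default would hide terminology mismatches between centers.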
