Classification of Paediatric Inflammatory Bowel Disease using Machine Learning

Paediatric inflammatory bowel disease (PIBD), comprising Crohn’s disease (CD), ulcerative colitis (UC) and inflammatory bowel disease unclassified (IBDU), is a complex, multifactorial condition with increasing incidence. Accurate diagnosis of PIBD is necessary for prompt and effective treatment. This study uses machine learning (ML) to classify disease subtype from endoscopic and histological data for 287 children diagnosed with PIBD. The data were used to develop, train, test and validate an ML model. Unsupervised models revealed broad clustering with overlap between CD and UC and no clear subtype delineation, whereas hierarchical clustering identified four novel subgroups characterised by differing colonic involvement. Three supervised ML models were developed, using endoscopic data only, histological data only, and combined endoscopic/histological data, yielding classification accuracies of 71.0%, 76.9% and 82.7%, respectively. The optimal combined model was tested on a statistically independent cohort of 48 PIBD patients from the same clinic and accurately classified 83.3% of them. This study employs mathematical modelling of endoscopic and histological data to aid diagnostic accuracy. While unsupervised modelling categorises patients into four subgroups, the supervised approaches confirm the need for both endoscopic and histological evidence for an accurate diagnosis. Overall, this paper provides a blueprint for ML use with clinical data.
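
To illustrate the hierarchical clustering step, the following is a minimal sketch and not the study's implementation. The binary coding of involvement per bowel site, the Hamming distance, average linkage, the site names and the eight toy patients are all assumptions made for illustration; the abstract specifies none of them. Only the Python standard library is used, so the sketch runs as written.

```python
from itertools import combinations

# Hypothetical bowel sites; 1 = inflammation recorded at that site, 0 = none.
SITES = ("ileum", "ascending", "transverse", "descending", "rectum")

# Toy, invented patients (NOT study data).
patients = [
    (1, 0, 0, 0, 0), (1, 1, 0, 0, 0),  # mainly ileal
    (0, 0, 0, 0, 1), (0, 0, 0, 1, 1),  # distal colon
    (1, 1, 1, 1, 1), (0, 1, 1, 1, 1),  # extensive
    (0, 1, 1, 0, 0), (0, 0, 1, 0, 0),  # proximal/mid colon
]

def hamming(a, b):
    """Fraction of sites at which two involvement vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until
    only n_clusters remain. Clusters hold indices into `vectors`."""
    clusters = [[i] for i in range(len(vectors))]

    def cluster_distance(c1, c2):
        # Mean pairwise Hamming distance between members of the two clusters.
        total = sum(hamming(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (ties resolved by first found).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

groups = average_linkage(patients, n_clusters=4)
print(groups)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

With real data the number of clusters would normally be chosen by inspecting the dendrogram rather than fixed in advance, and a library routine would replace the hand-written loop.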
