Central reading of ulcerative colitis clinical trial videos using neural networks.

BACKGROUND AND AIMS Endoscopic disease activity scoring in ulcerative colitis (UC) is useful in clinical practice but infrequently done. It is required in clinical trials, where it is expensive and slow because human central readers (CRs) are needed. A machine learning algorithm (MLA) that automates the process could improve clinical care and facilitate clinical research. Prior work using single-institution databases and endoscopic still images has been promising.

METHODS A total of 795 full-length endoscopy videos (FLEVs), comprising 19.5 million image frames, were prospectively collected from a phase 2 trial of mirikizumab that enrolled 249 patients from 14 countries. Expert CRs assigned each FLEV one endoscopic Mayo score (eMS) and one UC endoscopic index of severity (UCEIS) score. First, the video data were cleaned and abnormality features were extracted using convolutional neural networks. A recurrent neural network (RNN) was then trained on these features to predict the eMS and UCEIS for each individual FLEV.

RESULTS The primary performance metric for the RNN model was the quadratic weighted kappa (QWK), which measures agreement between the machine-read endoscopy score and the human central reader score. QWK progressively penalizes disagreements that exceed one level. Agreement was excellent, with a QWK of 0.844 (95% CI, 0.787-0.901) for eMS and 0.855 (95% CI, 0.80-0.91) for UCEIS.

CONCLUSION A deep learning algorithm can be trained to predict levels of ulcerative colitis severity from full-length endoscopy videos. Our data set was prospectively collected in a multinational clinical trial, videos rather than still images were used, both UCEIS and eMS were reported, and MLA performance metrics met or exceeded those previously published for UC severity scores.
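To make the agreement metric concrete, the following is a minimal sketch of the standard quadratic weighted kappa computation in plain Python. It is not the authors' implementation; the function name `quadratic_weighted_kappa`, its parameters, and the toy scores are hypothetical and chosen only for illustration. Scores are assumed to be integers from 0 to `num_levels - 1` (for example, four levels for an eMS of 0 to 3).

```python
def quadratic_weighted_kappa(reader_scores, model_scores, num_levels):
    """Quadratic weighted kappa between two lists of integer scores.

    Illustrative sketch only: scores are assumed to lie in
    0 .. num_levels - 1, and num_levels is assumed to be at least 2.
    """
    if len(reader_scores) != len(model_scores) or not reader_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")

    n = len(reader_scores)
    k = num_levels

    # Observed confusion matrix: rows = reader score, columns = model score.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(reader_scores, model_scores):
        observed[a][b] += 1

    # Marginal score counts for each rater.
    reader_hist = [sum(row) for row in observed]
    model_hist = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: 0 on the diagonal, 1 at maximal disagreement.
            weight = (i - j) ** 2 / (k - 1) ** 2
            # Count expected in this cell if the raters were independent.
            expected = reader_hist[i] * model_hist[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used one identical level throughout; kappa is undefined.
        raise ValueError("kappa is undefined when both raters use a single level")

    return 1.0 - weighted_observed / weighted_expected


# Toy data (not from the study): one video is mis-scored by one level,
# then the same video is mis-scored by two levels.
reader = [0, 1, 2, 3]
off_by_one = [1, 1, 2, 3]
off_by_two = [2, 1, 2, 3]
print(quadratic_weighted_kappa(reader, off_by_one, 4))
print(quadratic_weighted_kappa(reader, off_by_two, 4))
```

In this toy comparison the two-level miss lowers kappa more than the one-level miss, because the weight grows with the square of the distance between the two scores.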
