Multi-Task Knowledge Distillation for Eye Disease Prediction

Accurate disease prediction from retinal fundus images is critical, but collecting large amounts of high-quality labeled data to train supervised models is difficult. Deep learning classifiers have achieved high accuracy across a wide variety of medical imaging problems, yet they require large amounts of labeled data. Given a fundus image, we evaluate several approaches for learning deep neural classifiers from a small labeled dataset on three eye disease prediction tasks: (T1) predicting one of five broad categories (diabetic retinopathy, age-related macular degeneration, glaucoma, melanoma, and normal), (T2) predicting one of 320 fine-grained disease sub-categories, and (T3) generating a textual diagnosis. The problem is challenging because of the small data size, the need for predictions across multiple tasks, variation across images, and the large number of hyper-parameter choices. Modeling the problem in a multi-task learning (MTL) setup, we investigate the contribution of each task when only a small amount of labeled data is available. Further, we propose a novel MTL-based teacher ensemble method for knowledge distillation. On a dataset of 7212 labeled and 35854 unlabeled images from 3502 patients, our technique obtains ~83% accuracy, ~75% top-5 accuracy, and ~48 BLEU on tasks T1, T2, and T3, respectively. Even with only 15% of the training data, our method outperforms the baselines by 8.1, 3.2, and 11.2 points on the three tasks, respectively.
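
The abstract does not specify how the teacher ensemble's predictions are combined or how the task losses are weighted. The sketch below is therefore only a minimal illustration of multi-task distillation from a teacher ensemble, built on the standard temperature-softened soft-target loss of Hinton et al. (2015). It assumes that teachers' softened class probabilities are averaged, that the temperature, mixing weight `alpha`, and per-task weights are fixed by hand, and that an unlabeled image receives only the distillation term. This is one plausible way the unlabeled images could contribute, not necessarily the authors' method. All function and parameter names are hypothetical, and T3 (text generation) is omitted.

```python
import math
from typing import Dict, List, Optional, Sequence

def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    return [math.exp(v) for v in log_softmax(logits, temperature)]

def ensemble_soft_targets(teacher_logits: List[Sequence[float]], temperature: float) -> List[float]:
    """Average the temperature-softened class distributions of all teachers (assumed rule)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    return [sum(p[k] for p in probs) / len(probs) for k in range(len(probs[0]))]

def task_loss(student_logits: Sequence[float], teacher_logits: List[Sequence[float]],
              label: Optional[int], temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Hinton-style distillation loss for one classification head (e.g. T1 or T2)."""
    soft_targets = ensemble_soft_targets(teacher_logits, temperature)
    student_log_probs = log_softmax(student_logits, temperature)
    # KL(teacher ensemble || student); classes with zero target mass contribute nothing.
    kl = sum(p * (math.log(p) - lq) for p, lq in zip(soft_targets, student_log_probs) if p > 0)
    kd = temperature ** 2 * kl  # T^2 keeps the soft-target gradient scale comparable
    if label is None:  # hypothetical handling of an unlabeled image: soft targets only
        return kd
    ce = -log_softmax(student_logits)[label]  # hard-label cross-entropy at T = 1
    return alpha * ce + (1.0 - alpha) * kd

def multitask_loss(student: Dict[str, Sequence[float]],
                   teachers: Dict[str, List[Sequence[float]]],
                   labels: Dict[str, Optional[int]],
                   task_weights: Dict[str, float]) -> float:
    """Weighted sum of per-task distillation losses over the classification heads."""
    return sum(task_weights[t] * task_loss(student[t], teachers[t], labels.get(t))
               for t in student)

# Toy usage: 5 broad categories for T1; T2 shrunk to 3 classes for brevity (320 in the paper).
student = {"T1": [2.0, 0.1, -1.0, 0.0, 0.5], "T2": [0.3, 1.2, -0.4]}
teachers = {
    "T1": [[2.5, 0.0, -1.2, 0.1, 0.4], [1.8, 0.3, -0.8, -0.2, 0.6]],
    "T2": [[0.1, 1.5, -0.2], [0.4, 0.9, -0.6]],
}
labeled = multitask_loss(student, teachers, {"T1": 0, "T2": 1}, {"T1": 1.0, "T2": 1.0})
unlabeled = multitask_loss(student, teachers, {}, {"T1": 1.0, "T2": 1.0})
print(labeled, unlabeled)
```

Averaging probabilities rather than logits is just one common way to combine an ensemble. A weighted or per-task combination would slot into `ensemble_soft_targets` without changing the rest of the sketch.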