Limited Number of Cases May Yield Generalizable Models, a Proof of Concept in Deep Learning for Colon Histology

Background: Little is known about the minimum number of slides needed to generate image datasets for building generalizable machine-learning (ML) models. In addition, deep learning commonly assumes that increasing the number of training images will always enhance accuracy and that a model's initial validation accuracy correlates well with its generalizability. In this pilot study, we tested these assumptions to gain a better understanding of such platforms, especially when data resources are limited.
Methods: Using 10 colon histology slides (5 carcinoma and 5 benign), we acquired 1000 partially overlapping images (Dataset A), which were then used to train and test three convolutional neural networks (CNNs), ResNet50, AlexNet, and SqueezeNet, on the simple task of classifying colon histopathology as benign or malignant. Different quantities of images (10–1000) from Dataset A were used to construct >200 unique CNN models. Each model was first assessed on 20% of Dataset A's images (not included in the training phase) to obtain its initial validation accuracy (internal accuracy), and then on Dataset B (a very distinct secondary test set acquired from public domain online sources) to obtain its generalization accuracy.
Results: All CNNs showed similar peak internal accuracies (>97%) on the Dataset A test set. Peak accuracies on the novel external test set (Dataset B), which measure the ability to generalize, varied markedly (ResNet50: 98%; AlexNet: 92%; and SqueezeNet: 80%). The models with the highest accuracy were not generated using the largest training sets. Further, a model's internal accuracy did not always correlate with its generalization accuracy. These results were obtained using an optimized number of cases and controls.
Conclusions: Increasing the number of images in a training set does not always improve model accuracy, and significant numbers of cases may not always be needed for generalization, especially for simple tasks. Different CNNs reach peak accuracy with different training set sizes. Further studies are required to evaluate the above findings in more complex ML models prior to using such ancillary tools in clinical settings.
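
The subset-and-holdout design described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function `make_split`, the image identifiers, the equal 500/500 class balance, and the choice to hold out 20% of each sampled subset (rather than 20% of the full Dataset A) are all assumptions made for the example. Training and evaluation of the CNNs themselves are omitted.

```python
import random

def make_split(image_ids_by_class, n_total, holdout_frac=0.2, seed=0):
    """Draw a class-balanced subset of n_total images and split it
    into a training list and an internal holdout list."""
    rng = random.Random(seed)
    classes = sorted(image_ids_by_class)
    per_class = n_total // len(classes)
    train, holdout = [], []
    for label in classes:
        # Sample without replacement so no image appears twice.
        picked = rng.sample(image_ids_by_class[label], per_class)
        # Assumption: the 20% internal test set is taken from each subset.
        n_hold = max(1, round(per_class * holdout_frac))
        holdout += [(img, label) for img in picked[:n_hold]]
        train += [(img, label) for img in picked[n_hold:]]
    return train, holdout

# Hypothetical identifiers standing in for Dataset A (assumed 500 per class).
dataset_a = {
    "benign": [f"benign_{i:03d}" for i in range(500)],
    "carcinoma": [f"carcinoma_{i:03d}" for i in range(500)],
}

# One split per training-set size; each would feed a separate model.
for n in (10, 100, 1000):
    train, holdout = make_split(dataset_a, n)
    print(n, len(train), len(holdout))
```

Each resulting model would then be scored twice, once on its holdout list (internal accuracy) and once on an independent external set such as Dataset B (generalization accuracy), so that the two figures can be compared across training-set sizes.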
