SlideSeg: a Python module for the creation of annotated image repositories from whole slide images

Machine learning methods are widely used in medicine to aid cancer detection and diagnosis. In digital pathology, prediction heat maps produced by convolutional neural networks (CNNs) have already exceeded the performance of a trained pathologist working with no time constraints. Training deep learning networks requires large datasets of accurately labeled ground truth data; however, whole slide images are often on the scale of 10+ gigapixels when digitized at 40X magnification, contain multiple magnification levels, and lack a standardized file format. Because of these characteristics, traditional techniques for producing training and validation data cannot be used, which has limited the availability of annotated datasets. This research presents a Python module and method for rapidly producing accurately annotated image patches from whole slide images. The module is built on OpenCV, an open source computer vision library; OpenSlide, an open source library for reading virtual slide images; and NumPy, a library for scientific computing with Python. These Python scripts successfully produce 'ground truth' image patches and will help transfer advances from research laboratories into clinical application by addressing many of the challenges associated with developing annotated datasets for machine learning in histopathology.
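As a rough illustration of what producing annotated image patches involves, the sketch below tiles an image and its annotation mask into fixed-size patches and keeps only the tiles that contain annotated pixels. It is not SlideSeg's actual interface: the function name `iter_annotated_patches`, its parameters, and the synthetic arrays are assumptions made for this example. In practice the image array would come from a slide region read with OpenSlide, and the mask would be rasterized from the pathologist's annotations, for instance with OpenCV.

```python
import numpy as np


def iter_annotated_patches(image, mask, patch_size=256, min_fraction=0.0):
    """Yield (x, y, image_patch, mask_patch) for tiles that contain annotation.

    Hypothetical helper for illustration only, not part of SlideSeg.
    `image` is an (H, W, 3) array and `mask` an (H, W) array whose nonzero
    pixels mark annotated tissue. Partial tiles at the borders are skipped.
    """
    height, width = mask.shape
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            mask_patch = mask[y:y + patch_size, x:x + patch_size]
            # Fraction of the tile covered by annotated pixels.
            fraction = np.count_nonzero(mask_patch) / mask_patch.size
            if fraction > min_fraction:
                image_patch = image[y:y + patch_size, x:x + patch_size]
                yield x, y, image_patch, mask_patch


# Synthetic stand-ins for a slide region and its annotation mask.
image = np.zeros((512, 512, 3), dtype=np.uint8)
mask = np.zeros((512, 512), dtype=np.uint8)
mask[300:400, 50:150] = 255  # hypothetical annotated region

patches = list(iter_annotated_patches(image, mask, patch_size=256))
for x, y, image_patch, mask_patch in patches:
    print(x, y, image_patch.shape, np.count_nonzero(mask_patch))
```

Skipping tiles with no annotated pixels keeps the output focused on labeled tissue; raising `min_fraction` would further discard tiles that only graze an annotation boundary.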
