Incorporating the Knowledge of Dermatologists to Convolutional Neural Networks for the Diagnosis of Skin Lesions

This report describes our system for the automatic diagnosis of skin lesions. We aim to incorporate the expert knowledge of dermatologists into the well-known framework of Convolutional Neural Networks (CNNs), which have shown impressive performance in many visual recognition tasks. In particular, we have designed several networks providing lesion area identification, lesion segmentation into structural patterns, and final diagnosis of clinical cases. Furthermore, novel blocks for CNNs have been designed to integrate this information into the diagnosis processing pipeline.

Figure 1: Main processing pipeline of our Automatic Diagnosis System

1 General description of the system

The main pipeline of our system is depicted in Fig. 1. It comprises the following steps:

1. For each clinical case c, a dermoscopic image Xc feeds a Lesion Segmentation Network that generates a binary mask Mc outlining the area of the image which corresponds to the lesion. The description of this module is given in section 2.

2. Each clinical case c, which is now defined by the image-mask pair {Xc, Mc}, goes through the Data Augmentation Module. This module aims to extend the initial visual support of the lesion by generating new views v corresponding to different rotations and cropped areas. Hence, the output of this module is an extended set of images X̃v related to the lesion. Section 3 provides a detailed description of this data augmentation process.

3. The next step in the process is the Structure Segmentation Network. It aims to segment each view of the lesion X̃v into a set of eight global and local structures that have turned out to be very important for dermatologists in their daily diagnosis. Examples of these structures are dots/globules, regression areas, streaks, etc. Hence, the output of this system is a set of 8 segmentation maps S̃vs, s = 1...8, each one associated with a particular structure s of interest.
This module is introduced in section 4.

4. Finally, the augmented set {X̃v, S̃vs} is passed to the Diagnosis Network, which is in charge of providing the final diagnosis Yc for the clinical case. The description of this network can be found in section 5.

2 Lesion Segmentation Network

The Lesion Segmentation Network has been developed by training a Fully Convolutional Network (FCN) [Shelhamer et al., 2016]. FCNs have achieved state-of-the-art results on the task of semantic segmentation of general-content images, as demonstrated in the PASCAL VOC Segmentation challenge [Everingham et al., 2015]. In order to train a network for our particular task of lesion/skin segmentation, we have used the training set of the lesion segmentation task of the 2017 ISBI challenge. Let us note that the goal of this module is not to generate very accurate segmentation maps of a lesion, but to broadly identify the area of the image that corresponds to the lesion, yielding a binary map Mc for each clinical case.

Figure 2: Example of a rotated and cropped view of a lesion and its Normalized Polar Coordinates. (Left) View of the lesion. (Middle) Normalized ratio. (Right) Angle.

3 Data Augmentation Module and Normalized Polar Coordinates

It is well known that data augmentation notably boosts the performance of deep neural networks, especially when the amount of training data is limited. Among all the potential image variations and artifacts, invariance to orientation is probably the main requirement of our method, as dermatologists do not follow a specific protocol during the capture of a lesion. Other, more complex geometric transformations, such as affine or projective transforms, are less interesting here, as the dermatoscope is normally placed just over and orthogonally to the lesion surface. The particular process of data augmentation is described next:

1. First, starting from the pair {Xc, Mc}, we generate a set of rotated versions.

2. As rotating an image without losing any visual information requires incorporating new areas which were not present in the original view, we find and crop the largest inner rectangle ensuring that all pixels belong to the original image.

3. Finally, as our subsequent CNNs (Structure Segmentation and Diagnosis) require square input images of 256x256 pixels, we perform various square crops, which are in turn resized to the required dimensions.

Considering the aforementioned rotations and crops, for each given clinical case c we generate an augmented set of 24 images, represented by a tensor X̃v ∈ R256×256×3, with v = 1...24. In addition, for each generated view X̃v, we compute the Normalized Polar Coordinates from the lesion mask. The goal of these alternative coordinates is to support subsequent processing blocks by providing invariance against shifts, rotations, changes in size, and even irregular shapes of the lesions. To do so, we transform pixel Cartesian coordinates (xi, yi) into normalized polar coordinates (ρi, θi), where ρi ∈ [0, 1] and θi ∈ [0, 2π) stand for the normalized ratio and angle, respectively. The process to compute this transformation is as follows: first, the mask of the lesion is approximated by an ellipse with the same second-order moments. Then, we learn the affine matrix that transforms
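The moment-based normalization described above can be sketched in NumPy as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, the clipping of ρ to 1, and the factor-of-two scaling from moment eigenvalues to ellipse semi-axes are our assumptions.

```python
import numpy as np

def normalized_polar_coords(mask):
    """Map every pixel to normalized polar coordinates (rho, theta),
    derived from the ellipse sharing the lesion mask's second-order moments.

    Sketch under stated assumptions; not the paper's exact procedure.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    centroid = pts.mean(axis=0)

    # Second-order central moments of the lesion pixels (2x2 covariance).
    cov = np.cov((pts - centroid).T)

    # Eigen-decomposition gives the ellipse orientation and semi-axes:
    # an ellipse with the same second moments has semi-axes 2*sqrt(eigval).
    eigvals, eigvecs = np.linalg.eigh(cov)
    semi_axes = 2.0 * np.sqrt(eigvals)

    # Affine matrix sending that ellipse to the unit circle.
    A = np.diag(1.0 / semi_axes) @ eigvecs.T

    # Apply the affine map to the whole pixel grid.
    h, w = mask.shape
    yy, xx = np.mgrid[0:h, 0:w]
    grid = np.stack([xx, yy], axis=-1).astype(float) - centroid
    uv = grid @ A.T

    # Normalized ratio in [0, 1] (clipped) and angle in [0, 2*pi).
    rho = np.clip(np.hypot(uv[..., 0], uv[..., 1]), 0.0, 1.0)
    theta = np.mod(np.arctan2(uv[..., 1], uv[..., 0]), 2 * np.pi)
    return rho, theta
```

With this convention, ρ is approximately 0 at the lesion centroid and reaches 1 at the boundary of the fitted ellipse, so the coordinates are invariant to where the lesion sits in the view and to its size and orientation.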

[1] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR, 2015.

[2] D. Pathak, P. Krähenbühl, and T. Darrell. Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. ICCV, 2015.

[3] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. CVPR, 2016.

[4] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision, 2015.