Learning semantic histopathological representation for basal cell carcinoma classification

Diagnosing a histopathology glass slide is a complex process that involves accurately recognizing several structures, their function in the tissue, and their relations with other structures. The way in which the pathologist represents the image content and the relations between those objects leads to better and more accurate diagnoses. Therefore, an appropriate semantic representation of the image content would be useful in several analysis tasks, such as cancer classification, tissue retrieval, and histopathological image analysis, among others. Nevertheless, automatically recognizing those structures and extracting their inner semantic meaning remain very challenging tasks. In this paper we introduce a new semantic representation that describes histopathological concepts in a form suitable for classification. The approach identifies local concepts using dictionary learning, i.e., the algorithm learns the most representative atoms from a set of randomly sampled patches, and then models the spatial relations among them by counting the co-occurrences between atoms while penalizing spatial distance. The proposed approach was compared with a bag-of-features representation in a tissue classification task. For this purpose, 240 histological microscopic fields of view, 24 per tissue class, were collected. Those images were fed to one Support Vector Machine classifier per class, using 120 images as the training set and the remaining ones for testing, maintaining the same proportion of each concept in the training and test sets. The classification results, averaged over 100 random partitions into training and test sets, show that our approach is on average almost 6% more sensitive than the bag-of-features representation.
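
As an illustration of the co-occurrence step described above, the following is a minimal sketch and not the implementation used in this work. It rests on several assumptions that the abstract does not specify: each patch is assigned to its nearest atom by Euclidean distance (a stand-in for the dictionary-learning encoding), the distance penalty is an exponential decay exp(-d / sigma), and the function names `assign_atoms` and `cooccurrence` and the parameter `sigma` are hypothetical.

```python
import numpy as np


def assign_atoms(patches, atoms):
    """Label each patch with the index of its nearest atom.

    Assumption: nearest-atom assignment replaces the actual
    dictionary-learning encoding, which the abstract does not detail.
    patches: (n_patches, n_features), atoms: (n_atoms, n_features).
    """
    # Squared Euclidean distance between every patch and every atom.
    d2 = ((patches[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def cooccurrence(labels, positions, n_atoms, sigma=1.0):
    """Distance-penalized co-occurrence matrix between atoms.

    Assumption: each pair of patches contributes exp(-d / sigma), where d
    is the distance between patch centres; the true penalty may differ.
    labels: (n_patches,) atom indices, positions: (n_patches, 2).
    """
    # Pairwise Euclidean distances between patch centres.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Nearby pairs weigh close to 1, distant pairs close to 0.
    weights = np.exp(-dist / sigma)
    np.fill_diagonal(weights, 0.0)  # a patch does not co-occur with itself

    # Accumulate each pair weight into the cell of its two atom labels.
    C = np.zeros((n_atoms, n_atoms))
    np.add.at(C, (labels[:, None], labels[None, :]), weights)
    return C
```

Under these assumptions, the flattened upper triangle of the returned matrix could serve as the feature vector of an image, in place of the atom histogram that a bag-of-features representation would use.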