Training classifiers with limited data using the Radon cumulative distribution transform

This study investigates whether a recently introduced image transform, the Radon cumulative distribution transform (R-CDT), is a viable preprocessing step for making the training of end-to-end systems more robust when few training samples are available. To assess the R-CDT for this purpose, we used a standard machine learning dataset, MNIST, and a preliminary dataset composed of liver cell nuclei images, each derived from one of two tissue types: benign or malignant tumor lesions. We separated the data into training and testing sets, holding out 20% of the total data for testing across all training set size conditions. To simulate a range of limited training set sizes, we randomly generated training subsets ranging from 80% down to 0.8% of the total dataset size. Linear classification was implemented via logistic regression and a support vector machine with a linear kernel, each applied to both the raw images and the images transformed via the R-CDT. Additionally, non-linear classification accuracy was assessed by comparing the R-CDT paired with a shallow CNN against a deep CNN classifying the images. Results indicate that classification in R-CDT space outperforms classification in image space under conditions of limited data, as one is likely to encounter in medical imaging.
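
The data-splitting protocol described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `make_splits`, the fixed seed, the intermediate fraction 0.08, and the choice to draw each training subset independently (rather than nested or stratified by class) are all assumptions made for illustration. Only the 20% held-out test set and the 80% and 0.8% endpoints come from the text.

```python
import random


def make_splits(n_samples, train_fractions, test_fraction=0.2, seed=0):
    """Hypothetical helper: one fixed test set plus training subsets of varying size.

    Fractions are relative to the TOTAL dataset size, as in the abstract.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # Hold out the same test set for every training-size condition.
    n_test = round(test_fraction * n_samples)
    test_idx = indices[:n_test]
    pool = indices[n_test:]  # samples available for training

    train_subsets = {}
    for frac in train_fractions:
        # Clamp to the pool size and keep at least one sample.
        n_train = min(len(pool), max(1, round(frac * n_samples)))
        # Assumption: each subset is drawn independently, without stratification.
        train_subsets[frac] = rng.sample(pool, n_train)
    return test_idx, train_subsets


# Example with a hypothetical dataset of 1000 samples; 0.08 is an assumed
# intermediate fraction, the abstract only states the 0.8 and 0.008 endpoints.
test_idx, train_subsets = make_splits(1000, train_fractions=(0.8, 0.08, 0.008))
```

Each training subset would then be used to fit the same classifier twice, once on raw pixels and once on R-CDT features, with both evaluated on the shared test indices so that accuracies are comparable across training set sizes.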