Experiment I: category prediction In this experiment, we randomly select 2K samples from 7 categories (boat, bus, f1car, tank, train, ufo, and van) and feed them to a pretrained CNN, specifically AlexNet. We then extract the fc7 and pool5 representations of the selected samples and use the t-SNE algorithm to reduce their dimensionality to 2D. In addition, 20K images are randomly selected from all 7 categories, and the network is fine-tuned on this data for object categorization; the same feature-extraction and t-SNE procedure is then applied to the fine-tuned (FT) network. Fig. 1 depicts the results. The results in Fig. 1 show that the fc7 representation discriminates object-level categories remarkably well: the categories become mutually linearly separable after fine-tuning. In contrast, the pool5 representation carries little discriminative information between object categories compared to fc7. This result is in alignment with Bakry et al. [1]. Fig. 1 also demonstrates the effect of fine-tuning on the feature spaces: the distributions of samples from different categories become very compact and concentrated after fine-tuning. Notice, however, that fine-tuning does not add discriminative power to the pool5 representation.
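
The following is a minimal sketch of the feature-extraction step described above, assuming a torchvision AlexNet and scikit-learn's t-SNE (the original experiments may have used a different framework); the layer indices map pool5 to the end of `model.features` and fc7 to the second fully connected layer in `model.classifier`.

```python
import torch
import torchvision.models as models
from sklearn.manifold import TSNE

# Assumed: torchvision's pretrained AlexNet; the paper's model is analogous,
# with model.features ending at pool5 and the second FC layer giving fc7.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()

def extract_features(images):
    """Return (pool5, fc7) activations for a batch of preprocessed images."""
    with torch.no_grad():
        pool5 = model.features(images)                 # N x 256 x 6 x 6
        flat = torch.flatten(model.avgpool(pool5), 1)  # N x 9216
        fc7 = model.classifier[:5](flat)               # N x 4096 (2nd FC layer)
    return pool5.flatten(1), fc7

# images: a tensor of the 2K preprocessed samples (preparation not shown)
# pool5_feats, fc7_feats = extract_features(images)
# fc7_2d = TSNE(n_components=2, perplexity=30).fit_transform(fc7_feats.numpy())

# For the fine-tuning step, the final classifier layer would be replaced to
# output the 7 categories before training on the 20K images, e.g.:
# model.classifier[6] = torch.nn.Linear(4096, 7)
```

The same `extract_features` call can be reused on the fine-tuned network to produce the FT embeddings compared in Fig. 1.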