Are We Ready for Accurate and Unbiased Fine-Grained Vehicle Classification in Realistic Environments?

Fine-grained vehicle classification from images, also known as Vehicle Make and Model Recognition (VMMR), has become an important research topic in recent years, with a growing number of scientific contributions in application areas such as autonomous vehicles, surveillance systems, and traffic monitoring and management. Recent techniques based on deep learning have proven very effective in addressing this problem, so effective that, judging by state-of-the-art results (above 95% accuracy), the problem would seem to be practically solved. However, our main hypothesis is that existing datasets have limited variability, which precludes good and unbiased generalisation of the models trained on them. In particular, we observe that the test sets are very similar in nature to those used for training and validation, which makes these benchmarks prone to dataset bias and overfitting. When these systems are tested on more challenging data or on data from different datasets, performance degrades considerably. In this paper, on the one hand, we evaluate state-of-the-art deep learning models for fine-grained vehicle classification and explore multiple training techniques, such as curriculum learning and weighted losses, to mitigate the bias between different makes and models and to assess the limits of current approaches. On the other hand, we analyse the existing datasets, present an additional dataset from a challenging scenario, and merge all the data into a cross-dataset that includes common samples and classes from the existing datasets. In this way, we can evaluate geographical, make and model biases, as well as performance and generalisation capabilities, from a more realistic perspective. The results suggest that we are still far from accurate and unbiased vehicle make and model recognition in realistic traffic and driving scenarios.