Mixed Dish Recognition With Contextual Relation and Domain Alignment

A mixed dish is a food category in which different dishes are served together on one plate; it is popular in East and Southeast Asia. Recognizing the individual dishes in a mixed dish image is important for health-related applications, e.g., calculating the nutritional value of a meal. However, most existing methods focus on single dish classification and are not applicable to the recognition of mixed dish images. The main challenge of mixed dish recognition stems from three aspects: the wide range of dish types, complex dish combinations with severe overlap between different dishes, and large visual variance within the same dish type caused by the different cooking and cutting methods used in different canteens. To tackle these problems, we propose a contextual relation network that encodes the implicit and explicit contextual relations among multiple dishes from region-level features and label-level co-occurrence, respectively. Moreover, to address the visual variance of dish instances from different canteens, we introduce domain adaptation networks that align both local and global features, eliminating the domain gaps in dish features across canteens. In addition, we collect a mixed dish image dataset containing 9,254 mixed dish images from 6 canteens in Singapore. Extensive experiments on both our dataset and a public one validate that our method achieves top performance in localizing and recognizing multiple dishes, and mitigates the domain shift problem to a certain extent in mixed dish images.