Weakly Supervised Cross-Domain Mixed Dish Detection With Mean-Teacher

Mixed dishes, which combine different types of dishes on one plate, are a popular kind of food in East and Southeast Asia. Identifying the dish types in a mixed dish is essential for dietary tracking, which has gained increasing research attention recently. Nevertheless, mixed dish detection is a challenging task because of the large visual variance among dishes in different canteens, which is known as the domain shift problem. Since collecting and annotating sufficient training samples in each canteen is difficult, a more practical approach is to develop detection models that can adapt quickly to cross-canteen mixed dish detection with less supervision. To this end, we propose a novel framework called Weakly-supervised Mean Teacher Network (WMT-Net) that addresses this detection task in a weakly supervised manner, where bounding box annotations are not required in the target domain. WMT-Net builds on Mean Teacher learning by maintaining image-level consistency between the teacher and student modules. Specifically, WMT-Net first learns instance-level information from the source dataset in a fully supervised fashion for the student model. The whole architecture is then optimized with weakly supervised learning: 1) weakly supervised training of the student model to reduce the domain gap in global semantics between source data and target data, and 2) image-level consistency to align the image-level predictions of the teacher model and the student model. Experimental results on a mixed dish dataset show that, even though WMT-Net is trained in a weakly supervised fashion on the target domain, its performance is very close to that of a model trained in a fully supervised fashion, which verifies the effectiveness of WMT-Net. In addition, WMT-Net achieves 44.6% mAP on Pascal VOC to Clipart cross-domain detection, an improvement of 7.2% mAP over the state-of-the-art method, which further demonstrates its generalization capability.
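The two weakly supervised terms and the teacher update can be illustrated with a minimal NumPy sketch. It is not the WMT-Net implementation: the abstract does not specify how instance-level predictions are aggregated or which distance is used, so the max pooling over regions, the mean squared error, the binary cross-entropy, the decay value, and all function names below are assumptions made for illustration. The only part taken from the general Mean Teacher recipe is that the teacher weights follow an exponential moving average of the student weights.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Mean Teacher update: teacher weights track an exponential moving
    average of the student weights (decay value is an assumption)."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

def image_level_scores(instance_probs):
    """Aggregate per-region class probabilities (regions x classes) into one
    score per class. Max pooling over regions is an assumed choice."""
    return instance_probs.max(axis=0)

def image_label_loss(instance_probs, labels, eps=1e-7):
    """Weak supervision on the student: binary cross-entropy between the
    aggregated image-level scores and multi-hot image-level labels."""
    p = np.clip(image_level_scores(instance_probs), eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))

def consistency_loss(student_probs, teacher_probs):
    """Image-level consistency: mean squared difference between the
    aggregated student and teacher predictions (assumed distance)."""
    s = image_level_scores(student_probs)
    t = image_level_scores(teacher_probs)
    return float(np.mean((s - t) ** 2))

# Toy target-domain image: 2 region proposals, 3 dish classes.
student_probs = np.array([[0.9, 0.1, 0.2],
                          [0.3, 0.8, 0.1]])
teacher_probs = np.array([[0.7, 0.2, 0.2],
                          [0.2, 0.6, 0.3]])
labels = np.array([1.0, 1.0, 0.0])  # image-level labels only, no boxes

weak_loss = image_label_loss(student_probs, labels)
cons_loss = consistency_loss(student_probs, teacher_probs)

# Toy weights to show the teacher update after a student step.
teacher_w = {"w": np.array([1.0, 1.0])}
student_w = {"w": np.array([0.0, 2.0])}
teacher_w = ema_update(teacher_w, student_w, decay=0.9)
```

In this sketch only image-level labels of the target image are used, which mirrors the setting where bounding box annotations are unavailable in the target domain; gradients would flow through the student only, while the teacher is changed solely by the averaging step.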
