Paper information - Food det: Detecting foods in refrigerator with supervised transformer network

Abstract: Most existing methods focus on food image recognition and assume that each food image contains only one food item. In this paper, by contrast, we present a system that detects a diverse range of foods in a refrigerator, where multiple food items may be present. To suit the refrigerator environment, we propose a food detection framework based on a supervised transformer network. Specifically, the supervised transformer network, denoted RectNet, first automatically selects the irregular food regions and transforms them to frontal views. Then, based on the rectified food images, we propose an end-to-end detection network that predicts the categories and locations of food items. The proposed detection network, called Lite Fully Convolutional Network (LiteFCN), evolves from the object detection algorithm Faster R-CNN, with several significant improvements tailored to achieve higher accuracy while keeping inference efficient. To validate the effectiveness of each component of our method, we build a real-world refrigerator dataset with 80 classes. Extensive experiments demonstrate that our method achieves state-of-the-art results, improving on the baseline by a large margin, e.g., 3–5% in terms of F-measure. We also show that the proposed detection network achieves a competitive result on the public PASCAL VOC2007 dataset, outperforming Faster R-CNN by 2.3% at a higher speed.
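
The abstract reports gains in terms of F-measure but does not say how detections are matched to ground truth or which weighting is used. As a minimal sketch, assuming the standard F-beta definition over true-positive, false-positive, and false-negative detection counts (the function name `f_measure` and the counts in the usage note are hypothetical, not taken from the paper):

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Standard F-beta score from detection counts (assumed metric form).

    tp: detections matched to a ground-truth item
    fp: detections with no matching ground-truth item
    fn: ground-truth items that were missed
    Returns 0.0 when precision and recall are both zero or undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, `f_measure(80, 20, 20)` gives precision and recall of 0.8 each, so the score is approximately 0.8. With `beta=1.0` this is the usual F1 score; whether the paper uses F1 or another weighting is not stated in the abstract.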
