Bilinear CNN Models for Food Recognition

Due to the diversity of food types and the slight differences between different dishes, the genre of food images becomes a new challenge in the field of computer vision. To tackle this problem, recent efforts are focusing on designing hand-crafted features or extracting features automatically by using deep convolutional neural network. Although these methods have reported a series of success, their general architectures fail to capture the fine-grained features of similar dishes adequately. Inspired by the bilinear CNN models in the field of fine-grained classification, we have exploited such a similar structure in which two deep convolution networks are used as feature extractors and the outputs of them fused to obtain fine-grained features. These features are used to train the food classifier. We have conducted experiments on three publicly benchmark food datasets to evaluate the proposed architecture. The experiments exhibit that our method is comparable to the existing approaches.

[1]  Keiji Yanai,et al.  Real-Time Mobile Food Recognition System , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[2]  Gian Luca Foresti,et al.  Wide-Slice Residual Networks for Food Recognition , 2016, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[3]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[4]  Niki Martinel,et al.  A supervised extreme learning committee for food recognition , 2016, Comput. Vis. Image Underst..

[5]  Giovanni Maria Farinella,et al.  A Benchmark Dataset to Study the Representation of Food Images , 2014, ECCV Workshops.

[6]  Joshua B. Tenenbaum,et al.  Separating Style and Content , 1996, NIPS.

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Mei Chen,et al.  Food recognition using statistics of pairwise local features , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Keiji Yanai,et al.  FoodCam: A Real-Time Mobile Food Recognition System Employing Fisher Vector , 2014, MMM.

[11]  Lei Yang,et al.  PFID: Pittsburgh fast-food image dataset , 2009, 2009 16th IEEE International Conference on Image Processing (ICIP).

[12]  Keiji Yanai,et al.  Image Recognition of 85 Food Categories by Feature Fusion , 2010, 2010 IEEE International Symposium on Multimedia.

[13]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[14]  Keiji Yanai,et al.  Recognition of Multiple-Food Images by Detecting Candidate Regions , 2012, 2012 IEEE International Conference on Multimedia and Expo.

[15]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[16]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jindong Tan,et al.  DietCam: Automatic dietary assessment with mobile camera phones , 2012, Pervasive Mob. Comput..

[18]  Vinod Vokkarane,et al.  DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment , 2016, ICOST.

[19]  Charless C. Fowlkes,et al.  Bilinear classifiers for visual recognition , 2009, NIPS.

[20]  Chong-Wah Ngo,et al.  Deep-based Ingredient Recognition for Cooking Recipe Retrieval , 2016, ACM Multimedia.

[21]  Monica Mordonini,et al.  Food Image Recognition Using Very Deep Convolutional Networks , 2016, MADiMa @ ACM Multimedia.

[22]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[23]  Matthieu Guillaumin,et al.  Food-101 - Mining Discriminative Components with Random Forests , 2014, ECCV.

[24]  Keiji Yanai,et al.  Food image recognition with deep convolutional features , 2014, UbiComp Adjunct.

[25]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Keiji Yanai,et al.  Food image recognition using deep convolutional network with pre-training and fine-tuning , 2015, 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW).

[28]  David S. Ebert,et al.  The Use of Mobile Devices in Aiding Dietary Assessment and Evaluation , 2010, IEEE Journal of Selected Topics in Signal Processing.

[29]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.