Fusion Learning using Semantics and Graph Convolutional Network for Visual Food Recognition

Food-related applications and services are essential for people's health and well-being. With the rapid development of social networks and mobile devices, food images captured by people can offer rich knowledge about the food and also provide necessary dietary assistance for people who require special care. Existing food recognition frameworks in computer vision rely heavily on many-shot training of deep networks on large-scale food datasets. However, for many food categories it is difficult to collect enough images for training. Traditional few-shot learning cannot properly address this problem because of the complex characteristics and large variations of food images, and most few-shot frameworks cannot classify many-shot and few-shot categories at the same time. In this paper, we propose a new fusion learning framework for food recognition. It unifies many-shot and few-shot recognition within a single framework by leveraging extracted image representations and context-sensitive semantic embeddings. Further, since food categories are often correlated through commonalities such as shared ingredients and cooking methods, the framework employs a Graph Convolutional Network (GCN) to capture the inter-class relations between both the image representations and the semantic embeddings of different food categories. The resulting fusion classifier is more robust and discriminative. Comprehensive experiments on two popular food benchmarks show that the proposed framework achieves state-of-the-art performance.
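
To make the fusion idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: per-class visual prototypes and word embeddings of class names form the node features of a class-relation graph, a small GCN propagates inter-class relations, and the refined node features serve as classifier weights shared by many-shot and few-shot categories. All names, dimensions, and the adjacency construction here are illustrative assumptions.

```python
# Minimal sketch (illustrative only): fuse visual prototypes and semantic
# embeddings per class, propagate them over a class-relation graph with a
# simple GCN, and use the refined node features as classifier weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClassRelationGCN(nn.Module):
    """Two GCN layers over a symmetrically normalized class-adjacency matrix."""

    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, out_dim)

    @staticmethod
    def normalize(adj):
        # D^-1/2 (A + I) D^-1/2, the standard GCN normalization.
        adj = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

    def forward(self, node_feats, adj):
        a = self.normalize(adj)
        h = F.relu(a @ self.w1(node_feats))
        return a @ self.w2(h)


def fusion_logits(image_feats, visual_protos, semantic_embs, adj, gcn):
    # Per-class node features: visual prototype concatenated with word embedding.
    nodes = torch.cat([visual_protos, semantic_embs], dim=1)    # (C, d_v + d_s)
    # The GCN propagates inter-class relations and emits one weight per class.
    class_weights = gcn(nodes, adj)                             # (C, d_v)
    # Cosine-similarity classifier shared by many-shot and few-shot classes.
    return F.normalize(image_feats, dim=1) @ F.normalize(class_weights, dim=1).t()


if __name__ == "__main__":
    C, d_v, d_s = 10, 512, 300              # toy numbers of classes / dims
    gcn = ClassRelationGCN(d_v + d_s, 256, d_v)
    protos = torch.randn(C, d_v)            # class-mean CNN features (e.g. ResNet)
    sems = torch.randn(C, d_s)              # word embeddings of class names
    adj = (torch.randn(C, C).abs() > 1.0).float()   # stand-in class-relation graph
    adj = ((adj + adj.t()) > 0).float()             # make it symmetric
    feats = torch.randn(4, d_v)             # a batch of image features
    print(fusion_logits(feats, protos, sems, adj, gcn).shape)   # -> (4, 10)
```

Because every category, whether many-shot or few-shot, contributes one graph node, the same cosine classifier scores all categories at once, which mirrors the unified treatment of both regimes described in the abstract.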
