Picture-to-Amount (PITA): Predicting Relative Ingredient Amounts from Food Images

Increased awareness of the impact of food consumption on health and lifestyle has given rise to novel data-driven food analysis systems. Although these systems may recognize the ingredients, they usually ignore a detailed analysis of the ingredient amounts in the meal, which is paramount for estimating nutrition correctly. In this paper, we study the novel and challenging problem of predicting the relative amount of each ingredient from a food image. We propose PITA, the Picture-to-Amount deep learning architecture, to solve the problem. More specifically, we predict the ingredient amounts using a domain-driven Wasserstein loss from image-to-recipe cross-modal embeddings learned to align the two views of food data. Experiments on a dataset of recipes collected from the Internet show that the model generates promising results and improves on the baselines for this challenging task. A demo of our system and our data are available at: this http URL.
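
To illustrate why a Wasserstein loss suits relative ingredient amounts, the following is a minimal sketch and not the PITA implementation. It approximates the Wasserstein distance between two amount distributions with Sinkhorn iterations (an entropic approximation chosen here for brevity). The three ingredients, the ground-cost matrix, and the function name `sinkhorn_wasserstein` are all hypothetical; in a domain-driven setting the cost would encode how similar two ingredients are.

```python
import numpy as np

def sinkhorn_wasserstein(p, q, cost, eps=0.05, n_iters=500):
    """Approximate the Wasserstein distance between distributions p and q.

    p, q  : 1-D arrays of relative ingredient amounts, each summing to 1.
    cost  : square ground-cost matrix; cost[i, j] is the price of moving
            one unit of mass from ingredient i to ingredient j.
    eps   : entropic regularization strength (smaller is closer to exact).
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):         # alternate scaling to match marginals
        v = q / (K.T @ u)
        u = p / (K @ v)
    plan = u[:, None] * K * v[None, :]   # approximate transport plan
    return float(np.sum(plan * cost))

# Hypothetical ingredients: butter, margarine, flour.
# Assumed ground cost: butter and margarine are near substitutes (0.1),
# while flour is far from both (1.0).
cost = np.array([
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])

target = np.array([0.6, 0.0, 0.4])       # 60% butter, 40% flour
pred_close = np.array([0.0, 0.6, 0.4])   # margarine predicted instead of butter
pred_far = np.array([0.0, 0.0, 1.0])     # all mass placed on flour

loss_same = sinkhorn_wasserstein(target, target, cost)
loss_close = sinkhorn_wasserstein(target, pred_close, cost)
loss_far = sinkhorn_wasserstein(target, pred_far, cost)
print(loss_same, loss_close, loss_far)
```

Under these assumed costs, confusing butter with margarine is penalized far less than assigning the same mass to flour, whereas a loss that treats ingredients as unrelated classes would penalize both mistakes equally.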
