Single-View Food Portion Estimation: Learning Image-to-Energy Mappings Using Generative Adversarial Networks

Given the growing concern about chronic diseases and other diet-related health problems, there is a need for accurate methods to estimate an individual's food and energy intake. Accurately measuring dietary intake remains an open research problem. In particular, accurate food portion estimation is challenging because the processes of food preparation and consumption impose large variations on food shape and appearance. In this paper, we present a food portion estimation method that estimates food energy (kilocalories) from food images using Generative Adversarial Networks (GANs). We introduce the concept of an “energy distribution” for each food image. To train the GAN, we design a food image dataset based on ground truth food labels and segmentation masks for each food image, as well as the energy information associated with each image. Our goal is to learn the mapping from the food image to the food energy. We can then estimate food energy based on the energy distribution. We show that an average energy estimation error rate of 10.89% can be obtained by learning the image-to-energy mapping.
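
The last two steps of the abstract (estimating energy from an energy distribution, then reporting an average error rate) can be illustrated with a minimal sketch. It assumes, as one plausible reading that the abstract itself does not spell out, that the energy distribution is a per-pixel map whose values sum to a quantity proportional to the total food energy. The function names, the `kcal_per_unit` scale factor, and the sample numbers are hypothetical and are not taken from the paper.

```python
def estimate_energy(energy_map, kcal_per_unit=1.0):
    """Sum a per-pixel energy map (list of rows) into one kilocalorie estimate.

    Assumption: pixel values are non-negative weights and the total energy is
    proportional to their sum; kcal_per_unit is a hypothetical scale factor.
    """
    return kcal_per_unit * sum(sum(row) for row in energy_map)


def mean_error_rate(estimates, truths):
    """Average relative error, in percent, over paired estimates and ground truths."""
    if len(estimates) != len(truths) or not truths:
        raise ValueError("estimates and truths must be non-empty and equally long")
    rates = [abs(e - t) / t for e, t in zip(estimates, truths)]
    return 100.0 * sum(rates) / len(rates)


if __name__ == "__main__":
    # Toy 2x3 energy map: zeros are background, non-zero values are food pixels.
    toy_map = [[0.0, 1.0, 2.0],
               [0.0, 3.0, 4.0]]
    print(estimate_energy(toy_map))

    # Two made-up images with estimated vs. ground-truth energy in kilocalories.
    print(mean_error_rate([90.0, 220.0], [100.0, 200.0]))
```

In this sketch the error for each image is measured relative to its ground-truth energy and then averaged across images; whether the reported 10.89% is computed in exactly this way is not stated in the abstract.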
