A probabilistic model for food image recognition in restaurants

A large number of food photos are taken in restaurants for diverse reasons. Recognizing the dishes in these photos is very challenging, due to the variety of cuisines and cooking styles and the intrinsic difficulty of modeling food from its visual appearance. Contextual knowledge is crucial to improving recognition in such scenarios. In particular, geocontext has been widely exploited for outdoor landmark recognition. Similarly, we exploit knowledge about the menus and geolocations of restaurants, together with the geolocation of the test images. We first adapt a framework that discards unlikely categories located far from the test image. Then we reformulate the problem using a probabilistic model connecting dishes, restaurants and geolocations. We apply that model to three different tasks: dish recognition, restaurant recognition and geolocation refinement. Experiments on a dataset including 187 restaurants and 701 dishes show that combining multiple sources of evidence (visual, geolocation, and external knowledge) can boost performance in all tasks.
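The abstract does not give the model's equations, so the following is only a minimal sketch of one way dishes, restaurants and geolocations could be connected, not the authors' formulation. It assumes a Gaussian kernel on distance for P(restaurant | location), a uniform P(dish | restaurant) over each menu, and classifier scores standing in for the visual evidence. All names (`restaurant_posterior`, `dish_posterior`, `sigma_km`) and the toy data are hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def restaurant_posterior(query_loc, restaurants, sigma_km=0.1):
    """P(restaurant | location): assumed Gaussian kernel on distance, normalized."""
    weights = {
        name: math.exp(-0.5 * (haversine_km(query_loc, info["loc"]) / sigma_km) ** 2)
        for name, info in restaurants.items()
    }
    total = sum(weights.values())
    if total == 0.0:  # every restaurant is too far away: fall back to uniform
        return {name: 1.0 / len(restaurants) for name in restaurants}
    return {name: w / total for name, w in weights.items()}

def dish_posterior(visual_scores, query_loc, restaurants, sigma_km=0.1):
    """P(dish | image, location), proportional to
    visual score * sum over restaurants of P(dish | r) * P(r | location)."""
    p_r = restaurant_posterior(query_loc, restaurants, sigma_km)
    scores = {}
    for dish, p_visual in visual_scores.items():
        # Assumed menu prior: uniform over the dishes each restaurant serves.
        prior = sum(
            p_r[name] / len(info["menu"])
            for name, info in restaurants.items()
            if dish in info["menu"]
        )
        scores[dish] = p_visual * prior
    total = sum(scores.values())
    if total == 0.0:  # no menu supports any candidate: use visual scores alone
        total = sum(visual_scores.values())
        return {d: s / total for d, s in visual_scores.items()}
    return {d: s / total for d, s in scores.items()}

# Toy data (made up): two restaurants roughly 1.1 km apart.
restaurants = {
    "A": {"loc": (0.0, 0.0), "menu": {"ramen", "gyoza"}},
    "B": {"loc": (0.0, 0.01), "menu": {"paella", "ramen"}},
}
visual_scores = {"ramen": 0.3, "gyoza": 0.2, "paella": 0.5}
posterior = dish_posterior(visual_scores, (0.0, 0.0), restaurants)
best_dish = max(posterior, key=posterior.get)
```

In this toy case the visual scores alone favor `paella`, but the photo is taken at restaurant A, which does not serve it, so the combined posterior favors `ramen`. A narrow kernel approximates the hard filtering of far-away categories, while a wider `sigma_km` tolerates noisier geolocation.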
