BTBUFood-60: Dataset for Object Detection in Food Field

With the development of computer vision and image processing, researchers have published abundant image datasets for object detection. However, datasets dedicated to food object detection remain scarce. To address this problem, we introduce a novel dataset that includes images of 60 object categories that are common in the food domain. The dataset contains a total of 78k labeled instances in 60k images, and was created with the involvement of 30 graduate students in our lab using our multi-label automatic labeling system for image classification and object detection. We present a detailed statistical analysis of our dataset in comparison to PASCAL, Food-101, and ImageNet. Finally, we provide a baseline performance analysis of bounding-box detection results using a Faster Region-based Convolutional Neural Network (Faster R-CNN).
