Research on the Fine-grained Plant Image Classification

The difficulty of fine-grained recognition stems from the similarity between different subcategories and the scarcity of training data. Even within the same subcategory, objects can differ noticeably in color and pose. We propose models for fine-grained plant recognition based on a deep Convolutional Neural Network (CNN) and on traditional feature-based methods, including SIFT [1] and Bag of Words (BoW) [2]. We evaluate our methods on the Oxford 102 Flowers dataset [3]; the results show that the CNN method achieves higher accuracy than the traditional feature-based methods. Our approach achieves state-of-the-art performance on Oxford 102 Flowers, with an accuracy of 88.40%.

Introduction

Object recognition is one of the major focuses of research in computer vision. Most existing recognition tasks are at the basic level: distinguishing between a table, a human, a computer, a car and so on. At the basic level, categories differ greatly from each other. In contrast, fine-grained recognition concentrates on differences between subcategories (breeds, species or product models), for example, recognizing different species of birds or flowers. Similarities therefore exist across categories, and subtle differences must be found.

Scale-invariant feature transform (SIFT) is an algorithm for detecting and describing local features. SIFT and its variants are frequently used to extract features in image matching and image retrieval. Since Sivic et al. [2] introduced the BoW method from natural language processing to computer vision, it has achieved great success on many public datasets, including 15-Scenes [4], Caltech-256 [5] and PASCAL VOC [6]. CNNs were first popularized by LeCun [7] for digit recognition, but fell out of fashion because they require strong computing power and large amounts of training data.
With the development of parallel computing and the construction of large image databases, CNNs have returned to the forefront and achieved great success in many computer vision tasks. For instance, Krizhevsky et al. [8] achieved an impressive result in ILSVRC2012 [9] using a CNN, with two GPUs to accelerate the computation of the CNN parameters. Inspired by Krizhevsky et al., many groups proposed CNN architectures for classification problems. To obtain better performance, many CNNs ([10] [11] [12]) are first pre-trained on a large image set, such as ImageNet [9], and then fine-tuned on domain-specific data. Girshick et al. [10] proposed a model that applies a CNN to bottom-up region proposals and generalizes CNN classification results on ImageNet to PASCAL VOC. N. Zhang et al. [12] fine-tuned an ImageNet pre-trained CNN for 200-way bird classification using ground-truth bounding-box crops of the original images.

In recent years, a variety of methods for fine-grained classification have been proposed. We divide these methods into two groups. The first is traditional feature-based methods, which extract hand-crafted features and then apply a classifier. The second is CNN-based methods, which use a deep convolutional neural network to extract features and produce the classification result automatically.

In this paper, we propose methods of both kinds: one based on traditional hand-crafted features and one based on a CNN. We combine SIFT and BoW for image classification. We then use a CNN for image classification and compare it with the SIFT and BoW method. Our results show that the CNN method can achieve higher

4th International Conference on Machinery, Materials and Information Technology Applications (ICMMITA 2016). Advances in Computer Science Research, volume 71. Copyright © 2017, the Authors. Published by Atlantis Press. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
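
The combination of SIFT and BoW mentioned in the introduction can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that local descriptors have already been extracted (random 128-dimensional vectors stand in for SIFT descriptors here), that the visual vocabulary is built with plain k-means, and that each image is represented by a normalized histogram of visual words. The function names `build_vocabulary` and `bow_histogram` and the vocabulary size are hypothetical choices made for illustration.

```python
import numpy as np


def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Cluster local descriptors into k visual words with plain k-means.

    descriptors: array of shape (n, d), e.g. d = 128 for SIFT-like vectors.
    Returns the k cluster centers, shape (k, d).
    """
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct descriptors (assumed scheme).
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every center.
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, centers):
    """Encode one image as a normalized histogram of visual words."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # nearest visual word for each descriptor
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Stand-in for descriptors pooled from all training images.
    train_descriptors = rng.random((500, 128))
    vocabulary = build_vocabulary(train_descriptors, k=8)
    # Stand-in for the descriptors of a single image.
    image_descriptors = rng.random((60, 128))
    feature = bow_histogram(image_descriptors, vocabulary)
    print(feature.shape, feature.sum())
```

In a full pipeline, the resulting histograms would be passed to a classifier. The choice of classifier and of vocabulary size is left open here because this section does not specify them.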

[1] Thomas S. Huang, et al. Image Classification Using Super-Vector Coding of Local Image Descriptors, 2010, ECCV.

[2] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints, 2011.

[3] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[4] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[5] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6] Cordelia Schmid, et al. Spatial pyramid matching, 2009.

[7] Maurice K. Wong, et al. Algorithm AS136: A k-means clustering algorithm, 1979.

[8] Yihong Gong, et al. Linear spatial pyramid matching using sparse coding for image classification, 2009, CVPR.

[9] Luc Van Gool, et al. The Pascal Visual Object Classes (VOC) Challenge, 2010, International Journal of Computer Vision.

[10] Andrew Zisserman, et al. Automated Flower Classification over a Large Number of Classes, 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[11] Thomas Mensink, et al. Improving the Fisher Kernel for Large-Scale Image Classification, 2010, ECCV.

[12] Stefan Carlsson, et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[13] Yihong Gong, et al. Locality-constrained Linear Coding for image classification, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14] Trevor Darrell, et al. Part-Based R-CNNs for Fine-Grained Category Detection, 2014, ECCV.

[15] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.

[16] Andrew Zisserman, et al. Video Google: a text retrieval approach to object matching in videos, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.