Plant identification with noisy web data

One of the main problems in image-based plant identification has been the lack of high-quality training images. This paper proposes several methods that address this problem by generating high-quality plant images from crowd-sourced Web image collections such as Flickr. The methods automatically identify correct and informative training images among these Web images, whose metadata (for example, user tags on Flickr) is typically very noisy, and use them to augment an existing manually labeled training set. First, for each plant, a set of images is collected by searching Flickr with the plant name as the query. The images are then grouped into visually consistent clusters, with the aim that the images in each cluster are, ideally, either mostly relevant or mostly irrelevant to that plant. A curated plant image data set from ImageCLEF is then used as a reference to automatically select the highest-quality cluster for each plant. The quality of the selected clusters is further improved by two algorithms: an iterative method and image-similarity-based ranking. We show that the larger training set selected automatically in this way significantly increases the accuracy of image-based plant identification. In addition, the approach is a generic solution for almost any image recognition problem in which additional (noisy) training data can be obtained automatically from the Internet.
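The abstract names the pipeline steps but not their details. The following is a minimal Python sketch of the clustering, cluster-selection, and similarity-ranking steps, assuming each image has already been converted into a fixed-length feature vector (for instance a bag-of-visual-words histogram; the Flickr download and feature extraction are not shown). Plain k-means, scoring a cluster by its mean cosine similarity to the ImageCLEF reference images, and the helper names `kmeans`, `select_best_cluster`, and `rank_by_similarity` are all illustrative assumptions, not the authors' method. The iterative refinement step is omitted because the abstract does not describe it.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means with squared Euclidean distance; one label per vector."""
    rng = random.Random(seed)
    # Use k randomly chosen input vectors as the starting centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def reference_score(v: Vector, reference: List[Vector]) -> float:
    """Mean cosine similarity between one web image and the curated reference images."""
    return sum(cosine(v, r) for r in reference) / len(reference)


def select_best_cluster(vectors: List[Vector], labels: List[int],
                        reference: List[Vector]) -> List[int]:
    """Indices of the cluster whose members are, on average, most similar to the reference set."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    def cluster_score(idx: List[int]) -> float:
        return sum(reference_score(vectors[i], reference) for i in idx) / len(idx)

    return max(clusters.values(), key=cluster_score)


def rank_by_similarity(indices: List[int], vectors: List[Vector],
                       reference: List[Vector], top_k: int) -> List[int]:
    """Keep the top_k images of a cluster, most reference-like first."""
    return sorted(indices, key=lambda i: reference_score(vectors[i], reference),
                  reverse=True)[:top_k]


# Toy 3-D "features": the reference images and the first three web images point
# roughly along the first axis; the last three web images (off-topic results) do not.
reference = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]
web = [[1.0, 0.0, 0.0], [0.95, 0.05, 0.0], [0.9, 0.1, 0.05],
       [0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.0, 0.95, 0.1]]

labels = kmeans(web, k=2)
best = select_best_cluster(web, labels, reference)
top = rank_by_similarity(best, web, reference, top_k=2)
print(labels, best, top)
```

Scoring a cluster by its average similarity to the curated reference images is one simple way to pick the cluster that is mostly relevant; a classifier trained on the reference set could be swapped in for `reference_score` without changing the rest of the sketch.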

Xian-Sheng Hua | William Y. Zhang