Bootstrapping Fine-Grained Classifiers: Active Learning with a Crowd in the Loop

We propose an iterative, crowd-enabled active learning algorithm for building high-precision visual classifiers from unlabeled images. Our method employs domain experts to identify a small number of examples of a specific visual event. These expert-labeled examples seed a classifier, which is then iteratively trained by actively querying a non-expert crowd. At every iteration, the non-experts refine the classifier by answering simple binary questions about its detections. The advantage of this approach is that experts efficiently shepherd an unsophisticated crowd into training a classifier capable of fine-grained distinctions, which obviates the need to label an entire dataset to obtain high-precision classifiers. We find these classifiers well suited to building a large vocabulary of visual attributes for specialized taxonomies. We demonstrate our crowd active learning pipeline by creating classifiers for attributes related to North American birds and fashion.
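The seed-then-query loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes that each image is already represented by a fixed-length feature vector, that the expert seed contains a few negative examples alongside the positives, and that the crowd can be modeled as a yes/no callback. A plain logistic-regression scorer stands in for whatever classifier the pipeline actually trains, and the highest-scoring unlabeled images are treated as the classifier's "detections". The names `crowd_active_learning`, `train_logreg`, `crowd_oracle`, `rounds`, and `batch_size` are illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: List[float], x: Vector) -> float:
    """Linear score w.x + b; the last entry of w is the bias b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_logreg(xs: List[Vector], ys: List[int],
                 epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Fit a logistic-regression scorer with batch gradient descent (a stand-in classifier)."""
    dim = len(xs[0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(xs, ys):
            err = sigmoid(score(w, x)) - y  # derivative of log-loss w.r.t. the score
            for j in range(dim):
                grad[j] += err * x[j]
            grad[dim] += err
        w = [wj - lr * g / len(xs) for wj, g in zip(w, grad)]
    return w


def crowd_active_learning(pool: Dict[str, Vector],
                          seed: Dict[str, int],
                          crowd_oracle: Callable[[str], bool],
                          rounds: int = 5,
                          batch_size: int = 4) -> Tuple[List[float], Dict[str, int]]:
    """Seed with expert labels, then repeatedly ask the crowd about top detections.

    pool: image id -> feature vector (assumed precomputed).
    seed: expert labels, image id -> 1 (attribute present) or 0 (absent).
    crowd_oracle: hypothetical stand-in for posting one yes/no question to the crowd.
    """
    labels = dict(seed)
    for _ in range(rounds):
        ids = list(labels)
        w = train_logreg([pool[i] for i in ids], [labels[i] for i in ids])
        unlabeled = [i for i in pool if i not in labels]
        if not unlabeled:
            break
        # The highest-scoring unlabeled images are this round's "detections".
        detections = sorted(unlabeled, key=lambda i: score(w, pool[i]),
                            reverse=True)[:batch_size]
        # One binary question per detection; each answer becomes a new label.
        for i in detections:
            labels[i] = 1 if crowd_oracle(i) else 0
    # Retrain on everything the experts and the crowd have labeled so far.
    ids = list(labels)
    w = train_logreg([pool[i] for i in ids], [labels[i] for i in ids])
    return w, labels
```

Asking the crowd to verify the top-scoring detections, rather than the images the classifier is least certain about, follows the abstract's framing of workers answering questions about the classifier's detections and tends to favor precision. Ranking by `abs(score(w, pool[i]))` in ascending order would turn this into uncertainty sampling instead.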
