AdaFlock: Adaptive Feature Discovery for Human-in-the-loop Predictive Modeling

Feature engineering is key to the successful application of machine learning algorithms to real-world data. The discovery of informative features often requires domain knowledge or human inspiration, and data scientists expend considerable effort exploring feature spaces. Crowdsourcing is considered a promising way to involve many people in feature engineering; however, a more sophisticated strategy is needed to acquire good features at a reasonable crowdsourcing cost. In this paper, we present a novel algorithm called AdaFlock that efficiently obtains informative features through crowdsourcing. AdaFlock is inspired by AdaBoost, which iteratively trains classifiers by increasing the weights of samples misclassified by previous classifiers. Likewise, AdaFlock generates informative features iteratively: at each iteration, crowdsourcing workers are shown samples selected according to the classification errors of the current classifiers and are asked to generate new features that help classify the given examples correctly. Experiments on real datasets indicate that AdaFlock discovers informative features in fewer iterations and achieves high classification accuracy.
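The iterative loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; it assumes decision stumps as weak learners, the discrete AdaBoost weight update, a selection rule that simply shows workers the `k` highest-weight samples, and a hypothetical `simulated_crowd` function that stands in for human workers by picking from a fixed pool of candidate features. All function names and the toy data are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
Feature = Callable[[Sample], float]
Stump = Tuple[Feature, float, int]  # (feature, threshold, polarity)


def stump_predict(stump: Stump, x: Sample) -> int:
    """Weak classifier: +polarity if feature(x) >= threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if feature(x) >= threshold else -polarity


def fit_stump(feature: Feature, X: List[Sample], y: List[int],
              w: List[float]) -> Tuple[Stump, float]:
    """Choose the threshold and polarity with the lowest weighted error."""
    best, best_err = None, float("inf")
    for t in sorted({feature(x) for x in X}):
        for polarity in (1, -1):
            stump = (feature, t, polarity)
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(stump, xi) != yi)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err


def adaflock_sketch(X, y, ask_crowd, rounds: int, k: int):
    """AdaBoost-style loop in which each round's feature comes from the 'crowd'."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Assumed selection rule: show the k highest-weight (hardest) samples;
        # Python's stable sort breaks ties by sample index.
        shown = sorted(range(n), key=lambda i: -w[i])[:k]
        feature = ask_crowd([X[i] for i in shown], [y[i] for i in shown])
        stump, err = fit_stump(feature, X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Discrete AdaBoost update: misclassified samples gain weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x: Sample) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def simulated_crowd(pool: List[Feature]):
    """Hypothetical stand-in for workers: propose the pool feature that best
    separates the shown examples (unweighted)."""
    def ask(examples, labels):
        uniform = [1.0 / len(examples)] * len(examples)
        return min(pool, key=lambda f: fit_stump(f, examples, labels, uniform)[1])
    return ask


# Toy data with two binary attributes (a, b); positive iff a == 1 and b == 0.
X = [(1, 0), (1, 0), (0, 0), (1, 1), (0, 1)]
y = [1, 1, -1, -1, -1]
crowd = simulated_crowd([lambda x: x[0], lambda x: x[1]])  # features "a", "b"

for rounds in (1, 3):
    model = adaflock_sketch(X, y, crowd, rounds=rounds, k=3)
    acc = sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(X)
    print(rounds, acc)  # prints "1 0.8", then "3 1.0"
```

On this toy data, the first proposed feature (`a`) misclassifies the sample `(1, 1)`; that sample then carries half of the total weight, so it is shown to the simulated workers in the next round, who propose `b`, and after three rounds the ensemble classifies all five samples correctly. In a real deployment, `ask_crowd` would presumably post the shown samples as a crowdsourcing task and turn the workers' answers into a feature function.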
