The Analysis of Adaptive Data Collection Methods for Machine Learning

Over the last decade the machine learning community has watched the size and complexity of datasets grow at an exponential rate, a phenomenon often described as "big data." Machine learning methods face two main performance bottlenecks: computational resources and the amount of labelled data, which is often provided by a human expert. Advances in distributed computing and the advent of cloud computing platforms have turned computational resources into a commodity, and their price has predictably dropped precipitously. Human response time, however, has remained constant: a question will take a human just as long to answer tomorrow as it does today, and the answer will cost more as wages rise world-wide.

This thesis proposes a simple remedy: require fewer labels by asking better questions. One way to ask better questions is to make the data collection procedure adaptive, so that the next question depends on all the information gathered up to the current time. Familiar examples of adaptive data collection procedures include the game of 20 questions and the binary search algorithm. We will investigate several adaptive data collection methods, and for each we will be interested in answering questions such as: How many queries are sufficient for a particular algorithm to achieve a desired prediction error? How many queries must any algorithm necessarily ask to achieve a desired prediction error? What are the fundamental quantities that characterize the difficulty of a particular problem?

This thesis focuses on scenarios where the answers to queries are provided by a human. In practice, humans are much more comfortable offering qualitative statements like "this
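The advantage of adaptivity mentioned above can be made concrete with binary search, where each yes/no question depends on all previous answers. The sketch below is illustrative only (the `oracle` interface is a hypothetical stand-in for a human labeller, not part of the thesis): locating one item among n requires only about log2(n) adaptive questions, versus the roughly n/2 expected questions of a non-adaptive, one-item-at-a-time strategy.

```python
import math

def adaptive_search(n, oracle):
    """Locate an unknown target in {0, ..., n-1} using adaptive queries
    of the form "is the target <= m?", answered by oracle(m) -> bool."""
    lo, hi = 0, n - 1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(mid):      # the answer determines the next question
            hi = mid
        else:
            lo = mid + 1
    return lo, queries

# Usage: the oracle encodes a hidden target; search recovers it
# in at most ceil(log2(n)) questions.
target = 357
found, cost = adaptive_search(1000, lambda m: target <= m)
assert found == target
assert cost <= math.ceil(math.log2(1000))  # at most 10 questions
```

The same query-complexity questions posed above (how many queries suffice, how many are necessary) are exactly what this toy example answers for the search problem: adaptivity reduces the cost from linear to logarithmic in n.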
