Challenges in Statistical Machine Learning

A surge of research in machine learning over the past decade has produced powerful learning methods that are being applied successfully across a wide range of domains, from search engines to computational biology and robotics. These advances have been achieved in part by refining the art and engineering practice of machine learning, paralleled by a confluence of machine learning and statistics. But an understanding of the scientific foundations and fundamental limits of learning from data can also be leveraged effectively in practice. In this overview of recent work, we present some of the current technical challenges in the field of machine learning, focusing on high-dimensional data and minimax rates of convergence. These challenges include understanding the role of sparsity in statistical learning, semi-supervised learning, the tradeoff between computation and risk, and structured prediction problems.
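
To make the sparsity challenge concrete, here is a minimal sketch, not taken from the paper, of l1-penalized regression (the lasso) recovering a sparse signal in a high-dimensional setting where the number of features exceeds the sample size. The synthetic data, the use of scikit-learn's Lasso estimator, and the penalty level alpha=0.1 are all illustrative assumptions.

```python
# Illustrative sketch (not from the paper): sparsity in high-dimensional
# regression via the lasso, with p = 200 features but only n = 50 samples.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 50, 200, 5           # samples, dimension, number of relevant variables

X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 2.0                 # sparse ground truth: only k nonzero coefficients
y = X @ beta + 0.1 * rng.standard_normal(n)

# The l1 penalty drives most estimated coefficients exactly to zero,
# which is what makes estimation feasible even though p >> n.
model = Lasso(alpha=0.1).fit(X, y)   # alpha is an illustrative choice
print("nonzero coefficients recovered:", np.count_nonzero(model.coef_))
```

The key point the sketch illustrates is that the l1 penalty sets most coefficients exactly to zero, so estimation can succeed when the ambient dimension far exceeds the sample size, provided the true model is sparse.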
