Robust learning from bites for data mining

Some methods from statistical machine learning and from robust statistics have two drawbacks. Firstly, they are computationally intensive, so they can hardly be applied to massive data sets, say with millions of data points. Secondly, robust and non-parametric confidence intervals for the predictions of the fitted models are often unknown. Here, we propose a simple but general method to overcome these problems in the context of huge data sets. The method scales with the available memory of the computer, can be distributed across several processors if available, and can reduce the computation time substantially. Our main focus is on robust general support vector machines (SVMs) based on minimizing regularized risks. The method offers distribution-free confidence intervals for the median of the predictions. The approach can also help to fit robust estimators in parametric models for huge data sets.
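To make the idea concrete, the following is a minimal sketch of a "learning from bites" scheme: the data are split into disjoint subsets (bites), one kernel SVM is fitted per bite, and the per-bite predictions are combined via their median, with a distribution-free confidence interval obtained from binomial order statistics. The function names, the use of scikit-learn's SVR, and the chosen hyperparameters are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy.stats import binom
from sklearn.svm import SVR  # assumed SVM implementation for illustration


def fit_bites(X, y, n_bites=20, random_state=0):
    """Split the data into disjoint bites and fit one kernel SVM
    (regularized risk minimization) per bite."""
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))
    models = []
    for bite in np.array_split(idx, n_bites):
        # Each bite fits into memory and is processed independently,
        # so this loop could be distributed across several processors.
        svm = SVR(kernel="rbf", C=1.0, epsilon=0.1)
        svm.fit(X[bite], y[bite])
        models.append(svm)
    return models


def median_prediction_ci(models, x_new, alpha=0.05):
    """Median of the per-bite predictions at x_new, together with a
    distribution-free confidence interval for that median based on
    binomial order statistics."""
    preds = np.sort([svm.predict(x_new.reshape(1, -1))[0] for svm in models])
    B = len(preds)
    # Largest m with P(Bin(B, 0.5) <= m) <= alpha/2 yields the interval
    # [X_(m+1), X_(B-m)] (1-based order statistics) with coverage >= 1 - alpha.
    m = int(binom.ppf(alpha / 2, B, 0.5))
    if binom.cdf(m, B, 0.5) > alpha / 2:
        m -= 1
    m = max(m, 0)
    return np.median(preds), (preds[m], preds[B - 1 - m])
```

Because the confidence interval relies only on order statistics of the independent per-bite predictions, it requires no distributional assumptions on the data; for classification, the same scheme would apply with a classifier per bite and, for example, the median of the decision values.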
