Random projections: Data perturbation for classification problems

Random projections offer an appealing and flexible approach to a wide range of large-scale statistical problems. They are particularly useful in high-dimensional settings, where many covariates are recorded for each observation. In classification problems, random projections are used in two general ways. The first involves many projections in an ensemble: the results obtained after applying different random projections are aggregated, with the aim of achieving superior statistical accuracy. The second class of methods includes hashing and sketching techniques, which are straightforward ways to reduce the complexity of a problem, often with a large computational saving, while approximately preserving statistical efficiency.
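
To make the two approaches concrete, the following is a minimal sketch (not the exact procedure from any particular paper) of (a) a single Gaussian sketch of the feature space followed by a simple base classifier, and (b) a majority-vote ensemble over many independent projections. It assumes NumPy and scikit-learn are available; the synthetic data and the choices of projection dimension and ensemble size are purely illustrative.

```python
# Illustrative sketch of random projections for classification:
# (a) a single projection (sketching) and (b) a projection ensemble.
# Assumes numpy and scikit-learn; all parameter values are illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic high-dimensional two-class data: p covariates, few informative.
n, p, proj_dim, n_projections = 400, 500, 10, 100
X = rng.standard_normal((n, p))
y = rng.integers(0, 2, size=n)
X[y == 1, :5] += 1.0  # small mean shift on the first five coordinates
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def random_projection(p, d, rng):
    """Gaussian projection matrix mapping p dimensions down to d."""
    return rng.standard_normal((d, p)) / np.sqrt(p)

# (a) Single sketch: project once, then fit the base classifier in d dims.
A = random_projection(p, proj_dim, rng)
single = LinearDiscriminantAnalysis().fit(X_tr @ A.T, y_tr)
print("single-projection accuracy:", single.score(X_te @ A.T, y_te))

# (b) Ensemble: many independent projections, aggregated by majority vote.
votes = np.zeros((len(y_te), 2))
for _ in range(n_projections):
    A = random_projection(p, proj_dim, rng)
    clf = LinearDiscriminantAnalysis().fit(X_tr @ A.T, y_tr)
    preds = clf.predict(X_te @ A.T)
    votes[np.arange(len(y_te)), preds] += 1
ensemble_preds = votes.argmax(axis=1)
print("ensemble accuracy:", (ensemble_preds == y_te).mean())
```

The base classifier here is linear discriminant analysis purely for simplicity; any classifier that can be fitted in the projected space could be substituted, and the aggregation rule (majority vote above) is one of several reasonable choices.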
