Improved Bounds on the Dot Product under Random Projection and Random Sign Projection

The dot product is a key building block in many data mining algorithms, from classification, regression, and correlation clustering to information retrieval and beyond. When the data are high dimensional, random projection serves as a universal dimensionality reduction method that provides both low-distortion guarantees and computational savings. Yet, in contrast to the optimal guarantees known for the preservation of the Euclidean distance (cf. the Johnson-Lindenstrauss lemma), the existing guarantees on the dot product under random projection in the data mining and machine learning literature are loose and incomplete. Some recent work has even suggested that the dot product may not be preserved when the angle between the original vectors is obtuse. In this paper we provide improved bounds on the dot product under random projection that match the optimal bounds on the Euclidean distance. As a corollary, we elucidate the impact of the angle between the original vectors on the relative distortion of the dot product under random projection, and we show that obtuse and acute angles behave symmetrically. In a further corollary we make a link to sign random projection, where we generalise earlier results. Numerical simulations confirm our theoretical results. Finally, we apply our results to bounding the generalisation error of compressive linear classifiers under the margin loss.
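For intuition on the quantities discussed above, the following minimal sketch (Python with NumPy) compares the dot product of two vectors with its estimate under a Gaussian random projection, and estimates the angle between the vectors via sign random projection. The dimensions d and k, the number of sign bits m, and the Gaussian test vectors are illustrative assumptions for the simulation, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, trials = 1000, 100, 2000     # hypothetical original/projected dimensions and repetitions

u = rng.standard_normal(d)
v = rng.standard_normal(d)
true_dot = u @ v

# Dot product under Gaussian random projection: R has i.i.d. N(0, 1/k) entries,
# so E[(Ru) . (Rv)] = u . v and the estimate concentrates as k grows.
estimates = np.empty(trials)
for t in range(trials):
    R = rng.standard_normal((k, d)) / np.sqrt(k)
    estimates[t] = (R @ u) @ (R @ v)

print(f"true dot product        : {true_dot:.3f}")
print(f"mean projected estimate : {estimates.mean():.3f}")
print(f"std of projected estimate: {estimates.std():.3f}")

# Sign random projection: each bit is sign(r . x) for a random Gaussian
# direction r, and P[bits disagree] = angle(u, v) / pi.
m = 4000                            # number of sign bits (illustrative)
R_sign = rng.standard_normal((m, d))
bits_u = np.sign(R_sign @ u)
bits_v = np.sign(R_sign @ v)
angle_est = np.mean(bits_u != bits_v) * np.pi
true_angle = np.arccos(true_dot / (np.linalg.norm(u) * np.linalg.norm(v)))

print(f"true angle              : {true_angle:.3f} rad")
print(f"sign-RP angle estimate  : {angle_est:.3f} rad")
```

With i.i.d. N(0, 1/k) entries the projected dot product is an unbiased estimate of the original one, and the fraction of disagreeing sign bits concentrates around the angle divided by pi; these are the kinds of quantities that the numerical simulations mentioned above probe.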
