High-Dimensional \(p\)-Norms

Let \(\mathbf {X}=(X_1, \ldots , X_d)\) be an \(\mathbb R^d\)-valued random vector with i.i.d. components, and let \(\Vert \mathbf {X}\Vert _p= (\sum _{j=1}^d|X_j|^p)^{1/p}\) be its \(p\)-norm, for \(p>0\). Letting \(d\) go to infinity has surprising consequences for the behavior of \(\Vert \mathbf {X}\Vert _p\), which may dramatically affect high-dimensional data processing. This effect is usually referred to as the distance concentration phenomenon in the computational learning literature. Despite growing interest in this important question, previous work has characterized the problem mainly through numerical experiments and incomplete mathematical statements. In this paper, we solidify some of the arguments that previously appeared in the literature and offer new insights into the phenomenon.
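
As a rough numerical illustration of the phenomenon described above, the following Python sketch draws random vectors with i.i.d. components and reports how the spread of their \(p\)-norms, relative to the smallest norm, changes with \(d\). The uniform distribution on [0, 1), the sample size, the seed, and the helper names `p_norm` and `relative_contrast` are assumptions chosen for illustration only; they are not taken from the paper, and the sketch is not a reproduction of its results.

```python
import random


def p_norm(x, p):
    """Return (sum_j |x_j|^p)^(1/p) for p > 0."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def relative_contrast(d, p, n_points=200, seed=0):
    """(max norm - min norm) / min norm over n_points random vectors.

    Assumption: components are i.i.d. uniform on [0, 1); any other
    i.i.d. distribution could be substituted here.
    """
    rng = random.Random(seed)
    norms = [
        p_norm([rng.random() for _ in range(d)], p)
        for _ in range(n_points)
    ]
    return (max(norms) - min(norms)) / min(norms)


if __name__ == "__main__":
    # Print the relative contrast of the 2-norm for increasing dimension.
    for d in (2, 20, 200, 2000):
        print(d, round(relative_contrast(d, p=2), 3))
```

Under these assumptions the printed ratio should shrink as \(d\) grows, which is one common way the literature quantifies distance concentration. Changing `p` (for instance to a fractional value such as 0.5) lets one compare how quickly the ratio decays for different norms.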
