General stochastic separation theorems with optimal bounds

The phenomenon of stochastic separability was discovered and used in machine learning to correct errors of Artificial Intelligence (AI) systems and to analyze AI instabilities. Under broad assumptions, each point of a high-dimensional dataset can be separated from the rest of the set by a simple and robust Fisher discriminant; such a point is called Fisher separable. In particular, errors or clusters of errors can be separated from the rest of the data. However, the ability to correct an AI system also opens up the possibility of attacking it: the same stochastic separability that enables correction induces vulnerabilities in high dimension, and it holds the key to understanding the fundamentals of robustness and adaptivity in high-dimensional data-driven AI. To manage errors and analyze vulnerabilities, stochastic separation theorems should evaluate the probability that a dataset is Fisher separable for a given dimensionality and a given class of distributions. Explicit and optimal estimates of these separation probabilities are required, and this problem is solved in the present work. General stochastic separation theorems with optimal probability estimates are obtained for important classes of distributions: log-concave distributions, their convex combinations, and product distributions. The standard i.i.d. assumption is significantly relaxed. These theorems and estimates can be used both for the correction of high-dimensional data-driven AI systems and for the analysis of their vulnerabilities. A third area of application is the emergence of memories in ensembles of neurons, the phenomena of grandmother cells and sparse coding in the brain, and the explanation of the unexpected effectiveness of small neural ensembles in the high-dimensional brain.
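
For intuition, here is a minimal numerical sketch (not taken from the paper) illustrating the phenomenon. It assumes the definition of Fisher separability used in the authors' earlier work (Gorban and Tyukin, Stochastic Separation Theorems, 2017): a point x is Fisher-separable from a point y with threshold α ∈ (0, 1) if ⟨x, y⟩ ≤ α⟨x, x⟩. The function name, sample size, and the choice α = 0.8 are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def fisher_separable_fraction(points: np.ndarray, alpha: float = 0.8) -> float:
    """Fraction of points x_i with <x_i, x_j> <= alpha * <x_i, x_i> for all j != i."""
    gram = points @ points.T          # all pairwise inner products <x_i, x_j>
    sq_norms = np.diag(gram).copy()   # squared norms <x_i, x_i>; copy before mutating gram
    np.fill_diagonal(gram, -np.inf)   # exclude self-comparisons from the max below
    separable = gram.max(axis=1) <= alpha * sq_norms
    return float(separable.mean())

rng = np.random.default_rng(0)
n = 1000                              # sample size (illustrative)
for d in (2, 10, 100, 1000):
    x = rng.standard_normal((n, d))   # i.i.d. standard Gaussian: a log-concave distribution
    print(f"d = {d:4d}: separable fraction = {fisher_separable_fraction(x):.3f}")
```

With the sample size held fixed, the separable fraction climbs toward 1 as the dimension d grows: this is the blessing-of-dimensionality effect whose probability the separation theorems quantify for whole classes of distributions.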
