Fast Construction of Correcting Ensembles for Legacy Artificial Intelligence Systems: Algorithms and a Case Study

Abstract. This paper presents a new approach for constructing simple and computationally efficient improvements of generic Artificial Intelligence (AI) systems, including multilayer and deep learning neural networks. The improvements are small network ensembles added to the existing AI architectures. The theoretical foundations of the approach rest on stochastic separation theorems and the concentration of measure phenomenon. We show that, subject to mild technical assumptions on the statistical properties of internal signals in the original AI system, the approach enables fast removal of the AI's errors with probability close to one on datasets that may be exponentially large in dimension. The approach is illustrated with numerical examples and a case study of digit recognition in American Sign Language.
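To make the idea of a correcting ensemble concrete, the sketch below shows the simplest possible corrector of the kind the abstract alludes to: a single linear discriminant in the space of the legacy system's internal features that separates a handful of mis-processed samples from the bulk of correctly processed ones, so that flagged inputs can be intercepted and their labels overridden. This is a minimal illustrative sketch, not the implementation used in the paper's case study; the function names, the whitening step, and the quantile threshold are assumptions made for illustration only.

import numpy as np

def fit_linear_corrector(X_correct, X_errors, threshold_quantile=0.99):
    # X_correct: (N, d) internal feature vectors on which the legacy system was right
    # X_errors:  (k, d) feature vectors on which it made mistakes, with k << N
    mu = X_correct.mean(axis=0)
    # Regularised covariance of the "correct" cloud (illustrative whitening choice)
    cov = np.cov(X_correct - mu, rowvar=False) + 1e-6 * np.eye(X_correct.shape[1])
    # Fisher-type direction pointing from the correct-class mean towards the errors
    w = np.linalg.solve(cov, X_errors.mean(axis=0) - mu)
    # Threshold chosen so that almost all correctly processed samples stay below it
    b = np.quantile((X_correct - mu) @ w, threshold_quantile)
    return w, mu, b

def corrector_fires(x, w, mu, b):
    # True when the corrector intercepts x and the legacy decision is overridden
    return float((x - mu) @ w) > b

In high dimension, stochastic separation results suggest that such a single hyperplane separates a small error set from an exponentially large set of correct samples with probability close to one, which is why a cascade of these cheap correctors can patch a legacy system without retraining it.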
