Power-law scaling to assist with key challenges in artificial intelligence

Power-law scaling, a central concept in critical phenomena, is found to be useful in deep learning, where optimized test errors on handwritten digit examples converge to zero as a power law of the dataset size. For rapid decision making with one training epoch, in which each example is presented to the trained network only once, the power-law exponent increases with the number of hidden layers. For the largest dataset, the obtained test error is estimated to be close to that of state-of-the-art algorithms trained for many epochs. Power-law scaling assists with key challenges in current artificial intelligence applications and facilitates an a priori estimate of the dataset size required to achieve a desired test accuracy. It establishes a benchmark for measuring training complexity and a quantitative hierarchy of machine learning tasks and algorithms.
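
The practical claim above, that test error decays as a power law of dataset size, implies a simple extrapolation recipe. Below is a minimal sketch, assuming the error follows ε(D) ≈ a·D^(−β): fit a and β by linear regression in log-log space, then invert the fit to estimate the dataset size needed for a target accuracy. The numerical values and variable names are illustrative assumptions, not data or code from the paper.

```python
import numpy as np

# Hypothetical measurements of test error vs. training-set size
# (placeholder values for illustration, not results from the paper).
dataset_sizes = np.array([1_000, 2_000, 4_000, 8_000, 16_000, 32_000])
test_errors = np.array([0.120, 0.085, 0.060, 0.042, 0.030, 0.021])

# Fit the power law  error ~ a * size**(-beta)  via linear regression
# in log-log space: log(error) = log(a) - beta * log(size).
slope, log_a = np.polyfit(np.log(dataset_sizes), np.log(test_errors), 1)
a, beta = np.exp(log_a), -slope  # beta > 0 when error decays with size

print(f"fitted power law: error ~ {a:.3f} * size^(-{beta:.3f})")

# Invert the fit for an a priori estimate of the dataset size
# needed to reach a desired test error.
target_error = 0.01
required_size = (a / target_error) ** (1.0 / beta)
print(f"estimated examples needed for {target_error:.0%} error: {required_size:,.0f}")
```

On clean power-law data the inversion is exact; in practice the extrapolated size is only as trustworthy as the fitted exponent, so the fit should be validated against held-out dataset sizes before being used to budget data collection.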
