Quantifying the effect of representations on task complexity

We examine the influence of input data representations on learning complexity. For learning, we posit that each model implicitly uses a candidate model distribution for unexplained variations in the data, its noise model. If the model distribution is not well aligned with the true distribution, then even relevant variations will be treated as noise. Crucially, however, the alignment between the model and true distributions can be changed, albeit implicitly, by changing data representations. "Better" representations can better align the model to the true distribution, making it easier to approximate the input-output relationship in the data without discarding useful data variations. To quantify this alignment effect of data representations on the difficulty of a learning task, we use an existing task complexity score and show its connection to the representation-dependent information coding length of the input. Empirically, we extract the necessary statistics from a linear regression approximation and show that these suffice to predict the relative learning performance of different data representations and neural network types, as measured by an extensive neural network architecture search. We conclude that to ensure better learning outcomes, representations may need to be tailored to both task and model so that they align with the model's implicit distribution.
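The paper's exact complexity score is not reproduced here, but the core empirical recipe in the abstract, fitting a linear regression under each representation and reading off a representation-dependent coding length of the unexplained variation, can be sketched as follows. This is a minimal illustration under an assumed Gaussian noise model: the residual variance of a least-squares fit is converted into the Gaussian coding length 0.5·log(2πeσ²) in nats per sample, and a representation better aligned with the true input-output relationship yields a shorter residual coding length. All variable names and the synthetic task are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task: the target depends on the input through a nonlinear map.
n, d = 500, 8
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

def residual_coding_length(features, y):
    """Fit least squares and return the Gaussian coding length of the
    residuals, 0.5 * log(2*pi*e*sigma^2), in nats per sample -- a proxy
    for how much of the target the representation leaves unexplained."""
    A = np.column_stack([features, np.ones(len(features))])  # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid.var()
    return 0.5 * np.log(2 * np.pi * np.e * sigma2)

# Representation 1: the raw inputs.
raw = residual_coding_length(X, y)
# Representation 2: augmented with a feature aligned to the true relationship.
aligned = residual_coding_length(np.column_stack([X, np.sin(X[:, 0])]), y)

# The better-aligned representation leaves less variation "as noise",
# so its residual coding length (predicted task difficulty) is lower.
print(raw > aligned)  # → True
```

The comparison, not the absolute values, is what matters: under a fixed (here Gaussian) noise model, changing only the representation changes how much of the data's variation the model must treat as noise.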
