The Helmholtz Method: Using Perceptual Compression to Reduce Machine Learning Complexity

This paper proposes a fundamental answer to a frequently asked question in multimedia computing and machine learning: do artifacts from perceptual compression contribute to error in the machine learning process and, if so, by how much? Our approach to the problem is a reinterpretation of the Helmholtz free energy formula from physics to explain the relationship between content and noise when sensors (such as cameras or microphones) capture multimedia data. This reinterpretation allows the noise contained in images, audio, and video to be measured in bits by combining a classifier with perceptual compression, such as JPEG or MP3. Our experiments on CIFAR-10 as well as Fraunhofer's IDMT-SMT-Audio-Effects dataset indicate that, at the right quality level, perceptual compression is not harmful but instead contributes to a significant reduction in the complexity of the machine learning process. That is, our noise quantification method can be used to speed up the training of deep learning classifiers significantly while maintaining, or sometimes even improving, overall classification accuracy. Moreover, our results provide insights into the reasons for the success of deep learning.
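As a rough illustration of the bit-measurement idea described above (not the authors' exact procedure), the following Python sketch estimates the information a lossy step discards as the drop in empirical Shannon entropy of a signal. Coarse scalar quantization stands in for a perceptual codec such as JPEG or MP3, and the function names (`shannon_entropy_bits`, `noise_bits_removed`) are illustrative, not from the paper:

```python
import math
import random
from collections import Counter

def shannon_entropy_bits(samples):
    """Empirical Shannon entropy (bits per sample) of a sequence."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def quantize(samples, step):
    """Coarse scalar quantization: a crude stand-in for a perceptual codec."""
    return [round(s / step) * step for s in samples]

def noise_bits_removed(samples, step):
    """Entropy drop caused by quantization, i.e. an estimate of how many
    bits of detail per sample the lossy step discards."""
    return shannon_entropy_bits(samples) - shannon_entropy_bits(quantize(samples, step))

# Example: a slowly rising ramp with additive noise
random.seed(0)
signal = [i // 8 + random.randint(-2, 2) for i in range(1024)]
removed = noise_bits_removed(signal, step=4)
```

Because the quantized sequence is a deterministic function of the original, its entropy can never exceed the original's, so `noise_bits_removed` is non-negative; sweeping `step` gives a simple curve of "bits discarded vs. compression strength", loosely analogous to sweeping a JPEG quality level.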
