A Machine Learning Perspective on Predictive Coding with PAQ8

PAQ8 makes use of several simple machine learning models and algorithms. We show how understanding PAQ8 from this perspective enables us to improve its algorithms. We also present a broad range of new applications of PAQ8 to machine learning tasks, including language modeling and adaptive text prediction, adaptive game playing, classification, and lossy compression using features acquired via unsupervised learning.
