Information Theory, Inference, and Learning Algorithms

A fun and exciting textbook on the mathematics underpinning the most dynamic areas of modern science and engineering.

[1]  Illtyd Trethowan Causality , 1938 .

[2]  A. Bhattacharyya On a measure of divergence between two statistical populations defined by their probability distributions , 1943 .

[3]  C. E. Shannon,et al.  A mathematical theory of communication , 1948, MOCO.

[4]  George Kingsley Zipf,et al.  Human behavior and the principle of least effort , 1949 .

[5]  L. Riggs,et al.  Involuntary motions of the eye during monocular fixation. , 1950, Journal of experimental psychology.

[6]  W. McCulloch,et al.  The limiting information capacity of a neuronal link , 1952 .

[7]  George Polya,et al.  Induction and Analogy in Mathematics , 1954 .

[8]  Brockway McMillan,et al.  Two inequalities implied by unique decipherability , 1956, IRE Trans. Inf. Theory.

[9]  J. D. Bernal,et al.  “The Origins of Life” , 1957, Nature.

[10]  Robert G. Gallager,et al.  Low-density parity-check codes , 1962, IRE Trans. Inf. Theory.

[11]  G. Matheron Principles of geostatistics , 1963 .

[12]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[13]  C. McCollough Color Adaptation of Edge-Detectors in the Human Visual System , 1965, Science.

[14]  F. Reif,et al.  Fundamentals of Statistical and Thermal Physics , 1965 .

[15]  L. Baum,et al.  Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .

[16]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[17]  C. S. Wallace,et al.  An Information Measure for Classification , 1968, Comput. J..

[18]  J. M. Smith,et al.  “Haldane's Dilemma” and the Rate of Evolution , 1968, Nature.

[19]  D. A. Bell,et al.  Information Theory and Reliable Communication , 1969 .

[20]  E. Seneta,et al.  Studies in the History of Probability and Statistics. XXXI. The simple branching process, a turning point test and a fundamental inequality: A historical note on I. J. Bienaymé , 1972 .

[21]  D. Mackay,et al.  The time course of the McCollough effect and its physiological implications. , 1974, The Journal of physiology.

[22]  J. Hopfield Kinetic proofreading: a new mechanism for reducing errors in biosynthetic processes requiring high specificity. , 1974, Proceedings of the National Academy of Sciences of the United States of America.

[23]  John Cocke,et al.  Optimal decoding of linear codes for minimizing symbol error rate (Corresp.) , 1974, IEEE Trans. Inf. Theory.

[24]  P. Feldman Evolution of sex , 1975, Nature.

[25]  Peter Elias,et al.  Universal codeword sets and representations of the integers , 1975, IEEE Trans. Inf. Theory.

[26]  Abraham Lempel,et al.  A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.

[27]  Benoit B. Mandelbrot,et al.  Fractal Geometry of Nature , 1984 .

[28]  Robert J. McEliece,et al.  The Theory of Information and Coding , 1979 .

[29]  Abraham Lempel,et al.  Compression of individual sequences via variable-rate coding , 1978, IEEE Trans. Inf. Theory.

[30]  Robert G. Gallager,et al.  Variations on a theme by Huffman , 1978, IEEE Trans. Inf. Theory.

[31]  J. Hopfield Origin of the genetic code: a testable hypothesis based on tRNA structure, sequence, and kinetic proofreading. , 1978, Proceedings of the National Academy of Sciences of the United States of America.

[32]  Steven A. Orszag,et al.  CBMS-NSF Regional Conference Series in Applied Mathematics , 1978 .

[33]  E.R. Berlekamp,et al.  The technology of error-correcting codes , 1980, Proceedings of the IEEE.

[34]  J. Hopfield The energy relay: a proofreading scheme based on dynamic cooperativity and lacking all characteristic symptoms of kinetic proofreading in DNA replication and protein synthesis. , 1980, Proceedings of the National Academy of Sciences of the United States of America.

[35]  Robert Michael Tanner,et al.  A recursive approach to low complexity codes , 1981, IEEE Trans. Inf. Theory.

[36]  S. Adler Over-relaxation method for the Monte Carlo evaluation of the partition function for multiquadratic actions , 1981 .

[37]  C. S. Wallace,et al.  Archaeoastronomy in the Old World: Stone Circle Geometries: An Information Theory Approach , 1982 .

[38]  Stephen Barnett,et al.  Matrix Methods for Engineers and Scientists , 1982 .

[39]  J J Hopfield,et al.  Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.

[40]  Shu Lin,et al.  Error control coding : fundamentals and applications , 1983 .

[41]  J. Copas Regression, Prediction and Shrinkage , 1983 .

[42]  Terry A. Welch,et al.  A Technique for High-Performance Data Compression , 1984, Computer.

[43]  Frederick Mosteller,et al.  Applied Bayesian and classical inference : the case of the Federalist papers , 1984 .

[44]  Sompolinsky,et al.  Storing infinite numbers of patterns in a spin-glass model of neural networks. , 1985, Physical review letters.

[45]  N. J. Cohen,et al.  Higher-Order Boltzmann Machines , 1986 .

[46]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[47]  Geoffrey E. Hinton,et al.  Learning and relearning in Boltzmann machines , 1986 .

[48]  C. S. Wallace,et al.  Estimation and Inference by Compact Coding , 1987 .

[49]  S. Duane,et al.  Hybrid Monte Carlo , 1987 .

[50]  J J Hopfield,et al.  Learning algorithms and probability distributions in feed-forward and feed-back networks. , 1987, Proceedings of the National Academy of Sciences of the United States of America.

[51]  H. Omre Bayesian kriging—Merging observations and qualified guesses in kriging , 1987 .

[52]  Anthony O'Hagan,et al.  Monte Carlo is fundamentally unsound , 1987 .

[53]  Richard E. Blahut,et al.  Principles and practice of information theory , 1987 .

[54]  Terrence J. Sejnowski,et al.  Parallel Networks that Learn to Pronounce English Text , 1987, Complex Syst..

[55]  Ian H. Witten,et al.  Arithmetic coding for data compression , 1987, CACM.

[56]  Y. Bar-Shalom Tracking and data association , 1988 .

[57]  J. Berger Statistical Decision Theory and Bayesian Analysis , 1988 .

[58]  B. Ripley Statistical inference for spatial processes , 1990 .

[59]  S. Dolinar A New Code for Galileo , 1988 .

[60]  J. Skilling Classic Maximum Entropy , 1989 .

[61]  S. P. Luttrell,et al.  Hierarchical vector quantisation , 1989 .

[62]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[63]  T. Loredo From Laplace to Supernova SN 1987A: Bayesian Inference in Astrophysics , 1990 .

[64]  F. Girosi,et al.  Networks for approximation and learning , 1990, Proc. IEEE.

[65]  Stephen P. Luttrell,et al.  Derivation of a class of training algorithms , 1990, IEEE Trans. Neural Networks.

[66]  Tomaso A. Poggio,et al.  Extensions of a Theory of Networks for Approximation and Learning , 1990, NIPS.

[67]  J. Angel,et al.  Adaptive optics for array telescopes using neural-network techniques , 1990, Nature.

[68]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[69]  Stuart J. Russell,et al.  Do the right thing - studies in limited rationality , 1991 .

[70]  Raymond W. Yeung,et al.  A new outlook of Shannon's information measures , 1991, IEEE Trans. Inf. Theory.

[71]  Chris Bishop,et al.  Exact Calculation of the Hessian Matrix for the Multilayer Perceptron , 1992, Neural Computation.

[72]  E. Capaldi,et al.  The organization of behavior. , 1992, Journal of applied behavior analysis.

[73]  David J. C. MacKay,et al.  The Evidence Framework Applied to Classification Networks , 1992, Neural Computation.

[74]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[75]  Radford M. Neal Bayesian Learning via Stochastic Dynamics , 1992, NIPS.

[76]  G. Parisi,et al.  Simulated tempering: a new Monte Carlo scheme , 1992, hep-lat/9205018.

[77]  Geoffrey E. Hinton,et al.  Keeping the neural networks simple by minimizing the description length of the weights , 1993, COLT '93.

[78]  A. Glavieux,et al.  Near Shannon limit error-correcting coding and decoding: Turbo-codes. 1 , 1993, Proceedings of ICC '93 - IEEE International Conference on Communications.

[79]  Geoffrey E. Hinton,et al.  Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.

[80]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[81]  David J. C. MacKay,et al.  A hierarchical Dirichlet language model , 1995, Natural Language Engineering.

[82]  Carl E. Rasmussen,et al.  In Advances in Neural Information Processing Systems , 2011 .

[83]  Eörs Szathmáry,et al.  The Major Transitions in Evolution , 1997 .

[84]  Terrence J. Sejnowski,et al.  An Information-Maximization Approach to Blind Separation and Blind Deconvolution , 1995, Neural Computation.

[85]  David Mackay,et al.  Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks , 1995 .

[86]  E. Baum,et al.  Best Play for Imperfect Players and Game Tree Search; part I - theory , 1995 .

[87]  Bernhard Schölkopf,et al.  Extracting Support Data for a Given Task , 1995, KDD.

[88]  David G. Lowe,et al.  Similarity Metric Learning for a Variable-Kernel Classifier , 1995, Neural Computation.

[89]  Dan Boneh,et al.  On genetic algorithms , 1995, COLT '95.

[90]  Hans-Andrea Loeliger,et al.  Codes and iterative decoding on general graphs , 1995, Eur. Trans. Telecommun..

[91]  Yoshua Bengio,et al.  Pattern Recognition and Neural Networks , 1995 .

[92]  Daniel A. Spielman,et al.  Linear-time encodable and decodable error-correcting codes , 1995, STOC '95.

[93]  Geoffrey E. Hinton,et al.  Bayesian Learning for Neural Networks , 1995 .

[94]  W. Teahan Probability estimation for PPM , 1995 .

[95]  David J. C. MacKay,et al.  Good Codes Based on Very Sparse Matrices , 1995, IMACC.

[96]  Andrzej Cichocki,et al.  A New Learning Algorithm for Blind Signal Separation , 1995, NIPS.

[97]  Barak A. Pearlmutter,et al.  Maximum Likelihood Blind Source Separation: A Context-Sensitive Generalization of ICA , 1996, NIPS.

[98]  David J. C. MacKay,et al.  Bayesian Non-linear Modeling for the Prediction Competition , 1996 .

[99]  Radford M. Neal,et al.  Near Shannon limit performance of low density parity check codes , 1996 .

[100]  Alain Glavieux,et al.  Reflections on the Prize Paper: "Near optimum error-correcting coding and decoding: turbo codes" , 1998 .

[101]  David Bruce Wilson,et al.  Exact sampling with coupled Markov chains and applications to statistical mechanics , 1996, Random Struct. Algorithms.

[102]  David J. C. MacKay,et al.  Bayesian Methods for Backpropagation Networks , 1996 .

[103]  David Barber,et al.  Gaussian Processes for Bayesian Classification via Hybrid Monte Carlo , 1996, NIPS.

[104]  Niclas Wiberg,et al.  Codes and Decoding on General Graphs , 1996 .

[105]  G. Wahba,et al.  Hybrid Adaptive Splines , 1997 .

[106]  Radford M. Neal Markov Chain Monte Carlo Methods Based on `Slicing' the Density Function , 1997 .

[107]  Geoffrey E. Hinton,et al.  Evaluation of Gaussian processes and other methods for non-linear regression , 1997 .

[108]  Khaled A. S. Abdel-Ghaffar,et al.  Insertion/deletion correction with spectral nulls , 1997, IEEE Trans. Inf. Theory.

[109]  Radford M. Neal Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification , 1997, physics/9701026.

[110]  Daniel A. Spielman,et al.  Practical loss-resilient codes , 1997, STOC '97.

[111]  Eric B. Baum,et al.  A Bayesian Approach to Relevance in Game Playing , 1997, Artif. Intell..

[112]  N. G. Best,et al.  Dynamic conditional independence models and Markov chain Monte Carlo methods , 1997 .

[113]  J. Wolf,et al.  On Two-Dimensional Arrays and Crossword Puzzles , 1998 .

[114]  J G Daugman,et al.  Information Theory and Coding , 1998 .

[115]  Jung-Fu Cheng,et al.  Turbo Decoding as an Instance of Pearl's "Belief Propagation" Algorithm , 1998, IEEE J. Sel. Areas Commun..

[116]  Mark Huber,et al.  Exact sampling and approximate counting techniques , 1998, STOC '98.

[117]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[118]  Radford M. Neal,et al.  Suppressing Random Walks in Markov Chain Monte Carlo Using Ordered Overrelaxation , 1995, Learning in Graphical Models.

[119]  Brendan J. Frey,et al.  Graphical Models for Machine Learning and Digital Communication , 1998 .

[120]  Christopher Holmes,et al.  Perfect Simulation for orthogonal model mixing , 1998 .

[121]  Yoshua Bengio,et al.  The Z-coder adaptive binary coder , 1998, Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225).

[122]  Geoffrey E. Hinton,et al.  A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[123]  M. Luby,et al.  Improved low-density parity-check codes using irregular graphs and belief propagation , 1998, Proceedings. 1998 IEEE International Symposium on Information Theory (Cat. No.98CH36252).

[124]  M. A. Tanner,et al.  Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, 3rd Edition , 1998 .

[125]  David J. C. MacKay,et al.  Good Error-Correcting Codes Based on Very Sparse Matrices , 1997, IEEE Trans. Inf. Theory.

[126]  Harri Lappalainen,et al.  Ensemble learning for independent component analysis , 1999 .

[127]  Ali Mansour,et al.  Blind Separation of Sources , 1999 .

[128]  David J. C. MacKay,et al.  Comparison of constructions of irregular Gallager codes , 1999, IEEE Trans. Commun..

[129]  Carl E. Rasmussen,et al.  The Infinite Gaussian Mixture Model , 1999, NIPS.

[130]  S. Brink Convergence of iterative decoding , 1999 .

[131]  David J. C. MacKay,et al.  Comparison of Approximate Methods for Handling Hyperparameters , 1999, Neural Computation.

[132]  Peter D. Keightley,et al.  High genomic deleterious mutation rates in hominids , 1999, Nature.

[133]  A. Terras Fourier Analysis on Finite Groups and Applications , 1999 .

[134]  J J Hopfield,et al.  What is a moment? "Cortical" sensory integration over a brief interval. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[135]  Mark Ridley,et al.  Mendel's Demon: Gene Justice and the Complexity of Life , 2000 .

[136]  David J. C. MacKay,et al.  Variational Gaussian process classifiers , 2000, IEEE Trans. Neural Networks Learn. Syst..

[137]  G. Forney,et al.  Codes on graphs: normal realizations , 2000, 2000 IEEE International Symposium on Information Theory (Cat. No.00CH37060).

[138]  Volker Tresp,et al.  A Bayesian Committee Machine , 2000, Neural Computation.

[139]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[140]  David J. C. MacKay,et al.  Ensemble Learning for Blind Image Separation and Deconvolution , 2000 .

[141]  Alexander J. Smola,et al.  Sparse Greedy Gaussian Process Regression , 2000, NIPS.

[142]  A.,et al.  The Origins of Spread-Spectrum Communications , 2000 .

[143]  W. Freeman,et al.  Generalized Belief Propagation , 2000, NIPS.

[144]  Christopher K. I. Williams,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[145]  M.C. Davey,et al.  Watermark codes: reliable communication over insertion/deletion channels , 2000, 2000 IEEE International Symposium on Information Theory (Cat. No.00CH37060).

[146]  Ole Winther,et al.  Gaussian Processes for Classification: Mean-Field Algorithms , 2000, Neural Computation.

[147]  Alan F. Blackwell,et al.  Dasher—a data entry interface using continuous gestures and language models , 2000, UIST '00.

[148]  David J. C. MacKay An Alternative to Runlength-limiting Codes: Turn Timing Errors into Substitution Errors , 2000 .

[149]  Klaus Ritter,et al.  Bayesian numerical analysis , 2000 .

[150]  Rüdiger L. Urbanke,et al.  Design of capacity-approaching irregular low-density parity-check codes , 2001, IEEE Trans. Inf. Theory.

[151]  D. Mackay,et al.  Evaluation of Gallager Codes for Short Block Length and High Rate Applications , 2001 .

[152]  Radford M. Neal,et al.  Improving Markov chain Monte Carlo Estimators by Coupling to an Approximating Chain , 2001 .

[153]  Rüdiger L. Urbanke,et al.  Efficient encoding of low-density parity-check codes , 2001, IEEE Trans. Inf. Theory.

[154]  M. Opper,et al.  An Idiosyncratic Journey Beyond Mean Field Theory , 2001 .

[155]  Robert J. McEliece,et al.  BSC Thresholds for Code Ensembles Based on “Typical Pairs” Decoding , 2001 .

[156]  Radford M. Neal Annealed importance sampling , 1998, Stat. Comput..

[157]  W. Freeman,et al.  Bethe free energy, Kikuchi approximations, and belief propagation algorithms , 2001 .

[158]  A. Yuille A Double-Loop Algorithm to Minimize the Bethe and Kikuchi Free Energies , 2001 .

[159]  J J Hopfield,et al.  What is a moment? Transient synchrony as a collective mechanism for spatiotemporal integration. , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[160]  W. Gilks,et al.  Following a moving target—Monte Carlo inference for dynamic Bayesian models , 2001 .

[161]  David J. C. MacKay,et al.  Reliable communication over channels with insertions, deletions, and substitutions , 2001, IEEE Trans. Inf. Theory.

[162]  Carl E. Rasmussen,et al.  Infinite Mixtures of Gaussian Process Experts , 2001, NIPS.

[163]  Daniel A. Spielman,et al.  Efficient erasure correcting codes , 2001, IEEE Trans. Inf. Theory.

[164]  Carl E. Rasmussen,et al.  Factorial Hidden Markov Models , 1997 .

[165]  Yee Whye Teh,et al.  Discovering Multiple Constraints that are Frequently Approximately Satisfied , 2001, UAI.

[166]  Yee Whye Teh,et al.  Belief Optimization for Binary Networks: A Stable Alternative to Loopy Belief Propagation , 2001, UAI.

[167]  Tom Minka,et al.  A family of algorithms for approximate Bayesian inference , 2001 .

[168]  Emina Soljanin,et al.  LDPC codes: a group algebra formulation , 2001, Electron. Notes Discret. Math..

[169]  Yee Whye Teh,et al.  A New View of ICA , 2001 .

[170]  Rüdiger L. Urbanke,et al.  The capacity of low-density parity-check codes under message-passing decoding , 2001, IEEE Trans. Inf. Theory.

[171]  Emina Soljanin,et al.  An Algebraic Description of Iterative Decoding Schemes , 2001 .

[172]  D. Denison,et al.  Perfect sampling for the wavelet reconstruction of signals , 2002, IEEE Trans. Signal Process..

[173]  Simon Litsyn,et al.  On ensembles of low-density parity-check codes: Asymptotic distance distributions , 2002, IEEE Trans. Inf. Theory.

[174]  David J. Spiegelhalter,et al.  VIBES: A Variational Inference Engine for Bayesian Networks , 2002, NIPS.

[175]  Ole Winther,et al.  Mean-Field Approaches to Independent Component Analysis , 2002, Neural Computation.

[176]  Carl E. Rasmussen,et al.  Bayesian Monte Carlo , 2002, NIPS.

[177]  David J. Ward,et al.  Fast Hands-free Writing by Gaze Direction , 2002, ArXiv.

[178]  Neil D. Lawrence,et al.  Fast Forward Selection to Speed Up Sparse Gaussian Process Regression , 2003, AISTATS.

[179]  David J. C. MacKay,et al.  Sparse low-density parity-check codes for channels with cross-talk , 2003, Proceedings 2003 IEEE Information Theory Workshop (Cat. No.03EX674).

[180]  Lurias,et al.  Mutations of Bacteria from Virus Sensitivity to Virus Resistance , 2003 .

[181]  Martin J. Wainwright,et al.  Tree-based reparameterization framework for analysis of sum-product and related algorithms , 2003, IEEE Trans. Inf. Theory.

[182]  David J. C. MacKay,et al.  Sparse-graph codes for quantum error correction , 2004, IEEE Transactions on Information Theory.

[183]  David J. C. MacKay,et al.  Choice of Basis for Laplace Approximation , 1998, Machine Learning.

[184]  J. J. Hopfield,et al.  “Neural” computation of decisions in optimization problems , 1985, Biological Cybernetics.

[185]  Alex M. Andrew,et al.  Information Theory, Inference, and Learning Algorithms , 2004 .

[186]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[187]  William T. Freeman,et al.  Constructing free-energy approximations and generalized belief propagation algorithms , 2005, IEEE Transactions on Information Theory.

[188]  Riccardo Zecchina,et al.  Survey propagation: An algorithm for satisfiability , 2002, Random Struct. Algorithms.

[189]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .