Conditioning, Mutual Information, and Information Gain

In this chapter we discuss the extension of three concepts of classical information theory, namely conditional information, transinformation (also called mutual information), and information gain (also called Kullback–Leibler distance), from descriptions to (reasonably large classes of) covers. In doing so, we also carry these concepts over from discrete to continuous random variables.
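
For orientation, the classical discrete-variable forms of the three quantities being extended are the standard textbook ones (Shannon; Kullback and Leibler); the sketch below records only these familiar definitions, not the cover-based generalization developed in the chapter.

```latex
% Classical definitions for discrete random variables X, Y with joint
% distribution p(x,y) and a reference distribution q; these are the
% standard forms that the chapter generalizes from descriptions to covers.
\begin{align}
  H(X \mid Y)   &= -\sum_{x,y} p(x,y)\,\log p(x \mid y)
                 && \text{conditional information} \\
  I(X;Y)        &= \sum_{x,y} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}
                 = H(X) - H(X \mid Y)
                 && \text{transinformation (mutual information)} \\
  D(p \,\|\, q) &= \sum_{x} p(x)\,\log \frac{p(x)}{q(x)}
                 && \text{information gain (Kullback--Leibler distance)}
\end{align}
```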
