Information theory related learning

This paper introduces a special session held at the ESANN 2011 conference. It reviews and highlights recent developments and new directions in information theory related learning, a fast-developing research area. Algorithms in this area build on the fundamental principles of information theory and relate them, implicitly or explicitly, to learning algorithms and strategies.
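The following minimal sketch shows one way an information-theoretic quantity can become a learning criterion. It estimates Rényi's quadratic entropy, H2 = -log ∫ p(x)² dx, from one-dimensional samples using a Gaussian Parzen window, as is done in information theoretic learning. The function names and the kernel width `sigma` are illustrative assumptions and are not taken from this paper.

```python
import math
from typing import Sequence


def information_potential(samples: Sequence[float], sigma: float) -> float:
    """Parzen estimate of V = integral of p(x)^2 dx for 1-D samples.

    Convolving two Gaussian kernels of width sigma gives a Gaussian of
    variance 2 * sigma**2, so V reduces to a double sum over sample pairs.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    var = 2.0 * sigma ** 2                        # variance of the pairwise kernel
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)   # Gaussian normalisation constant
    total = 0.0
    for xi in samples:
        for xj in samples:
            total += norm * math.exp(-((xi - xj) ** 2) / (2.0 * var))
    return total / (n * n)


def renyi_quadratic_entropy(samples: Sequence[float], sigma: float) -> float:
    """Rényi entropy of order 2: H2 = -log V (hypothetical helper name)."""
    return -math.log(information_potential(samples, sigma))


if __name__ == "__main__":
    tight = renyi_quadratic_entropy([0.0, 0.1, -0.1], sigma=0.5)
    spread = renyi_quadratic_entropy([-3.0, 0.0, 3.0], sigma=0.5)
    print(tight, spread)  # the spread-out sample set has the larger entropy
```

Because the estimate is a smooth function of the samples, it can be differentiated with respect to model outputs and used directly as a cost function. For example, a learner can minimise error entropy or maximise output entropy instead of a mean squared error.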
