Information-theoretic enhancement learning and its application to visualization of self-organizing maps

In this paper, we propose a new information-theoretic method called "enhancement learning" to interpret the configuration of competitive networks. When applied to self-organizing maps, the method aims to make clusters of data easier to see at different levels of detail. In enhancement learning, connection weights are actively modified to enhance competitive units for better interpretation. In the extreme case this comes at the expense of quantization errors, because error minimization is not the main target of enhancement learning. Once the connection weights have been modified, enhancement learning can generate many different network configurations simply by changing the enhancement parameter. A useful way to combine the information from these configurations is to extract the features common to all configurations and those specific to only some of them. In addition, we propose relative information, namely, mutual information that takes into account the corresponding errors between input patterns and connection weights. Relative information serves as a guideline for deciding which of the many possible network configurations deserves the most attention. We applied the method to an artificial data problem, the well-known Iris problem, the Haberman data and a cancer data problem. In all of these problems, experimental results confirmed that, as the enhancement parameter is increased, multiple configurations are generated in which the number of boundaries visible in the U-matrices and component planes could increase. We also observed that relative information was effective in suggesting the appropriate number of clusters.
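
As a rough illustration of the quantities mentioned above, the following sketch computes the mutual information between input patterns and competitive units for a given enhancement parameter. The abstract does not state the activation function or the exact definition of relative information, so everything here is an assumption: the Gaussian-like firing rule, the uniform pattern probability, and the names `firing_probabilities`, `mutual_information` and `alpha` are hypothetical and chosen only for illustration.

```python
import math


def firing_probabilities(x, weights, alpha):
    """Assumed firing rule: p(j | x) proportional to exp(-alpha^2 * d_j^2 / 2),
    where d_j is the Euclidean distance from x to the weights of unit j.
    A larger alpha (the hypothetical enhancement parameter) sharpens the winner."""
    d2 = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    d2_min = min(d2)  # subtract the minimum for numerical stability
    e = [math.exp(-0.5 * alpha ** 2 * (d - d2_min)) for d in d2]
    z = sum(e)
    return [v / z for v in e]


def mutual_information(patterns, weights, alpha):
    """I = sum_s p(s) sum_j p(j|s) log(p(j|s) / p(j)), assuming uniform p(s)."""
    n_patterns = len(patterns)
    n_units = len(weights)
    cond = [firing_probabilities(x, weights, alpha) for x in patterns]
    # Marginal firing probability of each unit, averaged over patterns.
    p_j = [sum(c[j] for c in cond) / n_patterns for j in range(n_units)]
    info = 0.0
    for c in cond:
        for j in range(n_units):
            if c[j] > 0.0:
                info += c[j] * math.log(c[j] / p_j[j]) / n_patterns
    return info


# Toy data: two tight groups and one unit placed near each group.
patterns = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 0.9]]
weights = [[0.0, 0.0], [1.0, 1.0]]
for alpha in (0.0, 1.0, 5.0, 20.0):
    print(alpha, mutual_information(patterns, weights, alpha))
```

Under these assumptions, `alpha = 0` makes every unit fire equally and the mutual information is zero, while a large `alpha` drives each pattern to a single winning unit and the mutual information toward its upper bound of `log(2)` for two units. Sweeping `alpha` in this way mirrors the idea of generating several configurations from a single parameter.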
