MIWOCI 2010, Mittweida Workshop on Computational Intelligence

We introduce a modification of batch gradient descent that aims at better convergence properties and more robust minimization. In the course of the descent, the procedure compares the performance of the current configuration with that of a gliding average over the most recent positions. If the average yields a lower value of the objective function, minimization proceeds from there and the step size of the descent is decreased. Here we present the prescription from a practitioner’s point of view and refrain from a detailed mathematical analysis. We first illustrate the method with a low-dimensional example. We then discuss its application in machine learning, with examples from multilayered neural networks and from Matrix Relevance LVQ, a recent extension of Learning Vector Quantization (LVQ).
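The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `averaged_gradient_descent`, the parameters `window` (number of recent positions averaged) and `shrink` (factor applied to the step size), the use of a plain unnormalized gradient step, the choice to restart the history after jumping to the average, and the quadratic test objective are all hypothetical choices made for the example.

```python
from collections import deque

import numpy as np


def averaged_gradient_descent(cost, grad, w0, step=0.19, window=5,
                              shrink=0.5, n_iter=200):
    """Batch gradient descent with a comparison against a gliding average.

    Assumed details (not taken from the source): plain gradient steps,
    a fixed-length history, and a history reset after each jump.
    """
    w = np.asarray(w0, dtype=float)
    recent = deque([w.copy()], maxlen=window)  # most recent positions
    for _ in range(n_iter):
        w = w - step * grad(w)                 # ordinary descent step
        recent.append(w.copy())
        if len(recent) == window:
            w_avg = np.mean(list(recent), axis=0)
            if cost(w_avg) < cost(w):
                # The average performs better: continue from there
                # and decrease the step size.
                w = w_avg
                step *= shrink
                recent.clear()
                recent.append(w.copy())
    return w, step


# Hypothetical test objective: an anisotropic quadratic on which plain
# gradient descent oscillates along the steep direction.
def cost(w):
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def grad(w):
    return np.array([w[0], 10.0 * w[1]])


w_final, step_final = averaged_gradient_descent(cost, grad, [1.0, 1.0])
print(w_final, step_final)
```

In this toy setting the initial step size makes the iterates alternate in sign along the steep coordinate, so the average over recent positions lies closer to the minimum than the current position. That is the situation in which the comparison triggers a jump and a smaller step size.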
