Topographic mappings and feed-forward neural networks

This thesis is a study of the generation of topographic mappings (dimension-reducing transformations of data that preserve some element of geometric structure) with feed-forward neural networks. As an alternative to established methods, a transformational variant of Sammon's method is proposed, where the projection is effected by a radial basis function neural network. This approach is related to the statistical field of multidimensional scaling, and from that the concept of a 'subjective metric' is defined, which permits the exploitation of additional prior knowledge concerning the data in the mapping process. This then enables the generation of more appropriate feature spaces for the purposes of enhanced visualisation or subsequent classification. A comparison with established methods for feature extraction is given for data taken from the 1992 Research Assessment Exercise for higher education institutions in the United Kingdom. This is a difficult high-dimensional dataset, and it illustrates well the benefit of the new topographic technique. A generalisation of the proposed model is considered for implementation of the classical multidimensional scaling (MDS) routine. This is related to Oja's principal subspace neural network, whose learning rule is shown to descend the error surface of the proposed MDS model. Some of the technical issues concerning the design and training of topographic neural networks are investigated. It is shown that neural network models can be less sensitive to entrapment in the sub-optimal local minima that badly affect the standard Sammon algorithm, and tend to exhibit good generalisation as a result of implicit weight decay in the training process. It is further argued that for ideal structure retention, the network transformation should be perfectly smooth for all inter-data directions in input space. Finally, there is a critique of optimisation techniques for topographic mappings, and a new training algorithm is proposed. A convergence proof is given, and the method is shown to produce lower-error mappings more rapidly than previous algorithms.
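
To make the central idea concrete, the following is a minimal sketch of the quantity such a mapping would try to reduce: Sammon's stress, evaluated on the output of a radial basis function network. It is an illustration only, not the thesis implementation. The Gaussian basis functions, the basis width, the choice of centres, the random untrained weights and every function name here are assumptions introduced for the example.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_stress(input_dists, output_dists):
    """Sammon's stress, summed over distinct pairs i < j."""
    i, j = np.triu_indices_from(input_dists, k=1)
    d_in, d_out = input_dists[i, j], output_dists[i, j]
    return ((d_in - d_out) ** 2 / d_in).sum() / d_in.sum()

def rbf_project(data, centres, width, weights):
    """Gaussian RBF network: hidden activations times an output weight matrix."""
    sq = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    hidden = np.exp(-sq / (2.0 * width ** 2))  # assumed Gaussian basis
    return hidden @ weights

# Synthetic data and untrained weights, purely for illustration.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 5))        # 20 points in 5 dimensions
centres = data[:8]                     # assumed: centres drawn from the data
weights = rng.normal(size=(8, 2))      # maps 8 hidden units to a 2-D output
projected = rbf_project(data, centres, width=2.0, weights=weights)
stress = sammon_stress(pairwise_distances(data), pairwise_distances(projected))
```

In a transformational scheme of this kind, training would adjust `weights` to reduce `stress`, so that unseen points can later be projected with the same network rather than by re-running the optimisation.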
