Cohort-based kernel visualisation with scatter matrices

Visualisation with good discrimination between data cohorts is important for exploratory data analysis and for decision support interfaces. This paper proposes a kernel extension of the cluster-based linear visualisation method described by Lisboa et al. [10]. A representation of the data in dual form permits the application of the kernel trick, thereby projecting the data onto the orthonormalised cohort means in the feature space. The only parameters of the method are those of the kernel function. The method is shown to produce well-discriminating visualisations of non-linearly separable data at low computational cost. The linear separability of the cohorts in the projected space was tested using nearest-neighbour and linear discriminant classifiers, achieving significant improvements in classification accuracy with respect to the original features, especially for high-dimensional data, where 93% accuracy was obtained for the Splice-junction Gene Sequences data set from the UCI repository.
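
The following is a minimal sketch, not the authors' reference implementation, of how a projection onto orthonormalised cohort means can be computed entirely in dual form, so that only the kernel (Gram) matrix is needed. The RBF kernel, its width `gamma`, the function names, and the toy data are illustrative assumptions rather than details taken from the paper.

```python
# Sketch: project data onto orthonormalised cohort (class) means in a
# kernel-induced feature space, working only with the Gram matrix (dual form).
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def cohort_kernel_projection(X, labels, gamma=1.0):
    """Coordinates of each sample on the orthonormalised cohort means in feature space."""
    K = rbf_kernel(X, X, gamma)                  # K[i, j] = <phi(x_i), phi(x_j)>

    # Dual coefficients of each cohort mean: m_c = sum_i alpha_c[i] * phi(x_i)
    alphas = []
    for c in np.unique(labels):
        indicator = (labels == c).astype(float)
        alphas.append(indicator / indicator.sum())

    # Gram-Schmidt orthonormalisation of the cohort means, expressed in dual form:
    # feature-space inner products reduce to a^T K b.
    betas = []
    for a in alphas:
        b = a.copy()
        for u in betas:
            b = b - (u @ K @ b) * u              # remove components along earlier basis vectors
        norm = np.sqrt(b @ K @ b)
        if norm > 1e-12:                         # skip means that are linearly dependent
            betas.append(b / norm)

    B = np.column_stack(betas)                   # one column of dual coefficients per basis vector
    return K @ B                                 # row i holds <phi(x_i), u_j> for each basis vector u_j

if __name__ == "__main__":
    # Toy example: three Gaussian cohorts in five dimensions.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc, 1.0, size=(30, 5)) for loc in (-2.0, 0.0, 2.0)])
    y = np.repeat([0, 1, 2], 30)
    Z = cohort_kernel_projection(X, y, gamma=0.5)
    print(Z.shape)                               # (90, 3): one coordinate per retained cohort mean
```

Under these assumptions, the separation of the cohorts in the projected coordinates could then be checked with a nearest-neighbour or linear discriminant classifier, in the spirit of the evaluation described in the abstract.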

[1] Joydeep Ghosh et al. Relationship-Based Clustering and Visualization for High-Dimensional Data Mining, 2003, INFORMS J. Comput.

[2] Christopher M. Bishop et al. GTM: The Generative Topographic Mapping, 1998, Neural Computation.

[3] Anil K. Jain et al. Artificial neural networks for feature extraction and multivariate data projection, 1995, IEEE Trans. Neural Networks.

[4] Jianguo Wang et al. Kernel maximum scatter difference based feature extraction and its application to face recognition, 2008, Pattern Recognit. Lett.

[5] Neil D. Lawrence et al. Hierarchical Gaussian process latent variable models, 2007, ICML '07.

[6] Alexander J. Smola et al. Learning with kernels, 1998.

[7] Nathan Intrator et al. Boosted Mixture of Experts: An Ensemble Learning Scheme, 1999, Neural Computation.

[8] J. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, 1964.

[9] Michel Verleysen et al. Nonlinear Dimensionality Reduction, 2007, Springer.

[10] Paulo J. G. Lisboa et al. Cluster-based visualisation with scatter matrices, 2008, Pattern Recognit. Lett.

[11] H. P. Friedman et al. On Some Invariant Criteria for Grouping Data, 1967.

[12] J. Tenenbaum et al. A global geometric framework for nonlinear dimensionality reduction, 2000, Science.

[13] G. Baudat et al. Generalized Discriminant Analysis Using a Kernel Approach, 2000, Neural Computation.

[14] T. Kohonen. Self-organized formation of topologically correct feature maps, 1982.

[15] Lei Wang et al. A Kernel-Induced Space Selection Approach to Model Selection in KLDA, 2008, IEEE Transactions on Neural Networks.

[16] Michael E. Tipping et al. Feed-forward neural networks and topographic mappings for exploratory data analysis, 1996, Neural Computing & Applications.

[17] Elzbieta Pekalska et al. Kernel Discriminant Analysis for Positive Definite and Indefinite Kernels, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] W. Torgerson. Multidimensional scaling: I. Theory and method, 1952.

[19] Richard O. Duda et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[20] John W. Sammon et al. A Nonlinear Mapping for Data Structure Analysis, 1969, IEEE Transactions on Computers.

[21] J. Mercer. Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations, 1909.

[22] Keinosuke Fukunaga et al. Introduction to statistical pattern recognition (2nd ed.), 1990.

[23] Keinosuke Fukunaga et al. Introduction to Statistical Pattern Recognition, 1972.

[24] Teuvo Kohonen et al. Self-organized formation of topologically correct feature maps, 2004, Biological Cybernetics.

[25] Jian Yang et al. Essence of kernel Fisher discriminant: KPCA plus LDA, 2004, Pattern Recognit.

[26] Bernhard Schölkopf et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.

[27] K. Gabriel et al. The biplot graphic display of matrices with application to principal component analysis, 1971.

[28] Keinosuke Fukunaga et al. A Nonlinear Feature Extraction Algorithm Using Distance Transformation, 1972, IEEE Transactions on Computers.

[29] Neil D. Lawrence et al. Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data, 2003, NIPS.