Kernel-Based Manifold-Oriented Stochastic Neighbor Projection Method

A new method for performing a nonlinear form of the manifold-oriented stochastic neighbor projection method is proposed. By the use of kernel functions, one can operate in the feature space without ever computing the coordinates of the data in that space; instead, one simply computes the inner products between the images of all pairs of data points in the feature space. The proposed method is termed kernel-based manifold-oriented stochastic neighbor projection (KMSNP). Using two different strategies, KMSNP is divided into two methods: KMSNP1 and KMSNP2. Experimental results on several databases show that, compared with related methods, the proposed methods achieve higher classification performance and recognition rates.

INTRODUCTION

Kernel-based methods (kernel methods for short) have become a topic of intense interest in machine learning in recent years; their theoretical basis is statistical learning theory. Kernel methods are a class of algorithms for pattern analysis, whose best-known member is the support vector machine (SVM) (Dardas and Georganas 2011). These methods introduce a kernel function that not only mitigates the curse of dimensionality (Cherchi and Guevara 2012, Xue et al. 2012), but also addresses the local-minimum and incomplete-statistics problems of traditional pattern recognition methods, without additional computational cost. As an effective way to address nonlinear pattern recognition, kernel methods map the data into a high-dimensional feature space, where each coordinate corresponds to one feature of the data items, transforming the data into a set of points in a Euclidean space (Chen and Li 2011, Zhang et al. 2008). The theory of kernel methods can be traced back to 1909, when Mercer proposed his theorem (Mercer 1909), which states that any 'reasonable' kernel function corresponds to some feature space.
In 1964, the use of Mercer's theorem to interpret kernels as inner products in a feature space was introduced into machine learning by Aizerman et al. (Aizerman et al. 1964), but little importance was attached to it at the time. Not until 1992, when Vapnik et al. (Boser et al. 1992) successfully extended the SVM to the nonlinear SVM by using kernel functions, did the approach begin to show its potential and advantages. Subsequently, more and more kernel-based methods were presented, such as kernel principal component analysis (KPCA) (Xiao et al. 2012), kernel Fisher discriminant (KFD) (Yang et al. 2005), kernel independent component analysis (KICA) (Zhang et al. 2013), kernel partial least squares (KPLS) (Helander et al. 2012), and so on.

In this paper, we apply the kernel idea and present a method called kernel-based manifold-oriented stochastic neighbor projection (KMSNP), obtained by improving the manifold-oriented stochastic neighbor projection (MSNP) technique (Wu et al. 2011). MSNP is based on stochastic neighbor embedding (SNE) (Hinton and Roweis 2002) and t-SNE (Maaten and Hinton 2008). The basic principle of SNE is to convert pairwise Euclidean distances into probabilities of selecting neighbors in order to model pairwise similarities, while t-SNE uses the Student t-distribution to model pairwise dissimilarities in the low-dimensional space. Different from SNE and t-SNE, MSNP converts pairwise dissimilarities of the inputs into a probability distribution related to geodesic distance in the high-dimensional space and uses the Cauchy distribution to model the stochastic distribution of the features. Furthermore, it recovers the manifold structure through a linear projection by requiring the two distributions to be similar.
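The kernel trick mentioned above, namely evaluating inner products in the feature space without ever forming the mapped coordinates, can be illustrated with a small sketch. The degree-2 polynomial kernel and all names below are our own illustrative choices, not part of the proposed method:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for 2-D input:
    phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# The kernel evaluates the inner product in feature space
# without ever computing phi(x) or phi(z) explicitly.
explicit = np.dot(phi(x), phi(z))   # inner product of mapped points
implicit = poly_kernel(x, z)        # kernel evaluation in input space

assert np.isclose(explicit, implicit)  # both equal (1*3 + 2*4)^2 = 121
```

For higher-degree polynomials or the Gaussian kernel the explicit feature space becomes very large or infinite-dimensional, which is exactly why kernel methods compute only the pairwise kernel values.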
Experiments demonstrate that MSNP has unique advantages in visualization and recognition tasks, but it still has two drawbacks. First, MSNP is an unsupervised method that lacks the notion of class labels, so it is not well suited to pattern identification. Second, since MSNP is a linear feature dimensionality reduction algorithm, it cannot effectively handle nonlinear feature extraction problems. To overcome these disadvantages of MSNP, we have done some preliminary work. First, we introduced the idea of class labels and presented a method called discriminative stochastic neighbor embedding analysis (DSNE) (Zheng et al. 2012, Chen and Wang 2012). Second, we believe KMSNP can overcome the disadvantages mentioned above.

Proceedings 27th European Conference on Modelling and Simulation ©ECMS Webjorn Rekdalsbakken, Robin T. Bye, Houxiang Zhang (Editors) ISBN: 978-0-9564944-6-7 / ISBN: 978-0-9564944-7-4 (CD)

The rest of this paper is organized as follows: Section 2 provides a brief review of MSNP. Section 3 describes the detailed derivation of KMSNP. Experiments on various databases are presented in Section 4. Finally, we give concluding remarks and describe several issues for future work in Section 5.

MSNP

Consider the problem of representing d-dimensional data vectors x1, x2, ..., xN by r-dimensional (r << d) vectors y1, y2, ..., yN such that yi represents xi. The basic principle of MSNP is to convert the pairwise dissimilarities of the inputs into a probability distribution related to geodesic distance in the high-dimensional space, and then to use the Cauchy distribution to model the stochastic distribution of the features; finally, MSNP recovers the manifold structure through a linear projection by requiring the two distributions to be similar.
Mathematically, the similarity of datapoint xi to datapoint xj is depicted by the following joint probability pij, which expresses how likely xi is to pick xj as its neighbor:

p_{ij} = \frac{\exp(-D_{ij}^{geo}/2)}{\sum_{k \neq i} \exp(-D_{ik}^{geo}/2)} \qquad (1)

where D_{ij}^{geo} is the geodesic distance between xi and xj. In practice, MSNP calculates the geodesic distance by a two-phase method (Wu et al. 2011). First, an adjacency graph G is constructed by the K-nearest-neighbor strategy. Second, the desired geodesic distance is approximated by the shortest-path distance in graph G. This procedure was proposed in Isomap to estimate geodesic distances, and the detailed calculation steps can be found in (Tenenbaum et al. 2000). For the low-dimensional representations, MSNP employs the Cauchy distribution with a degree-of-freedom parameter to construct the joint probability qij. The probability qij, which indicates how likely points i and j are to be stochastic neighbors, is defined as:
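The two-phase geodesic-distance estimate and the neighbor probabilities p_ij described above can be sketched as follows. This is a minimal illustration using SciPy's shortest-path routine; the toy data, variable names, and neighborhood size are our assumptions, not values from the paper:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def geodesic_distances(X, k=3):
    """Phase 1: build a K-nearest-neighbor adjacency graph;
    Phase 2: approximate geodesic distances by shortest paths."""
    D = cdist(X, X)                      # pairwise Euclidean distances
    G = np.full_like(D, np.inf)          # inf marks a non-edge
    for i in range(len(X)):
        nn = np.argsort(D[i])[1:k + 1]   # k nearest neighbors (skip self)
        G[i, nn] = D[i, nn]
        G[nn, i] = D[nn, i]              # keep the graph symmetric
    return shortest_path(G, method="D")  # Dijkstra over the graph

def neighbor_probabilities(Dgeo):
    """p_ij proportional to exp(-Dgeo_ij / 2), normalized over k != i."""
    P = np.exp(-Dgeo / 2.0)
    np.fill_diagonal(P, 0.0)             # a point never picks itself
    return P / P.sum(axis=1, keepdims=True)

# Toy data: points along a curved 1-D manifold in 2-D
X = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0], [3.0, 0.5], [4.0, 0.0]])
P = neighbor_probabilities(geodesic_distances(X, k=2))
# Each row of P sums to 1; points that are close along the
# manifold receive higher neighbor probability.
```

Because the shortest path must follow graph edges, distances along the curved manifold are measured along the data, not through the ambient space, which is the property MSNP relies on.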

[1] H. Wang et al. "L1 norm based KPCA for novelty detection." Pattern Recognition, 2013.

[2] S. Chen et al. "Normalized weighted shape context and its application in feature-based matching." 2008.

[3] G. E. Hinton et al. "Visualizing Data using t-SNE." 2008.

[4] C. A. Guevara et al. "A Monte Carlo experiment to analyze the curse of dimensionality in estimating random coefficients models with a full variance-covariance matrix." 2012.

[5] G. E. Hinton et al. "Stochastic Neighbor Embedding." NIPS, 2002.

[6] M. Aizerman et al. "Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning." 1964.

[7] S. Chen et al. "Bound Maxima as a Traffic Feature under DDOS Flood Attacks." 2012.

[8] J. Mercer. "Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations." 1909.

[9] M. Gabbouj et al. "Voice Conversion Using Dynamic Kernel Partial Least Squares Regression." IEEE Transactions on Audio, Speech, and Language Processing, 2012.

[10] N. D. Georganas et al. "Real-Time Hand Gesture Detection and Recognition Using Bag-of-Features and Support Vector Machine Techniques." IEEE Transactions on Instrumentation and Measurement, 2011.

[11] J. Yang et al. "KPCA plus LDA: a complete kernel Fisher discriminant framework for feature extraction and recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005.

[12] B. E. Boser et al. "A training algorithm for optimal margin classifiers." COLT '92, 1992.

[13] J. Tenenbaum et al. "A global geometric framework for nonlinear dimensionality reduction." Science, 2000.

[14] S. Chen et al. "Acceleration Strategies in Generalized Belief Propagation." IEEE Transactions on Industrial Informatics, 2012.

[15] S. Wu et al. "Stochastic neighbor projection on manifold for feature extraction." Neurocomputing, 2011.

[16] Y. F. Li et al. "Determination of Stripe Edge Blurring for Depth Sensing." IEEE Sensors Journal, 2011.