High-Dimensionality Graph Data Reduction Based on a Proposed New Algorithm

In recent years, graph data analysis has become increasingly important for modeling data distribution and structure in many applications, for example social science, astronomy, computational biology, and social networks with massive numbers of nodes and edges. However, the high dimensionality of graph data remains a difficult problem, mainly because most analysis systems are not designed to handle large graphs. Graph-based dimensionality reduction approaches have therefore been widely used in many machine learning and pattern recognition applications. This paper offers a novel dimensionality reduction approach for graph data. In particular, we combine two linear methods: Neighborhood Preserving Embedding (NPE), which preserves the local neighborhood structure of a given dataset, and Principal Component Analysis (PCA), which maximizes the mutual information between the original high-dimensional data and their low-dimensional representation. Combining NPE and PCA yields a new hybrid dimensionality reduction technique (HDR). HDR constructs a transformation matrix by formulating a generalized eigenvalue problem and solving it via the Rayleigh quotient. As a result, a greater reduction is achieved than with PCA or NPE used separately. We compared the results with conventional PCA, NPE, and other linear dimensionality reduction methods on two real datasets, and the proposed HDR was found to outperform the other techniques.
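Since the abstract does not reproduce HDR's exact formulation, the following is only an illustrative sketch of the kind of pipeline it describes: NPE-style neighborhood reconstruction weights combined with PCA-style centering, reduced to a generalized eigenvalue problem whose Rayleigh-quotient solution gives the projection matrix. The function name, neighborhood size, and regularization constants are assumptions for the sketch, not the paper's method.

```python
import numpy as np
from scipy.linalg import eigh, lstsq

def hybrid_reduce(X, n_components=2, k=5):
    """Illustrative PCA+NPE-style reduction (hypothetical sketch, not the paper's HDR).

    X: (n_samples, n_features) array, one data point per row.
    Returns the embedded data and the learned projection matrix.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)                 # center the data, as PCA does

    # NPE step: express each point as an affine combination of its k nearest neighbors
    W = np.zeros((n, n))
    dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]              # skip the point itself
        G = Xc[nbrs] - Xc[i]                             # local neighborhood differences
        gram = G @ G.T + 1e-3 * np.eye(k)                # regularized local Gram matrix
        w, *_ = lstsq(gram, np.ones(k))                  # solve for reconstruction weights
        W[i, nbrs] = w / w.sum()                         # normalize to sum to one

    I = np.eye(n)
    M = (I - W).T @ (I - W)                              # NPE locality-preservation penalty

    # Generalized eigenproblem  X^T M X a = lambda X^T X a  (Rayleigh-quotient form):
    # small eigenvalues correspond to directions that best preserve neighborhoods.
    A = Xc.T @ M @ Xc
    B = Xc.T @ Xc + 1e-6 * np.eye(d)                     # regularize for numerical stability
    _, vecs = eigh(A, B)                                 # eigh returns ascending eigenvalues
    T = vecs[:, :n_components]                           # projection matrix (d x n_components)
    return Xc @ T, T
```

As a usage example, `hybrid_reduce(X, n_components=2)` on an `(n, d)` data matrix returns an `(n, 2)` embedding together with the `(d, 2)` transformation matrix.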
