Similarity-balanced discriminant neighbor embedding and its application to cancer classification based on gene expression data

Discriminant neighborhood embedding (DNE) methods are a family of graph-based dimension reduction techniques that have been applied successfully to face recognition. This paper proposes a new DNE variant, called similarity-balanced discriminant neighborhood embedding (SBDNE), and applies it to cancer classification using gene expression data. SBDNE introduces a novel similarity function that treats a pair of data points from the same class differently from a pair drawn from different classes. Homogeneous and heterogeneous neighbors are selected according to this similarity function rather than the Euclidean distance. Using the same function, SBDNE constructs two adjacency graphs: a between-class adjacency graph and a within-class adjacency graph. From these two graphs we derive the local between-class scatter and the local within-class scatter, respectively. SBDNE then finds the optimal projection matrix by maximizing the between-class scatter while simultaneously minimizing the within-class scatter. Experimental results on six microarray datasets show that SBDNE is a promising method for cancer classification.
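The pipeline described above (class-aware similarity, neighbor selection, two adjacency graphs, two scatter matrices, projection) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the paper's implementation: the abstract does not give the similarity function, so `assumed_similarity` below is a hypothetical stand-in (a heat kernel that is boosted for same-class pairs and damped for different-class pairs), and the binary graph weights, the difference objective `Sb - Sw`, and all function names and parameters (`k`, `dim`) are likewise assumptions.

```python
import numpy as np

def pairwise_sq_dists(X):
    # Squared Euclidean distances between all rows of X.
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(d, 0.0)

def assumed_similarity(X, y):
    # HYPOTHETICAL similarity (the abstract does not define SBDNE's):
    # a heat kernel, boosted for same-class pairs and damped otherwise.
    d = pairwise_sq_dists(X)
    t = d.mean() + 1e-12                      # assumed kernel width
    base = np.exp(-d / t)
    same = y[:, None] == y[None, :]
    return np.where(same, base * (1.0 + base), base * (1.0 - base)), same

def knn_graph(S, mask, k):
    # Binary, symmetrized graph linking each point to its k most
    # similar points among the pairs allowed by `mask`.
    n = S.shape[0]
    scores = np.where(mask, S, -np.inf)
    np.fill_diagonal(scores, -np.inf)         # a point is not its own neighbor
    F = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(scores[i])[::-1][:k]
        idx = idx[np.isfinite(scores[i, idx])]
        F[i, idx] = 1.0
    return np.maximum(F, F.T)

def sbdne_like_projection(X, y, k=3, dim=2):
    # X: (n_samples, n_features); y: (n_samples,) class labels.
    S, same = assumed_similarity(X, y)
    Fw = knn_graph(S, same, k)                # within-class adjacency graph
    Fb = knn_graph(S, ~same, k)               # between-class adjacency graph
    Lw = np.diag(Fw.sum(axis=1)) - Fw         # graph Laplacians
    Lb = np.diag(Fb.sum(axis=1)) - Fb
    Sw = X.T @ Lw @ X                         # local within-class scatter
    Sb = X.T @ Lb @ X                         # local between-class scatter
    M = Sb - Sw                               # assumed difference objective
    M = (M + M.T) / 2.0                       # enforce symmetry numerically
    _, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    return vecs[:, ::-1][:, :dim]             # top-`dim` eigenvectors
```

Given a projection `P = sbdne_like_projection(X, y)`, the reduced data are `X @ P`, which could then be passed to any classifier, for example a nearest-neighbor rule.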
