Kernel Based Nonlinear Dimensionality Reduction and Classification for Genomic Microarray

Genomic microarrays are powerful research tools in bioinformatics and modern medicinal research because they enable massively parallel assays and the simultaneous monitoring of the expression of thousands of genes in biological samples. However, even a simple microarray experiment often produces very high-dimensional data and a huge amount of information, which challenges researchers to extract the important features and reduce the dimensionality. In this paper, a kernel-based nonlinear dimensionality reduction method built on locally linear embedding (LLE) is proposed, and a fuzzy K-nearest neighbors algorithm, which denoises the data sets, is introduced as a replacement for the K-nearest neighbors (KNN) step of classical LLE. In addition, a kernel-based support vector machine (SVM) is used to classify genomic microarray data sets. We demonstrate the application of these techniques to two published DNA microarray data sets. The experimental results confirm the superiority and high success rates of the presented method.
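
To make the fuzzy K-nearest neighbors idea concrete, the following is a minimal sketch of the rule in its commonly described form (Keller et al., 1985), assuming crisp training labels and Euclidean distance. The function name `fuzzy_knn_memberships`, the fuzzifier `m`, and the toy data are illustrative assumptions; the abstract does not specify how the paper integrates this rule into LLE, so this should not be read as the authors' implementation.

```python
import math

def fuzzy_knn_memberships(query, points, labels, k=3, m=2.0):
    """Return {class: membership} for `query` (hypothetical helper).

    Each of the k nearest neighbors votes for its own class with weight
    1 / d^(2/(m-1)), where d is its Euclidean distance to the query and
    m > 1 is the fuzzifier. Memberships are normalized to sum to 1.
    """
    # Distance from the query to every labeled point, nearest first.
    dists = sorted(
        ((math.dist(query, p), lab) for p, lab in zip(points, labels)),
        key=lambda pair: pair[0],
    )
    nearest = dists[:k]

    exponent = 2.0 / (m - 1.0)
    eps = 1e-12  # guards against division by zero when d == 0
    weights = [(1.0 / (d ** exponent + eps), lab) for d, lab in nearest]
    total = sum(w for w, _ in weights)

    memberships = {lab: 0.0 for lab in set(labels)}
    for w, lab in weights:
        memberships[lab] += w / total
    return memberships

# Toy data (assumed for illustration only): two well-separated classes.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b"]
demo = fuzzy_knn_memberships((0.2, 0.1), points, labels, k=3)
print(demo)
```

One possible use of such memberships, offered here only as an assumption, is to flag samples whose membership is low in every class as likely noise before the neighborhood graph for LLE is built.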
