Relational patterns of gene expression via non-metric multidimensional scaling analysis

MOTIVATION Microarray experiments result in large-scale data sets that require extensive mining and refining to extract useful information. We demonstrate the usefulness of (non-metric) multidimensional scaling (MDS) method in analyzing a large number of genes. Applying MDS to the microarray data is certainly not new, but the existing works are all on small numbers (< 100) of points to be analyzed. We have been developing an efficient novel algorithm for non-metric MDS (nMDS) analysis for very large data sets as a maximally unsupervised data mining device. We wish to demonstrate its usefulness in the context of bioinformatics (unraveling relational patterns among genes from time series data in this paper). RESULTS The Pearson correlation coefficient with its sign flipped is used to measure the dissimilarity of the gene activities in transcriptional response of cell-cycle-synchronized human fibroblasts to serum. These dissimilarity data have been analyzed with our nMDS algorithm to produce an almost circular relational pattern of the genes. The obtained pattern expresses a temporal order in the data in this example; the temporal expression pattern of the genes rotates along this circular arrangement and is related to the cell cycle. For the data we analyze in this paper we observe the following. If an appropriate preparation procedure is applied to the original data set, linear methods such as the principal component analysis (PCA) could achieve reasonable results, but without data preprocessing linear methods such as PCA cannot achieve a useful picture. Furthermore, even with an appropriate data preprocessing, the outcomes of linear procedures are not as clear-cut as those by nMDS without preprocessing.

[1]  Raj Acharya,et al.  An information theoretic approach for analyzing temporal patterns of gene expression , 2003, Bioinform..

[2]  Y-h. Taguchi,et al.  Nonmetric Multidimensional Scaling As a Data‐Mining Tool: New Algorithm and New Targets , 2005 .

[3]  Ronald W. Davis,et al.  Transcriptional regulation and function during the human cell cycle , 2001, Nature Genetics.

[4]  Philip H. Ramsey Nonparametric Statistical Methods , 1974, Technometrics.

[5]  Martin Vetterli,et al.  Data Compression and Harmonic Analysis , 1998, IEEE Trans. Inf. Theory.

[6]  Anders Berglund,et al.  A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription , 2003, Bioinform..

[7]  Ilya Shmulevich,et al.  Binary analysis and optimization-based normalization of gene expression data , 2002, Bioinform..

[8]  S. Kanaya,et al.  Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): characterization of horizontally transferred genes with emphasis on the E. coli O157 genome. , 2001, Gene.

[9]  Michael Ruogu Zhang,et al.  Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. , 1998, Molecular biology of the cell.

[10]  J. Kruskal Nonmetric multidimensional scaling: A numerical method , 1964 .

[11]  K. Shedden,et al.  Analysis of cell-cycle-specific gene expression in human cells as determined by microarrays and double-thymidine block synchronization , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[12]  D. Slonim From patterns to pathways: gene expression data analysis comes of age , 2002, Nature Genetics.

[13]  D. Botstein,et al.  The transcriptional program in the response of human fibroblasts to serum. , 1999, Science.

[14]  Neal S. Holter,et al.  Fundamental patterns underlying gene expression profiles: simplicity from complexity. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[15]  Paul E. Green,et al.  Multidimensional Scaling: Concepts and Applications , 1989 .

[16]  R. Shepard The analysis of proximities: Multidimensional scaling with an unknown distance function. II , 1962 .

[17]  P. Groenen,et al.  Modern multidimensional scaling , 1996 .

[18]  Jan Komorowski,et al.  Predicting gene ontology biological process from temporal gene expression patterns. , 2003, Genome research.

[19]  Torben F. Ørntoft,et al.  Identifying distinct classes of bladder carcinoma using microarrays , 2003, Nature Genetics.

[20]  J. Kruskal Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis , 1964 .

[21]  R. Shepard The analysis of proximities: Multidimensional scaling with an unknown distance function. I. , 1962 .

[22]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.