Approximate geodesic distances reveal biologically relevant structures in microarray data

MOTIVATION: Genome-wide gene expression measurements, as currently determined by microarray technology, can be represented mathematically as points in a high-dimensional gene expression space. Genes interact with each other in regulatory networks, which restricts cellular gene expression profiles to a manifold, or surface, in gene expression space. To learn about this manifold, various dimensionality reduction methods and distance metrics are used. For data points distributed on a curved manifold, a sensible distance measure is the geodesic distance along the manifold. In this work, we examine whether an approximate geodesic distance measure captures biological similarities better than the traditionally used Euclidean distance.

RESULTS: We computed approximate geodesic distances, determined by the Isomap algorithm, for one set of lymphoma and one set of lung cancer microarray samples. Compared with the ordinary Euclidean distance metric, this distance measure produced more instructive, biologically relevant visualizations when multidimensional scaling was applied. This suggests that the Isomap algorithm is a promising tool for the interpretation of microarray data. Furthermore, the results demonstrate the benefit and importance of taking nonlinearities in gene expression data into account.
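To make the idea of an approximate geodesic distance concrete, the following is a minimal sketch of the distance step commonly associated with Isomap: connect each sample to its nearest neighbours, then take shortest-path lengths through that graph. It is an illustration under stated assumptions, not the implementation used in this work. The function names, the neighbourhood size `k`, and the toy semicircle data are all hypothetical, and the multidimensional scaling step that would follow is omitted.

```python
import math


def euclidean(a, b):
    """Ordinary straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def approx_geodesic_distances(points, k):
    """Approximate geodesic distances via a k-nearest-neighbour graph.

    Hypothetical helper for illustration: `k` is an assumed parameter,
    and the graph must be connected for all distances to be finite.
    """
    n = len(points)
    d = [[euclidean(p, q) for q in points] for p in points]

    # Start with no edges: zero on the diagonal, infinity elsewhere.
    g = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]

    # Link every point to its k nearest neighbours (symmetric edges).
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        for j in others[:k]:
            g[i][j] = g[j][i] = d[i][j]

    # Floyd-Warshall: shortest path lengths through the neighbour graph.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                through_m = g[i][m] + g[m][j]
                if through_m < g[i][j]:
                    g[i][j] = through_m
    return g


# Toy data (not microarray data): nine points on a unit semicircle.
points = [(math.cos(t * math.pi / 8), math.sin(t * math.pi / 8)) for t in range(9)]
geo = approx_geodesic_distances(points, k=2)

straight = euclidean(points[0], points[8])  # cuts across the curve
along = geo[0][8]                           # follows the curve
print(f"Euclidean: {straight:.3f}, approximate geodesic: {along:.3f}")
```

On this toy curve the two end points are 2 apart in a straight line, whereas the graph distance follows the arc and is therefore longer, close to the arc length π. Feeding the resulting matrix into multidimensional scaling in place of the Euclidean distances is the comparison the abstract describes.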
