Exploratory data analysis of DNA microarrays by multivariate curve resolution.

In this work, a multivariate curve resolution procedure based on alternating least squares optimization (MCR-ALS) is proposed for the analysis of DNA microarray data. For this purpose, simulated and publicly available experimental data sets have been analyzed. MCR-ALS, a method that requires no training set, made it possible to resolve the information relevant to the classification of different cancer lines using a small number of components, each defined by a sample profile and a pure gene expression profile. From the resolved sample profiles, a classification of the samples according to their origin is proposed. From the resolved pure gene expression profiles, a set of over- or underexpressed genes that could be related to the development of cancer has been selected. The advantages of the MCR-ALS procedure over other previously proposed procedures, such as principal component analysis, are discussed.
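The abstract describes MCR-ALS only in outline. As a rough illustration, assuming the standard bilinear MCR model D = C Sᵀ + E (D: samples × genes; C: resolved sample profiles; Sᵀ: resolved pure gene expression profiles; E: residuals), the alternating least squares loop can be sketched in Python with NumPy. The initialization heuristic (`initial_gene_profiles`), the choice to impose non-negativity only on the sample profiles by clipping, the convergence test and the simulated data are hypothetical simplifications chosen for brevity. They are not the constraints, initial-estimate methods or data sets used by the authors.

```python
import numpy as np


def initial_gene_profiles(D, k):
    """Hypothetical initial estimate of St (k x n_genes).

    Greedily picks the k samples (rows of D) that are most dissimilar,
    Gram-Schmidt style. A simplified stand-in for purest-variable or
    EFA-based initial estimates, not the method used by the authors.
    """
    R = D.astype(float).copy()
    idx = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=1)
        i = int(np.argmax(norms))
        idx.append(i)
        v = R[i] / (norms[i] if norms[i] > 0 else 1.0)
        R = R - np.outer(R @ v, v)  # remove the chosen direction from all rows
    return D[idx].astype(float)


def mcr_als(D, k, max_iter=100, tol=1e-10):
    """Minimal MCR-ALS sketch for the bilinear model D = C @ St + E.

    D  : (n_samples, n_genes) data matrix
    C  : (n_samples, k) resolved sample profiles
    St : (k, n_genes) resolved gene expression profiles
    Assumed constraint: non-negativity on C only (by clipping); St is left
    unconstrained so genes may load positively or negatively.
    """
    St = initial_gene_profiles(D, k)
    prev = None
    for _ in range(max_iter):
        # C-step: least squares C = D pinv(St), then clip negatives to zero
        C = np.linalg.lstsq(St.T, D.T, rcond=None)[0].T
        C = np.clip(C, 0.0, None)
        # S-step: unconstrained least squares St = pinv(C) D
        St = np.linalg.lstsq(C, D, rcond=None)[0]
        resid = np.linalg.norm(D - C @ St)
        # Stop when the residual norm no longer changes appreciably
        if prev is not None and abs(prev - resid) <= tol * prev:
            break
        prev = resid
    lof = 100.0 * resid / np.linalg.norm(D)  # lack of fit, in percent
    return C, St, lof


# Usage on simulated data: 10 samples from two hypothetical classes, 50 genes
rng = np.random.default_rng(1)
C_true = np.repeat(np.eye(2), 5, axis=0)
S_true = rng.normal(size=(2, 50))
D = C_true @ S_true + 0.01 * rng.normal(size=(10, 50))
C, St, lof = mcr_als(D, 2)
labels = C.argmax(axis=1)  # assign each sample to its dominant component
print(f"lack of fit: {lof:.2f}%  labels: {labels}")
```

In this toy setting, assigning each sample to its largest entry in C loosely mirrors the idea of classifying samples from the resolved sample profiles, and genes with large positive or negative values in a row of St would be candidates for over- or underexpression in that component. Real microarray data would need the preprocessing, constraints and component-number selection discussed by the authors.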
