Multivariate curve resolution of time course microarray data
BMC Bioinformatics (BioMed Central), Methodology article

Background
Modeling of gene expression data from time course experiments often involves the use of linear models such as those obtained from principal component analysis (PCA), independent component analysis (ICA), or other methods. Such methods do not generally yield factors with a clear biological interpretation. Moreover, implicit assumptions about the measurement errors often limit the application of these methods to log-transformed data, destroying linear structure in the untransformed expression data.

Results
In this work, a method for the linear decomposition of gene expression data by multivariate curve resolution (MCR) is introduced. The MCR method is based on an alternating least-squares (ALS) algorithm implemented with a weighted least-squares approach. The new method, MCR-WALS, extracts a small number of basis functions from untransformed microarray data using only non-negativity constraints. Measurement error information can be incorporated into the modeling process and missing data can be imputed. The utility of the method is demonstrated through its application to yeast cell cycle data.

Conclusion
Profiles extracted by MCR-WALS exhibit a strong correlation with cell cycle-associated genes, but also suggest new insights into the regulation of those genes. The unique features of the MCR-WALS algorithm are its freedom from assumptions about the underlying linear model other than the non-negativity of gene expression, its ability to analyze non-log-transformed data, and its use of measurement error information to obtain a weighted model and accommodate missing measurements.
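The alternating, weighted, non-negative decomposition described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the update equations, so this example assumes the bilinear model D ≈ C S (genes × time points), assumes weights equal to inverse error variances, and replaces a dedicated non-negative least-squares solver with cyclic coordinate descent, where each entry is a one-dimensional weighted least-squares solve clipped at zero. The names `mcr_wals`, `update_rows`, and `weighted_sse`, and the toy data, are hypothetical. A missing measurement is given weight zero and is imputed from the fitted model.

```python
import random

def transpose(M):
    return [list(col) for col in zip(*M)]

def weighted_sse(D, W, C, S):
    """Weighted sum of squared residuals for D ~ C S (zero-weight entries ignored)."""
    total = 0.0
    for i, row in enumerate(D):
        for j, d in enumerate(row):
            if W[i][j] > 0:
                fit = sum(C[i][p] * S[p][j] for p in range(len(S)))
                total += W[i][j] * (d - fit) ** 2
    return total

def update_rows(D, W, C, S):
    """One coordinate-descent sweep over C with S held fixed (C is modified in place)."""
    k, n = len(S), len(S[0])
    for i in range(len(D)):
        for p in range(k):
            num = den = 0.0
            for j in range(n):
                w = W[i][j]
                if w == 0:
                    continue  # missing measurement: contributes nothing
                partial = sum(C[i][q] * S[q][j] for q in range(k) if q != p)
                num += w * S[p][j] * (D[i][j] - partial)
                den += w * S[p][j] ** 2
            if den > 0:
                C[i][p] = max(0.0, num / den)  # non-negativity constraint

def mcr_wals(D, W, k, n_iter=200, seed=0):
    """Alternate between updating contributions C and profiles S."""
    rng = random.Random(seed)
    m, n = len(D), len(D[0])
    C = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    S = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    Dt, Wt = transpose(D), transpose(W)
    for _ in range(n_iter):
        update_rows(D, W, C, S)                # solve for C given S
        St = transpose(S)
        update_rows(Dt, Wt, St, transpose(C))  # solve for S given C (transposed problem)
        S = transpose(St)
    return C, S

# Toy data (assumed for illustration): 4 genes, 5 time points, 2 underlying profiles.
S_true = [[1.0, 2.0, 3.0, 2.0, 1.0], [3.0, 2.0, 1.0, 2.0, 3.0]]
C_true = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
D = [[sum(C_true[i][p] * S_true[p][j] for p in range(2)) for j in range(5)]
     for i in range(4)]
W = [[1.0] * 5 for _ in range(4)]  # stand-in for inverse error variances
D[2][3], W[2][3] = None, 0.0       # one missing measurement, weight zero

C, S = mcr_wals(D, W, k=2)
imputed = sum(C[2][p] * S[p][3] for p in range(2))
print("weighted SSE:", weighted_sse(D, W, C, S))
print("imputed value for gene 2, time point 3:", imputed)
```

Because every coordinate update exactly minimizes the weighted residual over one non-negative variable, the weighted error cannot increase from one sweep to the next. The recovered profiles are generally not unique, since non-negativity alone leaves some rotational freedom in C and S.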
