Exploration and reduction of high dimensional spaces with independent component analysis

This paper considers the application of Independent Component Analysis (ICA) to genomic data. In recent years, microarrays have provided researchers with very large series of gene expression measurements taken under different experimental conditions. This work emphasizes exploratory data analysis following experimental work on the widely studied Escherichia coli. The setting is a typical one: genes are perturbed at an initial time, and the resulting changes in gene expression are measured at regular time intervals until the steady state is reached. As usual, the gene temporal patterns are very short, and the application described here is no exception, with only six time points available. This shortness combines with a very large feature space (i.e., the gene dimensionality). As a result, several kinds of fluctuations have to be monitored, and many are discarded because they are not significantly different from noise. ICA is a very flexible signal processing tool that also attempts to deal with noise, although its expected impact stems from its most inherent property: it decomposes the gene profiles according to statistically independent sources of information. These sources are most likely linked to the underlying biological processes that regulate the genes. Characterizing these biological aspects is not a goal of this paper, since it is a subject of ongoing research that requires further experimental investigation and testing before any important conclusion can be drawn. Nevertheless, the initial computational results are very encouraging, and they are presented here together with some interesting problem-specific aspects.
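To make the decomposition idea concrete, the following is a minimal sketch of how gene profiles measured at six time points could be separated into statistically independent sources. It is not the procedure used in the paper, which the abstract does not specify. The data are synthetic, and everything in the block is an assumption made for illustration: the helper names `sym_decorrelate` and `fast_ica`, the choice of a symmetric fixed-point ICA iteration with a tanh nonlinearity, the three Laplace-distributed sources, the temporal mixing profiles, and the noise level.

```python
import numpy as np

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_components, n_iter=200, tol=1e-6, seed=0):
    """Illustrative symmetric fixed-point ICA with a tanh nonlinearity.

    X has shape (n_samples, n_features); here genes are samples and
    time points are features. Returns the estimated sources with shape
    (n_samples, n_components) and the unmixing matrix with shape
    (n_components, n_features).
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)

    # PCA whitening: keep the leading directions and scale them to unit variance.
    cov = Xc.T @ Xc / Xc.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:n_components]
    K = eigvec[:, order] / np.sqrt(eigval[order])
    Z = Xc @ K

    W = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        Y = Z @ W.T
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        # Fixed-point update for each row w: E[z g(w.z)] - E[g'(w.z)] w
        W_new = (G.T @ Z) / Z.shape[0] - np.diag(G_prime.mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T, W @ K.T

# Synthetic stand-in for the data: 2000 genes, 6 time points (assumed sizes).
rng = np.random.default_rng(42)
n_genes, n_sources = 2000, 3
t = np.arange(6, dtype=float)

# Hypothetical temporal profiles: decay, transient response, saturation.
A = np.vstack([np.exp(-t / 2.0),
               t * np.exp(-t / 1.5),
               1.0 - np.exp(-t / 2.0)])

S_true = rng.laplace(size=(n_genes, n_sources))        # non-Gaussian sources
X = S_true @ A + 0.05 * rng.standard_normal((n_genes, 6))

S_est, unmixing = fast_ica(X, n_components=n_sources)

# Absolute correlation between each true source and each estimated source.
# ICA recovers sources only up to permutation and sign.
match = np.abs(np.corrcoef(S_true.T, S_est.T)[:n_sources, n_sources:])
print(np.round(match, 2))
```

With only six time points, at most six components can be extracted in this arrangement, which is one reason the short temporal patterns mentioned above constrain the analysis. The estimated sources are defined only up to ordering and sign, so any biological interpretation has to come from inspecting which genes load most strongly on each component.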
