Aliasing in Gene Feature Detection by Projective Methods

Microarray measurements are typically obtained under few experimental conditions or time points relative to the many genes observed, the so-called "large dimension, small sample size" problem, so dimensionality reduction is common practice in statistical bioinformatics. However, to improve the performance of reverse engineering and statistical inference procedures that aim to estimate gene-gene connectivity links, some form of regularization is usually needed to reduce the overall complexity of the data, together with ad hoc feature selection to uncover biologically relevant gene associations. This paper deals with feature selection by projective methods and addresses three issues in particular: Can the impact of noise on the data be limited by shrinkage or de-noising? How can the complexity arising from the convoluted dynamics associated with microarray measurements be discounted? How can over-parametrization be handled and controlled when modeling such data? The problem of aliasing is then discussed, classified into two categories according to the trade-off between biological relevance and noise, and finally expressed in analytical form via subspace analysis.
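
As a rough illustration of what a projective method with shrinkage can look like, the following sketch projects a genes-by-samples matrix onto its leading singular subspace. It is not the paper's procedure: the choice of truncated SVD, the function name `truncated_svd_denoise`, the synthetic data, the noise level, and the rank are all assumptions made only for the example.

```python
import numpy as np


def truncated_svd_denoise(X, rank):
    """Project X (genes x samples) onto the subspace spanned by its
    `rank` leading singular vectors. Hypothetical helper, for illustration."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the leading components; the discarded ones are treated as noise.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


# Synthetic "large dimension, small sample size" data (assumed, not real):
# many genes, few samples, low-rank signal plus additive Gaussian noise.
rng = np.random.default_rng(0)
n_genes, n_samples, true_rank = 200, 12, 2
signal = rng.normal(size=(n_genes, true_rank)) @ rng.normal(size=(true_rank, n_samples))
noisy = signal + 0.1 * rng.normal(size=signal.shape)

denoised = truncated_svd_denoise(noisy, rank=true_rank)

# Compare reconstruction error before and after the projection.
err_noisy = np.linalg.norm(noisy - signal)
err_denoised = np.linalg.norm(denoised - signal)
print(f"error before projection: {err_noisy:.3f}")
print(f"error after projection:  {err_denoised:.3f}")
```

In this toy setting the rank is known in advance. With real expression data it is not, and choosing it is one place where the trade-off between biological relevance and noise arises: too few components can discard weak but relevant signals, too many retain noise.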
