ROBUST GENETIC NETWORK MODELING BY ADDING NOISY DATA

The most fundamental problem in genetic network modeling is generally known as the dimensionality problem. Typical gene expression matrices contain measurements of thousands of genes taken over fewer than twenty time-steps. A large dynamic network cannot be learned from so few time-steps without additional constraints, preferably derived from biological knowledge. In this paper, we present an approach that finds rough estimates of the underlying genetic network from limited time-course gene expression data by exploiting two facts: gene expression measurements are relatively noisy, and genetic networks are thought to be robust. The method expands the data set by adding noisy duplicates, thereby simultaneously tackling the dimensionality problem and making the solutions more robust against the already large noise in the data. This simple concept is similar to adding a Tikhonov regularization term to the optimization process. For linear models, adding noisy duplicates is equivalent to ridge regression, i.e. the sum of the squared weights is minimized together with the prediction error. In the limiting case, as the variance of the added noise approaches zero, it even becomes equivalent to applying the Moore-Penrose pseudo-inverse to the original data. The strength of the proposed concept of adding noisy duplicates lies in the fact that it can be applied to any modeling approach, including non-linear models.
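The equivalence claimed for linear models can be checked numerically. The sketch below is illustrative only and is not the authors' implementation. It assumes a linear model Y ≈ XW, where each row of X holds the expression levels at one time-step and the matching row of Y holds the levels at the next, and it assumes that zero-mean Gaussian noise with standard deviation `sigma` is added to the inputs only while the targets are copied unchanged. Under these assumptions, least squares on many noisy duplicates approaches the ridge solution with penalty `n_samples * sigma**2`. The function names, matrix sizes, and parameter values are hypothetical choices made for the example.

```python
import numpy as np


def fit_with_noisy_duplicates(X, Y, sigma, n_copies, rng):
    """Ordinary least squares on a data set expanded with noisy duplicates.

    Assumption: Gaussian noise is added to the inputs X only; the targets Y
    are repeated unchanged for every duplicate.
    """
    n_samples, n_genes = X.shape
    noise = rng.normal(0.0, sigma, size=(n_copies, n_samples, n_genes))
    # Stack the noisy copies of X on top of each other.
    X_aug = (X[None, :, :] + noise).reshape(-1, n_genes)
    # Repeat the targets in the same order as the stacked copies.
    Y_aug = np.tile(Y, (n_copies, 1))
    W, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)
    return W


def fit_ridge(X, Y, lam):
    """Closed-form ridge regression: (X^T X + lam * I)^(-1) X^T Y."""
    n_genes = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_genes), X.T @ Y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fewer time-steps (4) than genes (6): the plain problem is underdetermined.
    X = rng.normal(size=(4, 6))
    Y = rng.normal(size=(4, 6))
    sigma = 0.5

    W_noisy = fit_with_noisy_duplicates(X, Y, sigma, n_copies=20000, rng=rng)
    W_ridge = fit_ridge(X, Y, lam=X.shape[0] * sigma**2)
    rel_err = np.linalg.norm(W_noisy - W_ridge) / np.linalg.norm(W_ridge)
    print("relative difference, noisy duplicates vs ridge:", rel_err)

    # With a vanishing penalty, ridge approaches the pseudo-inverse solution.
    W_pinv = np.linalg.pinv(X) @ Y
    print("max difference, ridge (lam=1e-6) vs pseudo-inverse:",
          np.max(np.abs(fit_ridge(X, Y, 1e-6) - W_pinv)))
```

The agreement between the two fits is statistical: it tightens as the number of duplicates grows, because the averaged noise terms converge to their expected values. For non-linear models no closed-form counterpart is assumed here; the duplicates would simply be passed to whatever fitting procedure the model uses.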
