Simulation of microarray data with realistic characteristics

Background: Microarray technologies have become common tools in biological research. As a result, a need for effective computational methods for data analysis has emerged. Numerous different algorithms have been proposed for analyzing the data. However, an objective evaluation of the proposed algorithms is not possible due to the lack of biological ground truth information. To overcome this fundamental problem, the use of simulated microarray data for algorithm validation has been proposed.

Results: We present a microarray simulation model which can be used to validate different kinds of data analysis algorithms. The proposed model is unique in the sense that it includes all the steps that affect the quality of real microarray data. These steps include the simulation of biological ground truth data, applying biological and measurement technology specific error models, and finally simulating the microarray slide manufacturing and hybridization. After all these steps are taken into account, the simulated data has realistic biological and statistical characteristics. The applicability of the proposed model is demonstrated by several examples.

Conclusion: The proposed microarray simulation model is modular and can be used in different kinds of applications. It includes several error models that have been proposed earlier and it can be used with different types of input data. The model can be used to simulate both spotted two-channel and oligonucleotide based single-channel microarrays. All this makes the model a valuable tool for example in validation of data analysis algorithms.
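The modular structure described in the abstract (ground truth, then biological variation, then measurement error) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the log-normal ground truth, all parameter values, and the choice of a two-component additive-plus-multiplicative error model (of the kind proposed by Rocke and Durbin) are assumptions made for illustration only. Slide manufacturing and hybridization are not modelled here.

```python
import math
import random


def simulate_ground_truth(n_genes, rng, mean_log=7.0, sd_log=1.5):
    """Draw hypothetical true expression levels from a log-normal distribution."""
    return [rng.lognormvariate(mean_log, sd_log) for _ in range(n_genes)]


def apply_biological_noise(levels, rng, sd=0.1):
    """Assumed biological variation: multiplicative log-normal noise per gene."""
    return [mu * math.exp(rng.gauss(0.0, sd)) for mu in levels]


def apply_measurement_error(levels, rng, background=50.0, sd_mult=0.15, sd_add=20.0):
    """Assumed two-component error: y = background + mu * exp(eta) + eps."""
    return [
        background + mu * math.exp(rng.gauss(0.0, sd_mult)) + rng.gauss(0.0, sd_add)
        for mu in levels
    ]


def simulate_two_channel(n_genes, frac_changed=0.1, fold=2.0, seed=0):
    """Simulate one two-channel array and keep the ground truth for validation."""
    rng = random.Random(seed)
    reference = simulate_ground_truth(n_genes, rng)

    # Pick the genes that are truly differentially expressed.
    n_changed = int(round(frac_changed * n_genes))
    changed = set(rng.sample(range(n_genes), n_changed))
    treated = [mu * fold if i in changed else mu for i, mu in enumerate(reference)]

    # Each channel passes through the same chain of error modules.
    red = apply_measurement_error(apply_biological_noise(treated, rng), rng)
    green = apply_measurement_error(apply_biological_noise(reference, rng), rng)
    return {
        "reference": reference,
        "treated": treated,
        "changed": changed,
        "red": red,
        "green": green,
    }


if __name__ == "__main__":
    sim = simulate_two_channel(1000, seed=42)
    print(len(sim["changed"]), "genes were simulated as differentially expressed")
```

Because the set of truly changed genes is returned alongside the noisy intensities, the output of an analysis algorithm run on `red` and `green` can be scored against a known answer, which is the validation use the abstract describes. Each stage is a separate function, so a different error model could be swapped in without touching the others.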
