Effective Enrichment of Gene Expression Data Sets

The growing need for gene-expression data analysis motivates research on sample generation, because gene-expression data are scarce: a dataset commonly covers thousands of genes but only tens, or rarely hundreds, of samples. In this paper, we formulate the sample generation task in three steps: first, build alternative Gene Regulatory Network (GRN) models; second, sample data from each of them; and third, filter the generated samples using metrics that measure compatibility, diversity and coverage with respect to the original dataset. We construct two alternative GRN models, one based on Probabilistic Boolean Networks and one based on Ordinary Differential Equations, and develop a multi-objective filtering mechanism based on the three metrics to assess the quality of the newly generated data. We present a number of experiments that demonstrate the effectiveness and applicability of the proposed multi-model framework.
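The generate-then-filter pipeline described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. Several pieces are hypothetical choices made only for this example: the function names, the PBN update rule (independent per-gene predictor selection with per-gene random perturbation), the sigmoidal ODE form dx/dt = sigmoid(W x + b) - decay * x integrated with forward Euler, the Euclidean definitions of compatibility, diversity and coverage, the `radius` parameter, and the use of Pareto-front selection as the multi-objective filter.

```python
import numpy as np

# Hypothetical sketch: model forms and metric definitions are assumptions,
# not the method from the paper.

# --- Model 1: toy Probabilistic Boolean Network (PBN) ---------------------
def pbn_sample(funcs, probs, x0, steps, p_flip, rng):
    """Return `steps` successive binary states of a toy PBN.

    funcs[g] lists candidate Boolean predictors for gene g (each maps the
    current state vector to 0/1); probs[g] holds their selection probabilities.
    """
    x = np.asarray(x0, dtype=int).copy()
    n = len(x)
    states = []
    for _ in range(steps):
        flips = rng.random(n) < p_flip
        if flips.any():
            # Random perturbation: flip the chosen genes instead of applying rules.
            x = np.where(flips, 1 - x, x)
        else:
            # Pick one predictor per gene independently, then update synchronously.
            x = np.array([funcs[g][rng.choice(len(funcs[g]), p=probs[g])](x)
                          for g in range(n)], dtype=int)
        states.append(x.copy())
    return np.array(states)

# --- Model 2: toy sigmoidal ODE model, integrated with forward Euler -------
def ode_sample(W, b, decay, x0, dt, steps):
    """Integrate dx/dt = sigmoid(W @ x + b) - decay * x; return the trajectory."""
    x = np.asarray(x0, dtype=float).copy()
    traj = []
    for _ in range(steps):
        x = x + dt * (1.0 / (1.0 + np.exp(-(W @ x + b))) - decay * x)
        traj.append(x.copy())
    return np.array(traj)

# --- Multi-objective filtering ---------------------------------------------
def scores(cands, orig, radius):
    """Per-candidate objectives, all 'higher is better' (assumed definitions)."""
    d_co = np.linalg.norm(cands[:, None, :] - orig[None, :, :], axis=2)
    compat = -d_co.min(axis=1)                         # close to some original sample
    d_cc = np.linalg.norm(cands[:, None, :] - cands[None, :, :], axis=2)
    np.fill_diagonal(d_cc, np.inf)
    div = d_cc.min(axis=1)                             # far from other candidates
    cov = (d_co <= radius).sum(axis=1).astype(float)   # originals within radius
    return np.column_stack([compat, div, cov])

def pareto_front(S):
    """Indices of rows of S that no other row dominates."""
    keep = []
    for i in range(len(S)):
        dominated = np.any(np.all(S >= S[i], axis=1) & np.any(S > S[i], axis=1))
        if not dominated:
            keep.append(i)
    return np.array(keep, dtype=int)

def filter_candidates(cands, orig, radius):
    """Keep only the Pareto-optimal generated samples."""
    return cands[pareto_front(scores(cands, orig, radius))]
```

Because this toy PBN emits binary states while the ODE model emits continuous levels, the sketch assumes the original data are rescaled or binarized to match whichever model's samples are passed to `filter_candidates`. Pareto selection is used here because it keeps every candidate that is not beaten on all three metrics at once, rather than collapsing them into a single weighted score.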