Outlier Filtering for Identification of Gene Regulations in Microarray Time-Series Data

Microarray technology enables scientists to analyze thousands of gene expression profiles simultaneously. Time-series microarray data are gene expression values measured by microarray experiments at a series of time intervals. Scientists can infer gene regulatory relationships in a biological system by judging whether two genes show similar expression profiles in such data. Many methods have recently been applied to microarray time-series data to measure the similarity and degree of correlation between genes. Existing approaches, including the traditional Pearson correlation coefficient, Bayesian networks, clustering analysis, classification methods, and correlation analysis, each have drawbacks: some have high computational complexity, and others are unsuitable for certain microarray data. The traditional Pearson correlation coefficient is a numeric measure that works well for comparing two sets of numeric data. However, it is poorly suited to microarray time-series data because outliers among the gene expression values can distort it. This paper presents a novel method that combines the Pearson correlation coefficient with an outlier filtering procedure and applies it to widely used microarray time-series datasets. Results show that, on the same dataset, the proposed method performs better than the traditional Pearson correlation coefficient: it finds more known regulatory gene pairs while keeping computational time reasonable.
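To illustrate why a single outlier matters and what "Pearson correlation plus outlier filtering" can look like, here is a minimal sketch. The abstract does not state how outliers are detected, so this sketch swaps in an assumed rule: the median/MAD-based modified z-score, with the commonly used cutoff of 3.5. Time points flagged as outliers in either gene's profile are dropped before Pearson's r is computed. The function names `pearson`, `outlier_mask`, and `filtered_pearson` and the parameter `k` are hypothetical and are not the authors' procedure.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Classic Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return math.nan  # r is undefined when a profile is constant
    return sxy / math.sqrt(sxx * syy)


def outlier_mask(values, k=3.5):
    """Assumed rule (not from the paper): flag points whose modified
    z-score, 0.6745 * |v - median| / MAD, exceeds k."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)  # MAD of 0: skip filtering for this profile
    return [0.6745 * abs(v - med) / mad > k for v in values]


def filtered_pearson(x, y, k=3.5):
    """Hypothetical helper: drop time points flagged in either profile,
    then compute Pearson's r on the remaining points."""
    ox, oy = outlier_mask(x, k), outlier_mask(y, k)
    kept = [(a, b) for a, b, fa, fb in zip(x, y, ox, oy) if not (fa or fb)]
    if len(kept) < 3:
        raise ValueError("too few time points left after filtering")
    xs, ys = zip(*kept)
    return pearson(xs, ys)


if __name__ == "__main__":
    # Toy profiles: gene B tracks gene A exactly except for one spike.
    gene_a = [1, 2, 3, 4, 5, 6, 7, 8]
    gene_b = [2, 4, 6, 8, 10, 12, 14, 100]
    print(pearson(gene_a, gene_b))           # about 0.67, dragged down by the spike
    print(filtered_pearson(gene_a, gene_b))  # 1.0 once the spike is removed
```

On these toy profiles, the single spiked value pulls the plain coefficient down to roughly 0.67, while the filtered version recovers the perfect linear relationship. One design caveat: dropping time points treats the series as unordered samples, so any lag or ordering information at the removed points is lost. A real pipeline would need to choose the detection rule and the threshold `k` for its own data.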
