Applying gene ontology to microarray gene expression data analysis

Selecting informative genes is one of the most important tasks in the analysis of microarray gene expression data, where thousands of genes are measured at once. In particular, it is essential to mine the genes that have regulatory relations from among these thousands of genes. To meet this need, a number of methods have been proposed from various points of view. However, most existing methods focus solely on the gene expression values themselves and use no external information about the genes. Gene Ontology (GO) provides biological information about the genes or proteins involved. It organizes this information in a hierarchical structure, which can serve as an aid for data analysis. In this paper, we first briefly describe the GO structure and review the existing literature that takes GO into account. Subsequently, we propose a novel method, based on the dynamic time warping (DTW) algorithm and GO, to identify regulatory gene pairs in a real microarray dataset. Finally, we conclude with a discussion of how GO can be used to facilitate the analysis of microarray gene expression data.
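For readers unfamiliar with DTW, the following is a minimal sketch of the classic dynamic-programming formulation of the DTW distance between two time series. It is not the method proposed in this paper: the function name `dtw_distance`, the absolute-difference local cost, and the two expression profiles are assumptions made purely for illustration, and no GO information is used.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (illustrative sketch)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: absolute difference (an assumption; other costs are possible).
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical expression profiles: gene_b follows gene_a with a one-step delay.
gene_a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
gene_b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]

pointwise = sum(abs(x - y) for x, y in zip(gene_a, gene_b))
warped = dtw_distance(gene_a, gene_b)
```

In this toy case the point-by-point difference is nonzero because the two profiles are shifted in time, whereas the warped alignment absorbs the delay. This tolerance to time shifts is the property that makes DTW a natural candidate for comparing the expression profiles of a regulator and its delayed target.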
