A new DP algorithm for comparing gene expression data using geometric similarity

Microarray gene expression data comes as a time series, where the expression level of a gene is recorded at specific time points. Comparing the time series produced by two genes can give us information about the regulatory or inhibitory relationship between them. We present a Dynamic Programming (DP) method that compares gene expression data using geometric similarity, with the aim of detecting similarities and relationships between genes based on their expression time series. By representing the time series as polygons and comparing those polygons, we can find relationships that are not visible when the two time series are compared point by point. We applied our algorithm to a dataset of 343 regulatory pairs from the alpha dataset and compared them to randomly generated pairs. Using an SVM classifier, we find the optimal similarity score that separates the regulatory pairs from the random pairs. Our results show that the method detects similar pairs better than simple Pearson correlation and outperforms many of the existing methods. This work is ongoing, and the method can be applied to measuring the similarity of any data that can be converted to 2D polygons. In the future, we plan to introduce this method as a new classifier.
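The abstract does not spell out the DP recurrence, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes each time series is turned into a polyline of (time index, expression) points and uses the discrete Fréchet distance, a standard DP-based geometric measure, as a stand-in for the paper's similarity score. The function names `pearson`, `to_polyline`, and `discrete_frechet`, and the toy series `gene_a` and `gene_b`, are hypothetical.

```python
import math


def pearson(x, y):
    """Point-by-point Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def to_polyline(series):
    """Assumed geometric representation: one 2D point per time step."""
    return [(float(t), float(v)) for t, v in enumerate(series)]


def discrete_frechet(p, q):
    """Discrete Frechet distance between two polylines via a DP table.

    d[i][j] is the smallest achievable 'longest link' when the prefixes
    p[:i+1] and q[:j+1] are traversed in order, possibly at different speeds.
    """
    n, m = len(p), len(q)
    d = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                d[i][j] = cost
            elif i == 0:
                d[i][j] = max(d[0][j - 1], cost)
            elif j == 0:
                d[i][j] = max(d[i - 1][0], cost)
            else:
                best_prev = min(d[i - 1][j], d[i - 1][j - 1], d[i][j - 1])
                d[i][j] = max(best_prev, cost)
    return d[n - 1][m - 1]


# Toy data (not from the paper): gene_b is gene_a shifted by one time step.
gene_a = [0, 1, 0, -1, 0, 1, 0, -1]
gene_b = [1, 0, -1, 0, 1, 0, -1, 0]

r = pearson(gene_a, gene_b)
dist = discrete_frechet(to_polyline(gene_a), to_polyline(gene_b))
print(f"Pearson: {r:.3f}, discrete Frechet distance: {dist:.3f}")
```

On this toy pair the point-by-point correlation is zero even though one profile is just a delayed copy of the other, while the DP-based geometric distance stays small because the alignment can absorb the shift. This is the kind of relationship the abstract says a point-by-point comparison misses.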
