Friendly neighbors method for unsupervised determination of gene significance in time-course microarray data

When a significant portion of a gene's temporal induction-repression expression pattern recurs among many other genes in a time-course data set, that abundance indicates the pattern is non-random. The significance of the portions that match between two gene profiles can be derived using binomial analysis or a variant of it. Considering the induction-repression pattern alone is both meaningful and significant, since related genes that are induced or repressed in a given period may not show exactly the same shape of induction or repression. Further, microarray measurements are noisy, which can make the expression patterns of related genes appear less similar. Based on this observation we developed an algorithm called friendly neighbors (FNs). The algorithm scores the significance of a gene as the number of genes in the same experiment that share its induction-repression pattern beyond a certain threshold. The concept of friendly neighbors differs from that of nearest neighbors: a friendly neighbor is any neighbor that satisfies a given similarity condition, whereas a nearest neighbor is the most similar of all neighbors. Consequently, a friendly neighbor is not necessarily a nearest neighbor, and vice versa. The FNs approach has been applied to discover putative estrogen target genes and to detect cell cycle regulated genes in S. cerevisiae. It performed better than the paired t-test and simple expression-level-based filtering methods on estrogen target gene discovery. It also performed well on cell cycle regulated gene discovery in the absence of task-specific knowledge, outperforming the commonly used Fourier transform and fold-change methods. Apart from detecting cell cycle regulated genes, it also detected other prominent patterns that could otherwise be detected only by more complicated clustering and data analysis methods. Availability: http://giscompute.gis.a-star.edu.sg/FNs.
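
The scoring idea described above can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract does not give the details, so several choices here are assumptions: the induction-repression pattern is taken as the sign of the change between consecutive time points, matches are counted position by position over the whole profile rather than over contiguous portions, a match is assumed to occur by chance with probability `p_match = 0.5`, and the similarity threshold is a binomial tail probability cutoff `alpha`. All function and parameter names are hypothetical.

```python
from math import comb


def sign_pattern(profile):
    """Encode a profile as +1 (induction), -1 (repression) or 0 (no change)
    for each pair of consecutive time points. (Assumed encoding.)"""
    return [(b > a) - (b < a) for a, b in zip(profile, profile[1:])]


def binomial_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def friendly_neighbor_scores(profiles, alpha=0.05, p_match=0.5):
    """Score each gene by its number of friendly neighbors: other genes whose
    pattern matches in enough positions that the binomial tail probability
    falls at or below alpha. (alpha and p_match are assumed parameters.)"""
    patterns = [sign_pattern(p) for p in profiles]
    scores = []
    for i, pattern_i in enumerate(patterns):
        count = 0
        for j, pattern_j in enumerate(patterns):
            if i == j:
                continue
            matches = sum(1 for a, b in zip(pattern_i, pattern_j) if a == b)
            if binomial_tail(matches, len(pattern_i), p_match) <= alpha:
                count += 1
        scores.append(count)
    return scores


# Toy data: genes A and B rise and fall together with different shapes,
# gene C moves in the opposite direction at every step.
toy_profiles = [
    [0, 1, 2, 1, 0, 1, 2],  # gene A
    [0, 2, 5, 3, 1, 4, 9],  # gene B
    [5, 4, 3, 4, 5, 4, 3],  # gene C
]
toy_scores = friendly_neighbor_scores(toy_profiles)
```

In this toy example, genes A and B share all six directional steps even though their magnitudes differ, so each counts the other as a friendly neighbor, while gene C has none. Note that the score counts every neighbor passing the threshold rather than only the single most similar one, which is the distinction from nearest neighbors drawn above.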
