Semi-supervised gene shaving method for predicting low variation biological pathways from genome-wide data

Background: The gene shaving algorithm and many other clustering algorithms identify gene clusters that show high variation across samples. However, gene expression in many signaling pathways shows only modest, concordant changes that these methods fail to identify. The increasingly available prior knowledge of signaling pathways provides a new opportunity to solve this problem.

Results: We propose an innovative semi-supervised gene clustering algorithm that extends and generalizes the original gene shaving algorithm so that prior knowledge of signaling pathways can be incorporated. Unlike other methods, our method identifies gene clusters showing concerted, modest expression variation as well as strong expression correlation. Using available pathway gene sets as prior knowledge, whether complete or incomplete, our algorithm can form tightly regulated gene clusters showing modest variation across samples. We demonstrate the advantages of our algorithm over the original gene shaving algorithm using two microarray data sets. The stability of the gene clusters was assessed using a jackknife approach.

Conclusion: Our algorithm is one of the first clustering algorithms designed specifically to identify signaling pathways with low, concordant gene expression variation. Its discriminating power is achieved by constructing a principal component enriched by signaling pathways.
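The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that prior knowledge enters as a per-gene weight that pulls the leading principal component toward the pathway genes, and that genes are shaved by their absolute correlation with that component rather than by variance. The function names and the parameters `prior_weight`, `shave_frac`, and `min_size` are hypothetical.

```python
import numpy as np

def leading_pc(X, weights):
    """Leading sample-space principal component of row-centered, gene-weighted X."""
    Xc = X - X.mean(axis=1, keepdims=True)   # center each gene across samples
    Xw = Xc * weights[:, None]               # up-weight prior pathway genes (assumption)
    _, _, vt = np.linalg.svd(Xw, full_matrices=False)
    return vt[0]                             # unit-length pattern across samples

def semi_supervised_shave(X, prior_idx, prior_weight=5.0, shave_frac=0.1, min_size=10):
    """Iteratively shave genes least correlated with a prior-enriched component.

    X          : genes x samples expression matrix
    prior_idx  : indices of genes in the known (possibly incomplete) pathway set
    Returns the indices of the genes that remain in the cluster.
    """
    n_genes = X.shape[0]
    weights = np.ones(n_genes)
    weights[prior_idx] = prior_weight
    Xc = X - X.mean(axis=1, keepdims=True)
    active = np.arange(n_genes)
    while active.size > min_size:
        pc = leading_pc(X[active], weights[active])
        rows = Xc[active]
        # Absolute correlation with the component; using correlation instead of
        # the raw inner product avoids favoring high-variance genes.
        norms = np.linalg.norm(rows, axis=1) * np.linalg.norm(pc) + 1e-12
        scores = np.abs(rows @ pc) / norms
        n_drop = max(1, int(shave_frac * active.size))
        n_drop = min(n_drop, active.size - min_size)
        keep = np.argsort(scores)[n_drop:]   # drop the lowest-scoring genes
        active = active[np.sort(keep)]
    return active
```

Scoring by correlation is a design choice made here so that low-amplitude but tightly co-regulated genes are not discarded. A jackknife-style stability check, as mentioned in the abstract, could be sketched by rerunning `semi_supervised_shave` with one sample column left out at a time and comparing the resulting clusters.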
