Discovering Biological Progression Underlying Microarray Samples

In biological systems that undergo processes such as differentiation, a clear concept of progression exists. We present a novel computational approach, called Sample Progression Discovery (SPD), to discover patterns of biological progression underlying microarray gene expression data. SPD assumes that individual samples of a microarray dataset are related by an unknown biological process (e.g., differentiation, development, cell cycle, disease progression), and that each sample represents one unknown point along the progression of that process. SPD aims to organize the samples in a manner that reveals the underlying progression and to simultaneously identify subsets of genes that are responsible for that progression. We demonstrate the performance of SPD on a variety of microarray datasets that were generated by sampling a biological process at different points along its progression, without providing SPD any information about the underlying process. When applied to a cell cycle time series microarray dataset, SPD was not provided any prior knowledge of the samples' time order or of which genes are cell-cycle regulated, yet SPD recovered the correct time order and identified many genes that have been associated with the cell cycle. When applied to B-cell differentiation data, SPD recovered the correct order of stages of normal B-cell differentiation and the linkage between preB-ALL tumor cells and their cell of origin, preB. When applied to mouse embryonic stem cell differentiation data, SPD uncovered a landscape of ESC differentiation into various lineages and genes that represent both generic and lineage-specific processes. When applied to a prostate cancer microarray dataset, SPD identified gene modules that reflect a progression consistent with disease stages.
SPD may be best viewed as a novel tool for synthesizing biological hypotheses because it provides a likely biological progression underlying a microarray dataset and, perhaps more importantly, the candidate genes that regulate that progression.
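
The idea of recovering a progression from unordered samples can be illustrated with a toy sketch. The abstract does not specify SPD's algorithm, so everything below is an assumption made for illustration only: the samples are synthetic, the two "genes" are hypothetical, and the ordering step (a minimum spanning tree over Euclidean distances between samples, followed by reading off the longest path in the tree) is one simple way to order samples, not necessarily the method SPD uses. It also omits the gene-subset selection that SPD performs.

```python
import math
import random


def pairwise_dist(samples):
    """Euclidean distance between every pair of expression profiles."""
    n = len(samples)
    return [[math.dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns an adjacency list."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge connecting each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    adj = {i: [] for i in range(n)}
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return adj


def farthest_path(adj, dist, start):
    """Path (list of nodes) from start to the node farthest away along tree edges."""
    best_len, best_path = 0.0, [start]
    stack = [(start, -1, 0.0, [start])]
    while stack:
        node, prev, length, path = stack.pop()
        if length > best_len:
            best_len, best_path = length, path
        for nxt in adj[node]:
            if nxt != prev:
                stack.append((nxt, node, length + dist[node][nxt], path + [nxt]))
    return best_path


def progression_order(samples):
    """Order samples along the longest path of their minimum spanning tree."""
    dist = pairwise_dist(samples)
    adj = minimum_spanning_tree(dist)
    one_end = farthest_path(adj, dist, 0)[-1]
    return farthest_path(adj, dist, one_end)


# Synthetic data (assumed): 8 samples taken at hidden time points 0..7,
# each measured on two hypothetical genes that change smoothly with time.
true_times = list(range(8))
random.Random(0).shuffle(true_times)          # hide the order from the method
samples = [(float(t), 0.25 * t * t) for t in true_times]

order = progression_order(samples)            # indices into the shuffled list
recovered = [true_times[i] for i in order]    # hidden times along the recovered path
print(recovered)
```

On this toy data the recovered path visits the samples in their hidden time order, either forward or reversed, because a distance-based ordering cannot determine the direction of the progression by itself. Real microarray data are noisy and may branch into several lineages, in which case the tree would not reduce to a single path.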
