Integrated Analysis of Gene Expression and Copy Number Data using Sparse Representation Based Clustering Model

DNA microarray gene expression profiling and array comparative genomic hybridization (aCGH) are widely used biological measurements. Because these data sets are very large, various clustering techniques have been developed to identify subsets of genes that show specific expression patterns and large variations across samples. Since integrated analysis of genomic data from different sources can further increase the reliability of biological results, methods for integrating and analyzing different types of genomic measurements have emerged. In this work, we jointly examine gene expression and copy number data and iteratively project the data onto different clusters using the sparse representation based clustering (SRC) model. We tested the method on a breast cancer cell line data set and a breast tumor data set, and used simulated data sets to assess its robustness to noise. The experiments show that the proposed method can effectively identify genes with large variations in gene expression and copy number, and can locate genes that are statistically significant in both measurements. The method is applicable to a wide variety of biological problems in which joint analysis of multiple biological measurements is a common challenge.
