Simultaneous analysis of multiple data types in pharmacogenomic studies using weighted sparse canonical correlation analysis.

Variation in drug response results from a combination of factors, including differences in gender, ethnicity, and environment, as well as genetic variation that may lead to differences in mRNA and protein expression. This article presents two integrative analytic approaches that use genome-wide SNP and mRNA expression data collected on the same set of subjects: a step-wise integrative approach and a comprehensive analysis using sparse canonical correlation analysis (SCCA). In addition to applying standard SCCA, we present a novel modification of SCCA that allows different weights for the various pair-wise relationships in the analysis. These integrative approaches are illustrated with both simulated data and data from a pharmacogenomic study of the drug gemcitabine. The two approaches showed little overlap in the genes they detected, which suggests that they may be capturing different biological mechanisms. In addition, the proposed weighted SCCA outperformed its unweighted counterpart in detecting associations between the genomic features and phenotype. Further research is needed to develop and assess new integrative methods for pharmacogenomic studies, as these types of analyses may uncover novel insights into the relationship between genomic variation and drug response.
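
To make the idea of weighting pair-wise relationships concrete, the following is a minimal sketch of one way a weighted sparse CCA across three data blocks (SNPs, expression, phenotype) could be set up. It is not the implementation used in this article, whose algorithmic details are not given in the abstract. The function names (`soft_threshold`, `weighted_scca`), the soft-thresholding update, the relative penalty scale, the weight matrix, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink each entry of v toward zero by lam (lasso-style sparsity)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def weighted_scca(blocks, weights, penalties, n_iter=50, seed=0):
    """Hypothetical weighted sparse CCA over several data blocks.

    blocks    : list of (n_subjects, n_features) arrays on the same subjects
    weights   : symmetric matrix; weights[i, j] scales the block i / block j pair
    penalties : per-block fraction in [0, 1) of the largest gradient entry
                used as the soft-threshold (assumed tuning scheme)
    Returns one unit-norm sparse loading vector per block.
    """
    rng = np.random.default_rng(seed)
    # Standardize every column so cross-products behave like correlations.
    Z = [(X - X.mean(axis=0)) / X.std(axis=0) for X in blocks]
    n = Z[0].shape[0]
    u = [rng.standard_normal(X.shape[1]) for X in Z]
    u = [v / np.linalg.norm(v) for v in u]
    for _ in range(n_iter):
        for i in range(len(Z)):
            # Weighted sum of the other blocks' current canonical scores.
            target = sum(weights[i, j] * (Z[j] @ u[j])
                         for j in range(len(Z)) if j != i)
            grad = Z[i].T @ target / n
            v = soft_threshold(grad, penalties[i] * np.max(np.abs(grad)))
            norm = np.linalg.norm(v)
            if norm > 0:
                u[i] = v / norm
    return u

# Simulated example (assumed design): one latent factor drives the first
# five SNPs, the first five transcripts, and the phenotype.
rng = np.random.default_rng(1)
n = 200
z = rng.standard_normal(n)
snps = rng.standard_normal((n, 50))
expr = rng.standard_normal((n, 40))
snps[:, :5] = z[:, None] + 0.5 * rng.standard_normal((n, 5))
expr[:, :5] = z[:, None] + 0.5 * rng.standard_normal((n, 5))
pheno = (z + 0.5 * rng.standard_normal(n))[:, None]

# Pairs involving the phenotype get twice the weight of the SNP/expression pair.
W = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 2.0],
              [2.0, 2.0, 0.0]])
u_snp, u_expr, u_pheno = weighted_scca([snps, expr, pheno], W, [0.5, 0.5, 0.0])
top_snps = np.argsort(-np.abs(u_snp))[:5]
top_expr = np.argsort(-np.abs(u_expr))[:5]
```

Setting all off-diagonal entries of `W` to 1 gives an unweighted version of the same sketch, which is one way the weighted and unweighted variants could be compared on simulated data.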
