A Sparse Latent Regression Approach for Integrative Analysis of Glycomic and Glycotranscriptomic Data

Glycomics and glycotranscriptomics have emerged as two key high-throughput approaches to interrogating the glycome within specific cells, tissues, or organisms under specific conditions. Because glycotranscriptomic analysis uses the same experimental protocol as the whole-transcriptome sequencing (RNA-seq) commonly used in genomic research, glycotranscriptomic information can be conveniently extracted in silico for the many biological samples from which RNA-seq data have been collected and made publicly available through large-scale projects such as The Cancer Genome Atlas (TCGA) project. Glycomic data collection, however, is constrained by specialized analytical tools that are less accessible to biological researchers. In this paper, we present a Bayesian sparse latent regression (BSLR) model for predicting quantitative glycan abundances from glycotranscriptomic data. The model is trained on matched glycomic and glycotranscriptomic data collected from the same set of samples, and is then used to study the common properties of the training samples and to predict these properties (e.g., the glycan abundances) in similar samples for which only glycotranscriptomic data are available. The BSLR model assumes that the glycomic and glycotranscriptomic abundances are both modulated by a small number of independent latent variables, and thus can be constructed from a relatively small number of training samples. On simulated data, our approach achieves satisfactory performance using only 10-20 training samples. We also tested our model on five cancer cell lines and showed that the BSLR model can accurately predict glycan abundances from the transcription levels of glycan synthetic genes. Furthermore, the predicted glycan abundances can distinguish the metastatic cell line specifically targeting brain from the remaining breast cancer cell lines as well as from a brain cancer cell line, with only slightly lower power than the glycan abundances observed in glycomic experiments, indicating that the BSLR prediction, made from glycotranscriptomic data alone, retains the variation in glycan abundances across different groups of samples.
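The shared-latent-variable assumption described above can be illustrated with a small simulation. The sketch below is not the BSLR model: it replaces Bayesian inference and sparsity priors with plain principal-component regression, and every name and dimension in it (`make_loadings`, `sample`, `fit_latent_regression`, 40 transcripts, 8 glycans, 3 latent factors, 15 training samples) is a hypothetical choice made for illustration. It only shows why a small number of latent factors makes prediction feasible with few training samples.

```python
import numpy as np


def make_loadings(p, q, k, rng, density=0.3):
    """Draw hypothetical loadings: sparse for transcripts (p x k), dense for glycans (q x k)."""
    A = rng.normal(size=(p, k)) * (rng.random((p, k)) < density)
    B = rng.normal(size=(q, k))
    return A, B


def sample(n, A, B, noise, rng):
    """Simulate n samples whose transcripts X and glycans Y share the same latent factors F."""
    F = rng.normal(size=(n, A.shape[1]))
    X = F @ A.T + noise * rng.normal(size=(n, A.shape[0]))
    Y = F @ B.T + noise * rng.normal(size=(n, B.shape[0]))
    return X, Y


def fit_latent_regression(X, Y, k):
    """Non-Bayesian stand-in: estimate k latent scores from X by SVD, then regress Y on them."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                      # p x k projection onto the leading components
    scores = Xc @ V                   # n x k estimated latent scores
    W, *_ = np.linalg.lstsq(scores, Yc, rcond=None)  # k x q regression weights
    return x_mean, y_mean, V, W


def predict(model, X_new):
    """Predict glycan abundances for samples that have transcript data only."""
    x_mean, y_mean, V, W = model
    return (X_new - x_mean) @ V @ W + y_mean


rng = np.random.default_rng(0)
A, B = make_loadings(p=40, q=8, k=3, rng=rng)
X_train, Y_train = sample(15, A, B, noise=0.1, rng=rng)   # matched training samples
X_test, Y_test = sample(50, A, B, noise=0.1, rng=rng)     # glycans held out for checking

model = fit_latent_regression(X_train, Y_train, k=3)
Y_pred = predict(model, X_test)
corr = np.corrcoef(Y_pred.ravel(), Y_test.ravel())[0, 1]
print(f"correlation between predicted and simulated glycan abundances: {corr:.3f}")
```

Because the regression is fit on three estimated latent scores rather than on all 40 transcript features, the number of free parameters stays small relative to the 15 training samples. A Bayesian sparse formulation such as the one the paper proposes would additionally place priors on the loadings and report posterior uncertainty, which this sketch does not attempt.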
