Bayesian state-space modeling in gene expression data analysis: An application with biomarker prediction.

Background and Objective: Bayesian state-space models are a recent advancement in stochastic modeling that capture the randomness of a hidden background process by combining prior knowledge with the likelihood of the observed data. This article elucidates the scope of Bayesian state-space modeling for predicting future expression values in longitudinal microarray data.

Methods: The study uses longitudinally collected clinical trial data (GSE30531) from the NCBI Gene Expression Omnibus (GEO) data repository. Differentially expressed genes between groups were selected for model fitting by multiple testing with t-tests. The parameter values of the predictive model and the future expression levels were estimated by drawing samples from the joint posterior distribution using a Markov chain Monte Carlo (MCMC) algorithm based on Gibbs sampling. To showcase the flexibility of the algorithm, estimates and their 95% credible intervals were also obtained under different covariance structure assumptions: variance components, first-order autoregressive, and unstructured.

Results: A total of 72 distinct genes with significantly different expression levels were selected for model fitting. Parameter estimates showed similar trends under the different covariance structure assumptions. A cross-tabulation of the frequency of genes having the minimum credible interval, by covariance structure and study group, gave a significant P value of 0.02.

Conclusions: The present study indicates that Bayesian state-space models can be used effectively to explain and predict complex data such as gene expression data.
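The Gibbs sampling step described in the Methods can be illustrated with a minimal sketch. The abstract does not give the model equations, priors, or software used, so everything here is an assumption made for illustration: a univariate local-level model (observation = hidden level + noise, hidden level follows a random walk), inverse-gamma priors on both variances, and the hypothetical function names `ffbs` and `gibbs_local_level`. It is not the authors' implementation.

```python
import numpy as np

def ffbs(y, s2_obs, s2_state, rng, m0=0.0, c0=1e6):
    """Draw the hidden states by forward filtering, backward sampling.

    Assumed model (illustrative only):
        y[t] = x[t] + e[t],      e[t] ~ N(0, s2_obs)
        x[t] = x[t-1] + w[t],    w[t] ~ N(0, s2_state)
    """
    T = len(y)
    m = np.empty(T)  # filtered means
    c = np.empty(T)  # filtered variances
    m_prev, c_prev = m0, c0
    for t in range(T):
        r = c_prev + s2_state            # one-step-ahead state variance
        k = r / (r + s2_obs)             # Kalman gain
        m[t] = m_prev + k * (y[t] - m_prev)
        c[t] = (1.0 - k) * r
        m_prev, c_prev = m[t], c[t]
    x = np.empty(T)
    x[-1] = rng.normal(m[-1], np.sqrt(c[-1]))
    for t in range(T - 2, -1, -1):
        b = c[t] / (c[t] + s2_state)     # backward smoothing weight
        mean = m[t] + b * (x[t + 1] - m[t])
        var = c[t] * (1.0 - b)
        x[t] = rng.normal(mean, np.sqrt(var))
    return x

def gibbs_local_level(y, n_iter=2000, burn_in=500, a=2.0, b=1.0, seed=0):
    """Gibbs sampler alternating between states and the two variances.

    a and b are the shape and rate of the assumed inverse-gamma priors.
    Returns posterior draws of both variances and of the next observation.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    T = len(y)
    s2_obs, s2_state = 1.0, 1.0          # arbitrary starting values
    keep = {"s2_obs": [], "s2_state": [], "y_next": []}
    for it in range(n_iter):
        # 1. hidden states given the variances
        x = ffbs(y, s2_obs, s2_state, rng)
        # 2. variances given the states (conjugate inverse-gamma updates)
        rate_obs = b + 0.5 * np.sum((y - x) ** 2)
        rate_state = b + 0.5 * np.sum(np.diff(x) ** 2)
        s2_obs = 1.0 / rng.gamma(a + 0.5 * T, 1.0 / rate_obs)
        s2_state = 1.0 / rng.gamma(a + 0.5 * (T - 1), 1.0 / rate_state)
        if it >= burn_in:
            # 3. one-step-ahead posterior predictive draw
            x_next = rng.normal(x[-1], np.sqrt(s2_state))
            keep["s2_obs"].append(s2_obs)
            keep["s2_state"].append(s2_state)
            keep["y_next"].append(rng.normal(x_next, np.sqrt(s2_obs)))
    return {name: np.array(vals) for name, vals in keep.items()}
```

Given a vector `y` of expression values for one gene over time, `np.percentile(gibbs_local_level(y)["y_next"], [2.5, 97.5])` would give a 95% credible interval for the next expression value under these assumptions. Extending the sketch to the variance components, first-order autoregressive, or unstructured covariance structures named in the abstract would require a multivariate observation equation, which is not attempted here.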
