Alignment of time course gene expression data and the classification of developmentally driven genes with hidden Markov models

Background: We consider data from a time course microarray experiment that was conducted on grapevines over the development cycle of the grape berries at two different vineyards in South Australia. Although the underlying biological process of berry development is the same at both vineyards, there are differences in the timing of the development due to local conditions. We aim to align the data from the two vineyards to enable an integrated analysis of the gene expression and use the alignment of the expression profiles to classify likely developmental function.

Results: We present a novel alignment method based on hidden Markov models (HMMs) and use the method to align the motivating grapevine data. We show that our alignment method is robust against subsets of profiles that are not suitable for alignment, investigate alignment diagnostics under the model and demonstrate the classification of developmentally driven genes.

Conclusions: The classification of developmentally driven genes both validates that the alignment we obtain is meaningful and also gives new evidence that can be used to identify the role of genes with unknown function. Using our alignment methodology, we find at least 1279 grapevine probe sets with no current annotated function that are likely to be controlled in a developmental manner.
