Automated annotation of gene expression image sequences via non-parametric factor analysis and conditional random fields

Motivation: Computational approaches for annotating phenotypes from image data have shown promising results across many applications, and they provide rich information for studying gene function and interactions. Although data are often available both at high spatial resolution and across multiple time points, phenotypes are frequently annotated independently, one time point at a time. For developmental gene expression patterns in particular, it is biologically sensible to account jointly for images across multiple time points, so that spatial and temporal dependencies are captured simultaneously.

Methods: We describe a discriminative undirected graphical model for labeling gene expression time-series image data, with an efficient training and decoding method based on the junction tree algorithm. The model's features are selected by a non-parametric sparse Bayesian factor analysis model. The result is a flexible framework that can handle large-scale data with noisy, incomplete samples; that is, it tolerates data missing from individual time points.

Results: Using the annotation of gene expression patterns across stages of Drosophila embryonic development as an example, we demonstrate that jointly annotating phenotype sequences achieves higher accuracy than previous models that annotate each stage in isolation. Experiments on missing data indicate that our joint learning method successfully annotates genes for which no expression data are available at one or more stages.

Contact: uwe.ohler@duke.edu
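The joint decoding idea can be illustrated with a minimal sketch. It assumes the simplest possible structure, a linear chain over developmental stages with one binary label per stage, which the abstract does not specify; on such a chain, max-product message passing on the junction tree reduces to Viterbi decoding. The function name `viterbi_decode`, the score values, and the choice to give a missing stage a flat (all-zero) unary score are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Optional, Sequence


def viterbi_decode(
    unary: Sequence[Optional[Sequence[float]]],
    pairwise: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence on a chain.

    unary[t][k]    : score of label k at stage t, or None if the stage
                     has no image (hypothetical convention).
    pairwise[i][j] : score of moving from label i to label j between
                     consecutive stages.
    """
    n_labels = len(pairwise)
    # A missing stage contributes no evidence: use a flat unary score,
    # so its label is determined by its neighbours through `pairwise`.
    scores = [list(u) if u is not None else [0.0] * n_labels for u in unary]

    best = list(scores[0])          # best score of any path ending in each label
    backptr: List[List[int]] = []   # backptr[t-1][j] = best previous label

    for t in range(1, len(scores)):
        new_best, ptr = [], []
        for j in range(n_labels):
            cand = [best[i] + pairwise[i][j] for i in range(n_labels)]
            i_star = max(range(n_labels), key=cand.__getitem__)
            new_best.append(cand[i_star] + scores[t][j])
            ptr.append(i_star)
        best = new_best
        backptr.append(ptr)

    # Trace back from the best final label.
    label = max(range(n_labels), key=best.__getitem__)
    path = [label]
    for ptr in reversed(backptr):
        label = ptr[label]
        path.append(label)
    path.reverse()
    return path


# Four stages, label 1 = "term present"; stage 2 has no image.
unary = [[0.0, 2.0], None, [0.0, 1.5], [3.0, 0.0]]
pairwise = [[1.0, 0.0], [0.0, 1.0]]  # reward keeping the same label
print(viterbi_decode(unary, pairwise))
```

In this toy input the second stage has no evidence of its own, yet it still receives a label because the pairwise scores tie it to its neighbours. A per-stage classifier would have nothing to go on there, which is the intuition behind joint annotation under missing data.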
