Automatically identifying and annotating mouse embryo gene expression patterns

Motivation: Deciphering the regulatory and developmental mechanisms of multicellular organisms requires detailed knowledge of gene interactions and gene expression. The availability of large datasets with both spatial and ontological annotation of the spatio-temporal patterns of gene expression in the mouse embryo provides a powerful resource for discovering the biological function of embryo organization. Ontological annotation of gene expression consists of labelling images with terms from the anatomy ontology for mouse development. If a gene is expressed in the region of an image corresponding to an anatomical component, the image is tagged with the term for that component. Annotation is currently done manually by domain experts, which is both time-consuming and costly. In addition, the level of detail varies, and errors inevitably arise from the tedious nature of the task. In this article, we present a new method to automatically identify and annotate gene expression patterns in the mouse embryo with anatomical terms.

Results: The method takes images from in situ hybridization studies and the ontology for the developing mouse embryo, and combines machine learning and image processing techniques to produce classifiers that automatically identify and annotate gene expression patterns in these images. We evaluate our method on image data from the EURExpress study, using it to automatically classify nine anatomical terms: humerus, handplate, fibula, tibia, femur, ribs, petrous part, scapula and head mesenchyme. With few exceptions, the accuracy of our method lies between 70% and 80%. We show that other known methods achieve lower classification performance than ours. We investigated the images misclassified by our method and found several cases where the original annotation was incorrect. This shows that our method is robust against this kind of noise.
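The abstract does not state how accuracy is computed. As a minimal sketch, assuming each of the nine terms is scored as an independent presence/absence call per image, and that per-term accuracy is the fraction of images where that call agrees with the manual annotation, the evaluation could look like the code below. The names `TERMS`, `to_binary` and `per_term_accuracy` and the toy annotations are hypothetical illustrations, not the authors' classifier or evaluation code.

```python
# Hypothetical sketch: the nine EURExpress terms are treated as independent
# binary labels. Per-term accuracy here is an assumption: the fraction of
# images whose presence/absence call for a term matches the manual annotation.
TERMS = ["humerus", "handplate", "fibula", "tibia", "femur",
         "ribs", "petrous part", "scapula", "head mesenchyme"]


def to_binary(annotations, terms=TERMS):
    """Map each image's set of anatomical terms to a 0/1 vector over `terms`."""
    return [[1 if t in tags else 0 for t in terms] for tags in annotations]


def per_term_accuracy(true_sets, pred_sets, terms=TERMS):
    """Return {term: accuracy}, comparing predicted and manual tags per term."""
    y_true = to_binary(true_sets, terms)
    y_pred = to_binary(pred_sets, terms)
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need the same, non-zero number of true and predicted images")
    result = {}
    for j, term in enumerate(terms):
        correct = sum(1 for i in range(n) if y_true[i][j] == y_pred[i][j])
        result[term] = correct / n
    return result


# Toy (made-up) manual and predicted annotations for four images.
manual = [{"humerus", "scapula"}, {"femur"}, set(), {"ribs", "head mesenchyme"}]
predicted = [{"humerus"}, {"femur", "tibia"}, set(), {"ribs", "head mesenchyme"}]

acc = per_term_accuracy(manual, predicted)
print(acc["humerus"])  # 1.0
print(acc["scapula"])  # 0.75 (missed on image 0)
print(acc["tibia"])    # 0.75 (false positive on image 1)
```

Scoring each term separately mirrors how the abstract reports results per anatomical term; a single exact-match score over all nine terms would be a stricter alternative.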
Availability: The annotation results and the experimental dataset used in this article are freely available at http://www2.docm.mmu.ac.uk/STAFF/L.Han/geneannotation/.
Contact: l.han@mmu.ac.uk
Supplementary Information: Supplementary data are available at Bioinformatics online.