Drosophila gene expression pattern annotation using sparse features and term-term interactions

The Drosophila gene expression pattern images document the spatial and temporal dynamics of gene expression and they are valuable tools for explicating the gene functions, interaction, and networks during Drosophila embryogenesis. To provide text-based pattern searching, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated with ontology terms manually by human curators. We present a systematic approach for automating this task, because the number of images needing text descriptions is now rapidly increasing. We consider both improved feature representation and novel learning formulation to boost the annotation performance. For feature representation, we adapt the bag-of-words scheme commonly used in visual recognition problems so that the image group information in the BDGP study is retained. Moreover, images from multiple views can be integrated naturally in this representation. To reduce the quantization error caused by the bag-of-words representation, we propose an improved feature representation scheme based on the sparse learning technique. In the design of learning formulation, we propose a local regularization framework that can incorporate the correlations among terms explicitly. We further show that the resulting optimization problem admits an analytical solution. Experimental results show that the representation based on sparse learning outperforms the bag-of-words representation significantly. Results also show that incorporation of the term-term correlations improves the annotation performance consistently.

[1]  Jieping Ye,et al.  A bag-of-words approach for Drosophila gene expression pattern annotation , 2009, BMC Bioinformatics.

[2]  Jieping Ye,et al.  Developmental stage annotation of Drosophila gene expression pattern images via an entire solution path for LDA , 2008, TKDD.

[3]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[4]  Guillermo Sapiro,et al.  Supervised Dictionary Learning , 2008, NIPS.

[5]  R. Tibshirani,et al.  PATHWISE COORDINATE OPTIMIZATION , 2007, 0708.1485.

[6]  Eric P. Xing,et al.  A multivariate regression approach to association analysis of a quantitative trait network , 2008, Bioinform..

[7]  Massimiliano Pontil,et al.  Regularized multi--task learning , 2004, KDD.

[8]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  G. Rubin,et al.  Global analysis of patterns of gene expression during Drosophila embryogenesis , 2007, Genome Biology.

[10]  Andrew Zisserman,et al.  SDL: Supervised Dictionary Learning , 2008, NIPS 2008.

[11]  Charless C. Fowlkes,et al.  A Quantitative Spatiotemporal Atlas of Gene Expression in the Drosophila Blastoderm , 2008, Cell.

[12]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[13]  R. Tibshirani,et al.  Sparsity and smoothness via the fused lasso , 2005 .

[14]  Jieping Ye,et al.  Classification of Drosophila embryonic developmental stage range based on gene expression pattern images. , 2006, Computational systems bioinformatics. Computational Systems Bioinformatics Conference.

[15]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  P. Tomançak,et al.  Global Analysis of mRNA Localization Reveals a Prominent Role in Organizing Cellular Architecture and Function , 2007, Cell.

[17]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[19]  Sethuraman Panchanathan,et al.  Identifying spatially similar gene expression patterns in early stage fruit fly embryo images: binary feature versus invariant moment digital representations , 2004, BMC Bioinformatics.

[20]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[21]  M. Ashburner,et al.  Systematic determination of patterns of gene expression during Drosophila embryogenesis , 2002, Genome Biology.

[22]  Jieping Ye,et al.  Automated annotation of Drosophila gene expression patterns using a controlled vocabulary , 2008, Bioinform..

[23]  S. Panchanathan,et al.  BEST: a novel computational approach for comparing gene expression patterns from early stages of Drosophila melanogaster development. , 2002, Genetics.

[24]  References , 1971 .

[25]  Frédéric Jurie,et al.  Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[27]  Andrew Zisserman,et al.  Efficient Visual Search for Objects in Videos , 2008, Proceedings of the IEEE.