FlyIT: Drosophila Embryogenesis Image Annotation based on Image Tiling and Convolutional Neural Networks

With the rise of image-based transcriptomics, spatial gene expression data have become increasingly important for understanding gene regulation from the tissue level down to the cell level. In particular, gene expression images of Drosophila embryos provide a new data source for studying Drosophila embryogenesis. Because manual annotation is labor-intensive and requires expert knowledge, automatic annotation tools are needed. Although many image annotation methods have been proposed in computer vision, they may not work well for gene expression images because the two annotation tasks differ substantially. Besides the obvious differences in image content, annotation is performed at the gene level rather than the image level, and the expression patterns of a gene are recorded in multiple images. Moreover, annotation terms often correspond to local expression patterns within images, yet they are assigned collectively to groups of images, so the relation between a term and any single image is unknown. To learn the spatial expression patterns of genes comprehensively, we propose a new method called FlyIT (image annotation based on Image Tiling and convolutional neural networks for fruit Fly). We implement two versions of FlyIT, which learn at the image level and the gene level, respectively. The gene-level version uses an image tiling strategy to obtain a combined image feature representation for each gene. FlyIT uses a pre-trained ResNet model to extract feature representations and a new loss function to handle class imbalance. Because annotating Drosophila images is a multi-label classification problem, the new loss function accounts for how difficult each label of a sample is to recognize and adjusts the sample weights accordingly.
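The two ideas described above, image tiling and difficulty-aware weighting, can be illustrated with a minimal sketch. It is not the FlyIT implementation: the grid layout, the zero padding for missing images, and the equal-size grayscale inputs are assumptions, and the loss shown is a focal-style reweighting of binary cross-entropy used as a stand-in for the paper's own loss, whose exact form is not given here. The names `tile_images` and `focal_style_multilabel_loss` are hypothetical.

```python
import numpy as np


def tile_images(images, rows, cols):
    """Place up to rows*cols equally sized 2-D images into one grid image.

    Assumption: unused grid slots are left as zeros.
    """
    if not images:
        raise ValueError("need at least one image")
    h, w = images[0].shape
    canvas = np.zeros((rows * h, cols * w), dtype=images[0].dtype)
    for idx, img in enumerate(images[: rows * cols]):
        r, c = divmod(idx, cols)  # row-major position in the grid
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = img
    return canvas


def focal_style_multilabel_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Binary cross-entropy per label, down-weighting easy labels.

    Assumption: a focal-style factor (1 - p_t) ** gamma stands in for the
    difficulty-dependent weighting; gamma=0 gives plain cross-entropy.
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    targets = np.asarray(targets)
    # p_t is the predicted probability of the correct outcome for each label.
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    weights = (1.0 - p_t) ** gamma
    return float(np.mean(-weights * np.log(p_t)))


# Usage: three 2x3 images of one gene tiled into a 2x2 grid.
gene_images = [np.full((2, 3), i + 1) for i in range(3)]
tiled = tile_images(gene_images, rows=2, cols=2)
print(tiled.shape)
```

In this sketch a label predicted with high confidence contributes little to the loss, so hard labels of the same sample dominate the gradient, which is one common way to address class imbalance.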
Experimental results on the FlyExpress database show that both the image tiling strategy and the deep architecture substantially improve annotation performance. FlyIT outperforms existing annotators by a large margin (over 9% in AUC and 12% in macro F1 for predicting the top 10 terms). It also shows advantages over other deep learning models, including both single-instance and multi-instance learning frameworks.
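For readers unfamiliar with the macro F1 metric reported above, the following is a minimal sketch of how it is usually computed for multi-label predictions: F1 is calculated per label and then averaged with equal weight. The function name `macro_f1` and the convention of scoring a label with no positives and no predictions as 0 are assumptions, not details taken from the paper.

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores.

    y_true, y_pred: binary arrays of shape (n_samples, n_labels).
    Assumption: a label with no true and no predicted positives scores 0.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred, axis=0)
    fp = np.sum(~y_true & y_pred, axis=0)
    fn = np.sum(y_true & ~y_pred, axis=0)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); guard against division by zero.
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())


# Usage: three samples, two annotation terms.
y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [0, 1]]
print(macro_f1(y_true, y_pred))
```

Because every label counts equally, macro F1 is sensitive to performance on rare annotation terms, which is why it is a natural companion to a loss designed for class imbalance.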
