Curve Profiling Feature: Novel Compact Representation for Drosophila Embryonic Gene Expression Pattern Mining

Curve Profiling Feature (CPF) is an innovative, compact, and discriminative feature for representing and mining the temporal-spatial patterns underlying Drosophila embryonic gene expression in the Berkeley Drosophila Genome Project (BDGP) in situ hybridization (ISH) database. CPF is calibration-free, unaffected by differences in the size or shape of individual embryos, biologically inspired, and able to reduce data dimensionality significantly. Moreover, CPF can identify spatially periodic patterns, which previous methods found nontrivial to handle. Quantitative evaluations, based on controlled-vocabulary annotation prediction and on gene function enrichment against the Gene Ontology knowledge base, showed that CPF achieves performance comparable to the state-of-the-art Bag-of-Words model while requiring much less space and time. Application systems are also proposed to help biologists in several ways: predicting annotations and gene functional enrichment, visualization based on manifold learning, and content-based retrieval of gene expression patterns from synthesized queries.
