Analysis of temporal transcription expression profiles reveals links between protein function and developmental stages of Drosophila melanogaster

Accurate prediction of gene or protein function is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction but struggle to provide useful biological process annotations, because sequence-based features have limited power in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show that integrating features based on transcription expression profiles with sequence-derived features predicts protein function significantly better than sequence-derived features alone. We also observe that this combination of expression-based and sequence-based features further improves prediction accuracy in all three domains of gene function. Based on the optimal feature combinations, we then propose FFPred-fly+, a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.
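As a rough illustration of what integrating expression-based with sequence-derived features can mean in practice, the sketch below standardises two per-protein feature tables and concatenates them into one feature vector per protein. It is not the FFPred-fly+ pipeline: the function names `zscore_columns` and `combine_features`, the z-score scaling step, and the toy values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardise each column to zero mean and unit variance."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        mu = mean(col)
        sigma = pstdev(col)
        # A constant column carries no information; map it to zeros.
        scaled_columns.append(
            [(v - mu) / sigma if sigma > 0 else 0.0 for v in col]
        )
    return [list(row) for row in zip(*scaled_columns)]


def combine_features(sequence_features, expression_profiles):
    """Concatenate per-protein sequence features with expression profiles.

    Both inputs are lists of rows, one row per protein, in the same order.
    """
    if len(sequence_features) != len(expression_profiles):
        raise ValueError("both feature tables must describe the same proteins")
    seq = zscore_columns(sequence_features)
    expr = zscore_columns(expression_profiles)
    return [s + e for s, e in zip(seq, expr)]


# Toy, made-up values for three hypothetical proteins (not real data):
# two sequence-derived features and a three-stage expression profile each.
sequence_features = [[0.1, 120.0], [0.4, 300.0], [0.7, 210.0]]
expression_profiles = [[5.0, 1.0, 0.0], [0.0, 2.0, 8.0], [1.0, 1.0, 1.0]]

combined = combine_features(sequence_features, expression_profiles)
```

Each row of `combined` can then be passed to any standard classifier. Scaling before concatenation is one design choice among several; it keeps features measured on large numeric ranges from dominating distance-based learners.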
