Hierarchical Ensemble Methods for Protein Function Prediction

Protein function prediction is a complex multiclass, multilabel classification problem. It is complicated by several issues: the incompleteness of the available annotations, the need to integrate multiple sources of high-dimensional biomolecular data, the imbalance of many functional classes, and the difficulty of unambiguously identifying negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies have motivated the development of hierarchy-aware prediction methods, which have shown significantly better performance than hierarchy-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. In this general approach, a separate learning machine is trained for each functional term, and the resulting predictions are then combined into a “consensus” ensemble decision that takes into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Finally, open problems in this area of computational biology are considered, and novel perspectives for future research are outlined.
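
As a rough illustration of the general approach described above, the following Python sketch takes per-term scores from independent "flat" predictors and combines them into a hierarchy-consistent decision. It is not one of the reviewed methods: the function name `enforce_hierarchy`, the toy term names, and the simple rule used here (a term's score may not exceed the score of any of its ancestors) are assumptions chosen only to show the idea of a consensus step that respects the class hierarchy.

```python
from typing import Dict, List


def enforce_hierarchy(
    scores: Dict[str, float], parents: Dict[str, List[str]]
) -> Dict[str, float]:
    """Cap each term's score by the consistent scores of its parents.

    `scores` holds the output of one flat predictor per term;
    `parents` maps each term to its direct parents (tree or DAG).
    This min-based rule is an illustrative assumption, not a
    specific algorithm from the literature.
    """
    consistent: Dict[str, float] = {}

    def resolve(term: str) -> float:
        # Memoized recursion: a term is resolved after all its ancestors.
        if term not in consistent:
            capped = scores[term]
            for parent in parents.get(term, []):
                capped = min(capped, resolve(parent))
            consistent[term] = capped
        return consistent[term]

    for term in scores:
        resolve(term)
    return consistent


# Hypothetical three-term hierarchy and flat scores for one protein.
parents = {
    "metabolism": [],
    "amino_acid_metabolism": ["metabolism"],
    "amino_acid_biosynthesis": ["amino_acid_metabolism"],
}
flat_scores = {
    "metabolism": 0.6,
    "amino_acid_metabolism": 0.8,
    "amino_acid_biosynthesis": 0.3,
}
consensus = enforce_hierarchy(flat_scores, parents)
```

In this toy case the flat score of the middle term (0.8) is higher than that of its parent (0.6), which violates the hierarchy; the consensus step lowers it to the parent's score and leaves the other two terms unchanged.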
