An Extended Local Hierarchical Classifier for Prediction of Protein and Gene Functions

Gene function prediction and protein function prediction are complex classification problems where the functional classes are structured according to a predefined hierarchy. To solve these problems, we propose an extended local hierarchical Naive Bayes classifier, where a binary classifier is built for each class in the hierarchy. The extension to conventional local approaches is that each classifier considers both the parent and child classes of the current class. We have evaluated the proposed approach on eight protein function and ten gene function hierarchical classification datasets. The proposed approach achieved somewhat better predictive accuracies than a global hierarchical Naive Bayes classifier.
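
As an illustration of the local approach described above, the following minimal sketch trains one binary Naive Bayes classifier per class of a toy hierarchy and then scores a class using its parent and child classes as well. It is not the authors' implementation: the abstract does not state how parent and child information is combined, so the averaging rule in `extended_score` is purely an assumption, as are the toy hierarchy `PARENT`, the binary features, and every function and class name.

```python
import math

# Hypothetical toy hierarchy: child -> parent (None marks a top-level class).
PARENT = {"1": None, "1.1": "1", "1.2": "1", "2": None}


def ancestors_and_self(label):
    """Return the class together with all of its ancestors."""
    out = []
    while label is not None:
        out.append(label)
        label = PARENT[label]
    return out


def children(label):
    """Return the direct child classes of a class."""
    return [c for c, p in PARENT.items() if p == label]


class BinaryNB:
    """Bernoulli Naive Bayes over binary features, with Laplace smoothing."""

    def fit(self, X, y):
        n_features = len(X[0])
        self.log_prior = {}
        self.theta = {}
        for cls in (0, 1):
            rows = [x for x, t in zip(X, y) if t == cls]
            self.log_prior[cls] = math.log((len(rows) + 1) / (len(X) + 2))
            self.theta[cls] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)
            ]
        return self

    def prob_positive(self, x):
        """Posterior probability that x belongs to the positive class."""
        log_joint = {}
        for cls in (0, 1):
            s = self.log_prior[cls]
            for xj, t in zip(x, self.theta[cls]):
                s += math.log(t if xj else 1.0 - t)
            log_joint[cls] = s
        m = max(log_joint.values())
        e0 = math.exp(log_joint[0] - m)
        e1 = math.exp(log_joint[1] - m)
        return e1 / (e0 + e1)


def train_local(X, labels):
    """Build one binary classifier per class. An example counts as positive
    for its most specific class and for every ancestor of that class."""
    models = {}
    for cls in PARENT:
        y = [1 if cls in ancestors_and_self(lab) else 0 for lab in labels]
        models[cls] = BinaryNB().fit(X, y)
    return models


def extended_score(models, x, cls):
    """ASSUMED combination rule (not taken from the paper): average the
    class's own probability with its parent's and its best child's,
    wherever a parent or children exist."""
    parts = [models[cls].prob_positive(x)]
    if PARENT[cls] is not None:
        parts.append(models[PARENT[cls]].prob_positive(x))
    kids = children(cls)
    if kids:
        parts.append(max(models[k].prob_positive(x) for k in kids))
    return sum(parts) / len(parts)


# Toy data: three binary features, each example labelled with one class.
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
labels = ["1.1", "1.2", "2", "2"]
models = train_local(X, labels)
scores = {cls: extended_score(models, [1, 1, 0], cls) for cls in PARENT}
print(scores)
```

A conventional local classifier would use only `models[cls].prob_positive(x)`; the sketch differs from that baseline solely in the extra parent and child terms, which is the part of the design the abstract highlights.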
