Clustering of fungal hexosaminidase enzymes based on free alignment method using MLP neural network

Studies of biological evolution have generally focused on the nucleotide or amino acid sequences of genes that encode specific enzymes. Most phylogenetic trees are constructed from amino acid sequences and serve as predictors of evolutionary relationships. Phylogenetic analysis is usually based on a multiple sequence alignment of a gene from different organisms, including fungi. A number of programs have been introduced for gene clustering and phylogenetic analysis; the most popular web-based program is Clustal Omega, which is commonly used by biologists. As the number of uploaded sequences increases, this program becomes slow, and the resulting cladogram is confusing and incorrect from an evolutionary point of view. In the present study, we used fungal hexosaminidases (FH), extracellular enzymes that have many applications in biotechnology but are extremely varied and confusing in evolutionary terms. A standard taxonomy-based phylogenetic tree was constructed for 835 FH amino acid sequences retrieved from the National Center for Biotechnology Information (NCBI) on March 16, 2015. A supervised multilayer perceptron (MLP) neural network was then used to discriminate the FH sequences. Based on the relative frequency of amino acids in the FH sequences, 41 neural networks were designed for seven taxonomic levels, from phylum to family. The minimum accuracy of the neural networks was 99% at all seven discrimination levels. As a final step, the designed model was further evaluated on 143 newly released FH sequences extracted on July 1, 2015. The clustering results matched fungal taxonomy well and reflected evolutionary relationships.
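The alignment-free input described above, the relative frequency of each amino acid in a sequence, can be illustrated with a minimal sketch. This is not the authors' code: the function name `relative_frequencies`, the restriction to the 20 standard residues, and the toy sequence are assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (assumed alphabet;
# the abstract does not say how non-standard residues were handled).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relative_frequencies(sequence):
    """Return the relative frequency of each standard amino acid.

    The result is a list of 20 floats, ordered as in AMINO_ACIDS,
    that sums to 1. Characters outside the alphabet are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not a real hexosaminidase.
toy_sequence = "MKTAYIAKQR"
features = relative_frequencies(toy_sequence)
```

Each sequence becomes a fixed-length vector regardless of its length, so no alignment is needed before a classifier such as an MLP is trained on taxonomic labels.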
