Mining the Amino Acid Dominance in Gene Sequences

Classification techniques have recently been widely applied in bioinformatics. The proposed Amino Acid Component based Classification algorithm adopts the Iterative Dichotomiser 3 (ID3) classifier. The algorithm consists of two phases: attribute selection and component-based classification. The attribute selection phase identifies the dominant amino acids and the amino acid deficiencies that cause the diseases. The second phase finds the amino acid components that spread the diseases in the specified sequence. Experiments were carried out on the gene sequence of the dengue virus, available in the NCBI online biological database, and the accuracy of the proposed algorithm was calculated as 90.744%. The proposed classification algorithm is compared with traditional benchmark algorithms: Naive Bayes, ID3, Random Forest, Multilayer Perceptron and J48. Drug designers can use the results of this work to help predict new viral diseases.
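
The abstract does not give the details of the attribute selection phase, so the following is only a minimal sketch of how ID3-style attribute selection over amino acid composition could look. It assumes that each sequence is reduced to the relative frequency of the 20 standard amino acids, that each frequency is discretized into "high" or "low" with a hypothetical threshold of 0.10, and that amino acids are ranked by information gain. The function names, the threshold, and the toy sequences and labels are illustrative assumptions, not the authors' implementation or data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def composition(seq):
    """Relative frequency of each standard amino acid in a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return {a: (counts[a] / total if total else 0.0) for a in AMINO_ACIDS}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """ID3 information gain of one discrete attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def dominance_level(freq, threshold=0.10):
    """Discretize a frequency; the 0.10 threshold is an assumption."""
    return "high" if freq >= threshold else "low"


def rank_amino_acids(sequences, labels, threshold=0.10):
    """Rank amino acids by the information gain of their dominance level."""
    comps = [composition(s) for s in sequences]
    gains = {}
    for a in AMINO_ACIDS:
        values = [dominance_level(c[a], threshold) for c in comps]
        gains[a] = information_gain(values, labels)
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, not taken from the dengue data set.
toy_sequences = ["LLLLAAGK", "LLLAGKLL", "AGKAGKAG", "GKAGKAGA"]
toy_labels = ["disease", "disease", "healthy", "healthy"]

ranking = rank_amino_acids(toy_sequences, toy_labels)
for amino_acid, gain in ranking[:3]:
    print(amino_acid, round(gain, 3))
```

In this toy setup only leucine (L) separates the two classes, so it receives the highest gain. An ID3 tree would split on that attribute first and then recurse on the remaining attributes within each branch.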
