Identification of Protein Coding Region in DNA Sequence Using Novel Adaptive Exon Predictor

Accurately identifying the exon regions in a deoxyribonucleic acid (DNA) sequence is an important task in bioinformatics. Analysis of the regions that code for proteins is key to disease identification and drug design. Exons are DNA segments that contain the coding information for proteins. In particular, exons within genes exhibit three-base periodicity (TBP), which forms the basis of all exon identification techniques. Several exon identification techniques have been applied successfully to exon prediction, but improvement is still needed. TBP can be readily determined by applying signal processing methods. Adaptive signal processing techniques have been found to be promising because, unlike several other methods, they can change their weight coefficients. In this paper, a novel adaptive exon predictor (AEP) is proposed on the basis of these considerations, using normalization to increase the ability of the adaptive algorithm to track exons. Several AEPs are developed using the LMS algorithm and its maximum-normalized, sign-based variants to reduce computational complexity. The hybrid variants of the proposed AEPs include the MDNLMS, MDNSRLMS, MDNSLMS and MDNSSLMS algorithms. MDNSRLMS is shown to be the most accurate in exon prediction based on the performance measures, with Sensitivity (Sn) 0.7372, Specificity (Sp) 0.7573 and Precision (Pr) 0.7122 at a threshold of 0.8 for the genomic sequence with accession AF009962. Finally, the exon tracking ability of the various AEPs is assessed through a simulation study, and the results are compared with an existing method using various standard genomic datasets taken from the National Center for Biotechnology Information (NCBI) genomic sequence database.
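
The three-base periodicity that the abstract relies on can be illustrated with a minimal sketch. It is not the proposed AEP or any of its LMS variants: it assumes a binary indicator mapping of the four nucleotides and a fixed-length sliding window, and measures the spectral power at a period of three bases. The function name `tbp_power` and the `window` and `step` parameters are hypothetical and chosen only for illustration.

```python
import cmath


def tbp_power(seq, window=351, step=1):
    """Sliding-window spectral power at period 3 (illustrative sketch only).

    Assumption: each nucleotide is mapped to a 0/1 indicator sequence,
    and the squared DFT magnitudes at frequency 1/3 cycles per base are
    summed over the four indicators.
    """
    seq = seq.upper()
    # One binary indicator sequence per nucleotide.
    indicators = {b: [1.0 if c == b else 0.0 for c in seq] for b in "ACGT"}
    # The complex exponential at frequency 1/3 only takes three values.
    twiddle = [cmath.exp(-2j * cmath.pi * n / 3) for n in range(3)]

    powers = []
    for start in range(0, len(seq) - window + 1, step):
        total = 0.0
        for ind in indicators.values():
            acc = sum(ind[start + n] * twiddle[n % 3] for n in range(window))
            total += abs(acc) ** 2
        # Normalize by window length squared so values are comparable
        # across different window sizes.
        powers.append(total / window ** 2)
    return powers


if __name__ == "__main__":
    periodic = tbp_power("ATG" * 40, window=60)      # strong period-3 pattern
    aperiodic = tbp_power("ACGT" * 30, window=60)    # period-4 pattern
    print(max(periodic), max(aperiodic))
```

In a sketch like this, windows that overlap a coding region would be expected to show higher power than non-coding windows, and thresholding the resulting profile gives a crude exon/non-exon decision. An adaptive predictor replaces the fixed window with weights that are updated sample by sample.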
