A Novel Optimized Approach for Gene Identification in DNA Sequences

Gene identification is an open optimization problem in bioinformatics. The exponential growth of biological data calls for efficient methods for finding the regions of DNA that are translated into proteins. Several approaches have been proposed that rely on indicator sequences, statistical methods, and digital signal processing (DSP) techniques, but an optimized procedure is still needed. This paper proposes a novel approach to gene identification that applies discrete wavelet transforms to reduce noise in DNA sequences and introduces a new indicator sequence for better signal mapping. The wavelet transforms greatly reduced the background noise, and the genic regions appeared as visible peaks in the power spectral estimate. A comparative analysis showed a significant improvement of the proposed approach over existing solutions on two datasets: Yersinia pestis (accession NC_004088, 4000 bp) and gene F56F11.5 of C. elegans (accession AF099922), from location 7021. The same significance was observed in four other experiments with real datasets taken from NCBI.
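The general pipeline outlined above (indicator mapping, power spectral estimation, wavelet-based noise reduction) can be illustrated with a minimal sketch. It is not the method of this paper: the abstract does not define the novel indicator sequence, so the standard binary (Voss) indicators stand in for it; the Haar wavelet, the single decomposition level, the soft threshold, and the window length are all assumptions; and the denoising is applied here to the period-3 power track rather than to the raw sequence. All function names are hypothetical.

```python
import cmath
import math


def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (assumed mapping)."""
    seq = seq.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in seq] for base in "ACGT"}


def period3_power(seq, window=351, step=1):
    """Sliding-window spectral power at frequency 1/3 (the period-3 component)."""
    indicators = voss_indicators(seq)
    # Twiddle factors exp(-j*2*pi*n/3), i.e. DFT bin k = window/3.
    twiddle = [cmath.exp(-2j * math.pi * n / 3) for n in range(window)]
    power = []
    for start in range(0, len(seq) - window + 1, step):
        total = 0.0
        for x in indicators.values():
            coeff = sum(x[start + n] * twiddle[n] for n in range(window))
            total += abs(coeff) ** 2
        power.append(total)
    return power


def haar_denoise(signal, threshold):
    """One-level Haar DWT, soft-threshold the detail coefficients, then invert."""
    n = len(signal) - len(signal) % 2  # drop a trailing odd sample, if any
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, n, 2)]
    # Soft thresholding: shrink each detail coefficient toward zero.
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, shrunk):
        out.extend([(a + d) / s, (a - d) / s])
    return out
```

A call such as `haar_denoise(period3_power(sequence), threshold)` yields a smoothed power track in which candidate coding regions would show up as peaks. The threshold and the window length have to be tuned per dataset, and the window should be a multiple of 3 so that frequency 1/3 falls exactly on a DFT bin.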
