Effective gene prediction by high resolution frequency estimator based on least-norm solution technique

The linear-algebraic concept of a subspace plays a significant role in recent spectrum estimation techniques. In this article, the authors use the noise subspace concept to find hidden periodicities in DNA sequences. With the rapid growth of genomic sequence data, the demand for accurate identification of protein-coding regions in DNA is rising. Several DNA feature extraction techniques drawing on various cross-disciplinary fields have emerged in the recent past, among which the application of digital signal processing tools is of prime importance. It is known that coding segments exhibit a 3-base periodicity, whereas non-coding regions lack this feature. One of the most important subspace-based spectrum analysis techniques is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions while eliminating background noise. The proposed method is compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method is a better and more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method.
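To make the two approaches named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common binary (Voss) indicator mapping from nucleotides to numbers, which the abstract does not specify, and it uses the textbook minimum-norm (least-norm) noise-subspace pseudospectrum. The function names, the autocorrelation order `M`, and the assumed signal-subspace dimension `p` are all hypothetical choices made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def voss_indicator(seq, base):
    """Binary indicator sequence (assumed mapping): 1.0 where seq has `base`."""
    return np.array([1.0 if s == base else 0.0 for s in seq.upper()])


def sdft_period3_power(u, window):
    """Baseline: DFT power at 2*pi/3 for every length-`window` frame of u."""
    kernel = np.exp(-2j * np.pi * np.arange(window) / 3)
    frames = sliding_window_view(u, window)
    return np.abs(frames @ kernel) ** 2


def min_norm_pseudospectrum(x, M, p, omegas):
    """Minimum-norm pseudospectrum of x evaluated at the frequencies `omegas`.

    M is the autocorrelation matrix order and p the assumed number of
    complex exponentials (p = 2 for one real sinusoid). Both are
    illustrative parameters, not values taken from the paper.
    """
    x = np.asarray(x, dtype=complex)
    x = x - x.mean()                      # remove the DC term of the indicator
    X = sliding_window_view(x, M)         # each row is one length-M snapshot
    R = X.T @ X.conj() / X.shape[0]       # sample autocorrelation matrix (M x M)
    _, vecs = np.linalg.eigh(R)           # eigenvalues in ascending order
    Vn = vecs[:, : M - p]                 # noise subspace: smallest M - p
    a = Vn @ Vn[0, :].conj()              # project the unit vector u1 onto it
    a = a / a[0]                          # constrain the first coefficient to 1
    E = np.exp(1j * np.outer(omegas, np.arange(M)))  # steering vectors
    return 1.0 / np.abs(E.conj() @ a) ** 2


if __name__ == "__main__":
    # Toy sequence with a strong period-3 pattern; not real genomic data.
    u = voss_indicator("GCA" * 50 + "GTT" * 50, "G")
    omegas = np.linspace(0.0, np.pi, 512)
    P = min_norm_pseudospectrum(u, M=12, p=2, omegas=omegas)
    print("peak frequency (rad/sample):", omegas[np.argmax(P)])
    print("SDFT power, first window:", sdft_period3_power(u, 60)[0])
```

In a gene-prediction setting one would presumably evaluate the pseudospectrum at 2π/3 inside a window slid along the sequence and combine the four nucleotide indicators; how the paper does this is not stated in the abstract, so that step is left out of the sketch.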
