New Techniques for the Location of Hot Spots in Proteins and Exons in DNA Using Digital Filters

The development, implementation, and performance evaluation of new techniques for locating hot spots in proteins and exons in DNA using digital filters are presented. The application of bandpass notch (BPN) digital filters to locating hot spots in proteins is investigated first. A technique is proposed for designing the appropriate BPN filter for a specific protein sequence: the area under the filter's amplitude response is minimized to achieve maximum selectivity for a chosen stability margin, and the minimization is performed using the golden-section search. A tuning technique based on a least-squares polynomial model is also proposed for improving the accuracy of the BPN filter. Several example protein sequences are used to illustrate these techniques. BPN filters are then employed for locating exons in DNA. An additional lowpass filtering step is introduced to detect the strength of the bandpass-filtered signal as a function of nucleotide location. For the character-to-numerical mapping, both the electron-ion interaction potentials (EIIPs) of the nucleotides and their binary sequences are investigated.
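The abstract describes the BPN design step only at a high level and does not give the filter's transfer function. The Python sketch below shows how a golden-section search can minimize the area under an amplitude response for a fixed stability margin; it is an illustration under stated assumptions, not the thesis's design procedure. It assumes a hypothetical second-order section H(z) = (1 - 2cos(phi) z^-1 + z^-2) / (1 - 2r cos(theta0) z^-1 + r^2 z^-2), with poles at radius r = 1 - margin (taking the stability margin to be the pole distance from the unit circle) and angle theta0 (the target frequency), and with the notch angle phi in [0, theta0) as the single design variable. The names golden_section_min, normalized_area, and design_bpn are illustrative.

```python
import cmath
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/golden ratio, about 0.618


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def normalized_area(phi, r, theta0, n=2001):
    """Area under |H(e^{jw})| on [0, pi] (trapezoidal rule) after scaling the
    hypothetical section to unit gain at theta0:
        H(z) = (1 - 2cos(phi) z^-1 + z^-2) / (1 - 2r cos(theta0) z^-1 + r^2 z^-2)
    """
    def mag(w):
        z1 = cmath.exp(-1j * w)  # z^-1 evaluated on the unit circle
        num = 1 - 2 * math.cos(phi) * z1 + z1 * z1
        den = 1 - 2 * r * math.cos(theta0) * z1 + r * r * z1 * z1
        return abs(num / den)

    g0 = mag(theta0)  # nonzero as long as phi != theta0
    h = math.pi / (n - 1)
    vals = [mag(k * h) / g0 for k in range(n)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def design_bpn(theta0, margin, gap=1e-3):
    """Fix the pole radius from the stability margin, then pick the notch
    angle phi in [0, theta0 - gap] that minimizes the normalized area."""
    r = 1.0 - margin
    phi = golden_section_min(lambda p: normalized_area(p, r, theta0),
                             0.0, theta0 - gap)
    return r, phi
```

For example, `design_bpn(2 * math.pi / 3, 0.1)` returns r = 0.9 and the area-minimizing notch angle for that margin. Normalizing to unit gain at theta0 stops the objective from being reduced by simply scaling the filter down. In this hypothetical family the normalized area is a convex function of cos(phi) divided by a positive linear function of cos(phi), so it is quasiconvex and hence unimodal on the bracket, which is the property golden-section search relies on.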
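The abstract does not say which quantity the least-squares polynomial model describes. One plausible reading, used here only as an illustration, is that the amplitude-response peak of a second-order bandpass section does not fall exactly at its pole angle, so a polynomial fitted to (measured peak, pole angle) pairs can be evaluated at a target frequency to choose the pole angle that moves the peak onto it. The section H(z) = (1 - z^-2) / (1 - 2r cos(theta) z^-1 + r^2 z^-2), the sampling range, the polynomial degree, and the helper names are assumptions; the fit uses the normal equations in plain Python rather than any particular library.

```python
import cmath
import math


def magnitude(r, theta, w):
    """|H(e^{jw})| for H(z) = (1 - z^-2) / (1 - 2r cos(theta) z^-1 + r^2 z^-2)."""
    z1 = cmath.exp(-1j * w)
    num = 1 - z1 * z1
    den = 1 - 2 * r * math.cos(theta) * z1 + r * r * z1 * z1
    return abs(num / den)


def peak_frequency(r, theta, n=2001):
    """Peak of the amplitude response on [0, pi]: grid search refined by
    three-point parabolic interpolation."""
    h = math.pi / (n - 1)
    mags = [magnitude(r, theta, k * h) for k in range(n)]
    k = max(range(1, n - 1), key=lambda i: mags[i])
    y0, y1, y2 = mags[k - 1], mags[k], mags[k + 1]
    return (k + 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)) * h


def polyfit(xs, ys, degree):
    """Least-squares fit y ~ sum(c[i] * x**i) via the normal equations,
    solved by Gaussian elimination with partial pivoting."""
    m = degree + 1
    # Augmented matrix [A^T A | A^T y] for the monomial basis.
    aug = [[sum(x ** (i + j) for x in xs) for j in range(m)]
           + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for row in range(col + 1, m):
            f = aug[row][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[row][k] -= f * aug[col][k]
    coeffs = [0.0] * m
    for row in range(m - 1, -1, -1):
        acc = sum(aug[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = (aug[row][m] - acc) / aug[row][row]
    return coeffs


def tune_pole_angle(target, r, degree=7, samples=101):
    """Model the pole angle as a polynomial in the measured peak frequency,
    then evaluate that model at the target frequency."""
    lo, hi = 0.3 * math.pi, 0.7 * math.pi
    thetas = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    peaks = [peak_frequency(r, th) for th in thetas]
    # Map the peaks onto [-1, 1] so the normal equations are better conditioned.
    mid = 0.5 * (max(peaks) + min(peaks))
    half = 0.5 * (max(peaks) - min(peaks))
    coeffs = polyfit([(p - mid) / half for p in peaks], thetas, degree)
    t = (target - mid) / half
    return sum(c * t ** i for i, c in enumerate(coeffs))
```

For this particular section the peak location has a closed form, cos(w0) = 2r cos(theta) / (1 + r^2), which gives a convenient check on the fitted model; the polynomial approach earns its keep when no such expression is available for the filter actually used.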
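A minimal sketch of the exon-location chain the abstract outlines (numerical mapping, bandpass filtering, then lowpass filtering to obtain a strength signal per nucleotide), under stated assumptions. Each nucleotide is mapped either to its EIIP value (A = 0.1260, C = 0.1340, G = 0.0806, T = 0.1335) or to four binary indicator sequences. A second-order bandpass section with pole angle 2*pi/3, the period-3 frequency commonly used to flag protein-coding regions, stands in for the BPN filter, and a one-pole lowpass filter smooths the squared output. The filter forms, r, alpha, and all function names are illustrative choices, not the thesis's parameters.

```python
import math

# EIIP values of the nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_map(seq):
    """Map a DNA string to its EIIP numerical sequence."""
    return [EIIP[base] for base in seq.upper()]


def binary_maps(seq):
    """Map a DNA string to four 0/1 indicator sequences, one per nucleotide."""
    s = seq.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in s] for base in "ACGT"}


def bandpass(x, theta0=2 * math.pi / 3, r=0.9):
    """H(z) = (1 - z^-2) / (1 - 2r cos(theta0) z^-1 + r^2 z^-2), a stand-in
    for a BPN filter; its zeros at z = +1 and -1 also remove the DC offset."""
    a1 = 2 * r * math.cos(theta0)
    a2 = r * r
    y = [0.0] * len(x)
    for n in range(len(x)):
        x2 = x[n - 2] if n >= 2 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y[n] = x[n] - x2 + a1 * y1 - a2 * y2
    return y


def lowpass_energy(y, alpha=0.95):
    """One-pole lowpass filter applied to y[n]^2 to track signal strength."""
    s, out = 0.0, []
    for v in y:
        s = alpha * s + (1.0 - alpha) * v * v
        out.append(s)
    return out


def exon_strength(seq, r=0.9, alpha=0.95):
    """Strength signal per nucleotide position using the EIIP mapping."""
    return lowpass_energy(bandpass(eiip_map(seq), r=r), alpha)


def exon_strength_binary(seq, r=0.9, alpha=0.95):
    """Strength signal using the binary mapping: sum of four channel strengths."""
    channels = [lowpass_energy(bandpass(x, r=r), alpha)
                for x in binary_maps(seq).values()]
    return [sum(vals) for vals in zip(*channels)]
```

Positions where the strength signal stays high are exon candidates; a threshold on this signal would then mark predicted coding regions. Larger r sharpens the bandpass response but lengthens the transients at region boundaries, and alpha trades smoothness of the strength signal against positional resolution.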
