Analyzing functional similarity of protein sequences with discrete wavelet transform

This paper applies the discrete wavelet transform (DWT), combined with various protein substitution models, to detect functional similarity between proteins with low sequence identity. A new metric based on the DWT, the 'S' function, is proposed to measure pairwise similarity. We also develop a segmentation technique that, combined with the DWT, handles long protein sequences. The results are compared with those obtained using pairwise alignment and PSI-BLAST.
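The general idea of comparing proteins in a wavelet domain can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the numeric encoding (the Kyte-Doolittle hydropathy scale rather than a substitution model), the Haar wavelet, the number of decomposition levels, and the similarity score (a plain Pearson correlation of approximation coefficients, which is a stand-in and not the 'S' function). The function names are hypothetical.

```python
import math

# Assumed encoding: Kyte-Doolittle hydropathy values (not the paper's substitution models).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map an amino acid string to a list of numbers."""
    return [KD[residue] for residue in seq]


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    values = list(signal)
    if len(values) % 2 == 1:
        values.append(values[-1])  # pad odd-length input by repeating the last value
    root2 = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / root2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / root2 for i in range(0, len(values), 2)]
    return approx, detail


def haar_approx(signal, levels):
    """Approximation coefficients after the given number of Haar levels."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx


def pearson(x, y):
    """Pearson correlation over the common prefix of two coefficient lists."""
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similarity(seq_a, seq_b, levels=2):
    """Stand-in similarity score (NOT the paper's 'S' function)."""
    return pearson(haar_approx(encode(seq_a), levels),
                   haar_approx(encode(seq_b), levels))


if __name__ == "__main__":
    # Two made-up peptide strings, used only to exercise the code.
    print(similarity("MKTAYIAKQRQISFVKSHFSRQ", "MKSAYLAKQRNISFIKTHFSRE"))
```

Comparing coarse approximation coefficients rather than raw residues is what lets such a score tolerate point substitutions; a real implementation would also need a strategy for sequences of different lengths, which this sketch sidesteps by truncating to the shorter coefficient list.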
