ChemAlign: Biologically Relevant Multiple Sequence Alignment Using Physicochemical Properties

We present ChemAlign, a new algorithm that uses physicochemical properties and secondary structure elements to create biologically relevant multiple sequence alignments (MSAs). We also introduce the Physicochemical Property Difference (PPD) score for evaluating MSAs. The PPD score is the normalized difference in physicochemical property values between a calculated alignment and a reference alignment. By measuring characteristics of the amino acids rather than sequence similarity alone, it provides a more biologically relevant metric. ChemAlign produces more biologically correct alignments and can help identify potential drug docking sites.
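
To make the idea of a normalized property difference concrete, the following is a minimal sketch rather than the PPD definition itself, which the abstract does not spell out. It assumes a single property (the Kyte-Doolittle hydropathy scale), a column-by-column comparison of each sequence in the two alignments, and a maximal penalty whenever a residue is paired with a gap. The function name `ppd_like_score`, the choice of scale, and the gap handling are all illustrative assumptions.

```python
from itertools import zip_longest

# Kyte-Doolittle hydropathy values, used here as one example property.
# Assumption: the actual PPD score may use a different property or several.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
SCALE_RANGE = max(HYDROPATHY.values()) - min(HYDROPATHY.values())


def ppd_like_score(calculated, reference, gap="-"):
    """Return 1 minus the mean normalized property difference.

    Both arguments are lists of aligned sequences (strings) in the same
    row order. A score of 1.0 means the two alignments place residues
    with identical property values in every compared column.
    """
    if len(calculated) != len(reference):
        raise ValueError("alignments must contain the same number of sequences")

    total = 0.0
    compared = 0
    for calc_row, ref_row in zip(calculated, reference):
        # Pad the shorter row with gaps so every column is visited.
        for a, b in zip_longest(calc_row, ref_row, fillvalue=gap):
            if a == gap and b == gap:
                continue  # nothing to compare in this cell
            compared += 1
            if a == gap or b == gap:
                total += 1.0  # hypothetical choice: maximal difference
            else:
                total += abs(HYDROPATHY[a] - HYDROPATHY[b]) / SCALE_RANGE
    return 1.0 - total / compared if compared else 1.0


if __name__ == "__main__":
    reference = ["ILK-", "IL-K"]
    calculated = ["ILK-", "ILK-"]
    print(ppd_like_score(calculated, reference))
```

Under these assumptions, a misplaced residue costs less when the residue occupying its reference position has similar chemistry, which is the behavior that distinguishes a property-based score from a pure identity count.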
