Exploiting Co-evolution across Protein Families for Predicting Native Contacts and Protein-Protein Interaction Surfaces

Correlated substitution patterns between residues of a protein family have been exploited to reveal information about protein structures. However, such studies require a large number (on the order of one thousand) of homologous yet variable protein sequences. So far, most studies have been limited to a few exemplary proteins for which a large number of such sequences happen to be available. Rapid advances in genome sequencing will soon generate this many sequences for the majority of common bacterial proteins. Sequencing a large number of simple eukaryotes such as yeast can in principle generate a similar number of sequences for common eukaryotic proteins, extending beyond the collection of highly amplified protein domains that already reach the necessary numbers.