BACKGROUND
Database-searching methods based on sequence similarity have become the most commonly used tools for characterizing newly sequenced proteins. Because the functional diversity within protein families and superfamilies is often underestimated, however, it is difficult to make such characterizations specific and accurate. In this work, we have extended a method for identifying active sites from predicted protein structures.
RESULTS
The structural conservation and variation of the active sites of the alpha/beta hydrolases with known structures were studied. The similarities were incorporated into a three-dimensional motif that specifies essential requirements for the enzymatic functions. A threading algorithm was used to align 651 Escherichia coli open reading frames (ORFs) to one of the members of the alpha/beta hydrolase fold family. These ORFs were then screened against our three-dimensional motif, with the additional requirement that the key active-site residues be conserved among the proteins that bear significant sequence similarity to the ORFs. Seventeen ORFs from E. coli were predicted to have hydrolase activity, and their putative active-site residues were identified. Most of these predictions agreed with experimental data and with the results of other database-searching methods. The study further suggests that YHET_ECOLI, a hypothetical protein classified as a member of the uncharacterized protein family UPF0017, bears all the hallmarks of the alpha/beta hydrolase family.
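The two-stage screen described above (a geometric motif test followed by a conservation test) can be illustrated with a minimal sketch. It is not the implementation used in the study. It assumes the motif can be reduced to pairwise C-alpha distances between three active-site positions, labeled here as a nucleophile, a histidine and an acid, as in the catalytic triad typical of alpha/beta hydrolases. The reference distances, the tolerance, the conservation threshold, and all function and record names are hypothetical placeholders.

```python
import math
from itertools import combinations

# Hypothetical reference C-alpha distances (angstroms) between motif positions.
# Placeholder values for illustration only; they are not taken from the study.
MOTIF_DISTANCES = {
    ("acid", "his"): 6.0,
    ("acid", "nuc"): 10.0,
    ("his", "nuc"): 8.0,
}


def matches_motif(coords, tolerance=1.5):
    """Return True if every pairwise distance lies within `tolerance` of the motif.

    `coords` maps the labels "acid", "his" and "nuc" to (x, y, z) tuples taken
    from the threaded (predicted) structure.
    """
    for a, b in combinations(sorted(coords), 2):
        observed = math.dist(coords[a], coords[b])
        if abs(observed - MOTIF_DISTANCES[(a, b)]) > tolerance:
            return False
    return True


def is_conserved(column, expected, min_fraction=0.8):
    """Return True if `expected` fills at least `min_fraction` of an alignment column.

    `column` holds the residues that sequence-similar proteins place at one
    putative active-site position.
    """
    if not column:
        return False
    return column.count(expected) / len(column) >= min_fraction


def screen(candidates):
    """Keep the names of candidates that pass both the motif and conservation tests."""
    kept = []
    for cand in candidates:
        if not matches_motif(cand["coords"]):
            continue
        if all(is_conserved(col, res) for res, col in cand["columns"]):
            kept.append(cand["name"])
    return kept


# Toy input with made-up coordinates and alignment columns.
candidates = [
    {
        "name": "orf_a",
        "coords": {"acid": (0, 0, 0), "his": (6, 0, 0), "nuc": (6, 8, 0)},
        "columns": [("S", "SSSSS"), ("H", "HHHHH"), ("D", "DDDDD")],
    },
    {
        "name": "orf_b",
        "coords": {"acid": (0, 0, 0), "his": (6, 0, 0), "nuc": (6, 20, 0)},
        "columns": [("S", "SSSSS"), ("H", "HHHHH"), ("D", "DDDDD")],
    },
]
print(screen(candidates))
```

In this toy input, orf_b is rejected at the geometric stage because its nucleophile lies too far from the other two positions. A candidate with the right geometry but poorly conserved residues would be rejected at the second stage.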
CONCLUSIONS
The novel feature of our method is its use of three-dimensional structural information for function prediction. The results demonstrate that such a method is needed to fill the gap between sequence alignment and function prediction. Furthermore, the method provides a way to verify structure predictions, which expands the applicable scope of threading algorithms.