[The active site of human glucocerebrosidase: structural predictions and experimental validations].

Gaucher disease is a lysosomal storage disorder caused by a deficiency in glucocerebrosidase, which cleaves the beta-glucosidic linkage of glucosylceramide, a normal intermediate in glycolipid metabolism. Glucocerebrosidase belongs to clan GH-A of the glycoside hydrolases, a large group of enzymes that act with retention of the anomeric configuration at the hydrolysis site. Accurate three-dimensional (3D) structure data for glucocerebrosidase should help to clarify the molecular basis of Gaucher disease. Because no such 3D structure data were available, we used the two-dimensional hydrophobic cluster analysis (HCA) method to make structure predictions for the catalytic domains of clan GH-A glycoside hydrolases. We found that all the enzymes of clan GH-A may share a similar catalytic domain consisting of an (alpha/beta)8 barrel, with the critical acid/base and nucleophile residues located at the C-terminal ends of strands beta 4 and beta 7, respectively. In the case of glucocerebrosidase, Glu 235 was predicted to be the acid/base catalyst and Glu 340 the nucleophile. Next, to obtain experimental evidence supporting these HCA-based predictions, we used retroviral vectors to express, in murine null cells, the E235A and E340A mutant proteins, in which alanine residues, unable to participate in the enzymatic reaction, replace the presumed critical glutamic acid residues. Both mutants were found to be catalytically inactive although they were correctly folded, processed and sorted to the lysosome. Thus, Glu 235 and Glu 340 do indeed play key roles in the active site of human glucocerebrosidase, as predicted by HCA. In a broader perspective, our work suggests that bioinformatics approaches may be highly useful for generating structure-function predictions based on sequence-structure interrelationships, especially in the context of a rapid increase in protein sequence information through genome sequencing.
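
The substitution notation used above (E235A, E340A) can be made concrete with a minimal Python sketch. It is an illustration only, not the procedure used in this work: the function names `parse_substitution` and `apply_substitution` are hypothetical, and the 400-residue sequence is a placeholder built for the example (glycine everywhere, with glutamic acid placed at positions 235 and 340), not the glucocerebrosidase sequence. Positions are assumed to be 1-based.

```python
import re


def parse_substitution(code):
    """Split a substitution such as 'E235A' into (wild_type, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_substitution(sequence, code):
    """Return a copy of `sequence` carrying the substitution described by `code`."""
    wild_type, position, replacement = parse_substitution(code)
    index = position - 1  # assumed 1-based residue numbering
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} lies outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# Placeholder sequence (an assumption for illustration, NOT the real enzyme):
# glycine at every position except Glu at 235 and 340.
residues = ["G"] * 400
residues[235 - 1] = "E"
residues[340 - 1] = "E"
placeholder = "".join(residues)

# One single mutant per predicted catalytic residue.
mutants = {code: apply_substitution(placeholder, code) for code in ("E235A", "E340A")}
```

Checking the wild-type residue before substituting guards against numbering mismatches, for example between precursor and mature-protein numbering.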