Prediction of protein hydration sites from sequence by modular neural networks.

The hydration properties of a protein are important determinants of its structure and function. Here, modular neural networks are employed to predict ordered hydration sites from protein sequence information. First, secondary structure and solvent accessibility are predicted from sequence with two separate neural networks. These predictions are then used, together with the protein sequences, as input to networks that predict the hydration of residues, backbone atoms and sidechains. The networks are trained on protein crystal structures. The prediction of hydration is improved by adding information on secondary structure and solvent accessibility: using the actual values of these properties, residue hydration can be predicted to 77% accuracy with a Matthews coefficient of 0.43. However, predicted property data with an accuracy of 60-70% yield less than half of the improvement in predictive performance observed with the actual values. Including property information allows a smaller sequence window to be used in the networks that predict hydration. Property information has a greater impact on the accuracy of hydration site prediction for backbone atoms than for sidechains, and for non-polar than for polar residues. The networks provide insight into the interdependencies between the location of ordered water sites and the structural and chemical characteristics of the protein residues.
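The accuracy and Matthews coefficient quoted above can both be derived from a confusion matrix, assuming residue hydration is scored as a binary label (hydrated or not hydrated). The following is a minimal sketch of how the two measures are computed and why they can differ. The function names and the example labels are hypothetical, chosen only for illustration; they are not the authors' code or data.

```python
import math


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = hydrated)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif not a and not p:
            tn += 1
        elif p:
            fp += 1  # predicted hydrated, actually not hydrated
        else:
            fn += 1  # predicted not hydrated, actually hydrated
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels for eight residues (illustrative only).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

counts = confusion_counts(actual, predicted)
print("accuracy:", accuracy(*counts))
print("Matthews coefficient:", matthews_coefficient(*counts))
```

Unlike accuracy, the Matthews coefficient is 0 for a predictor that always outputs the same class, so it remains informative when hydrated and non-hydrated residues are unevenly represented.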