Machine learning models of antibody-excipient preferential interactions for use in computational formulation design.

Preferential interactions of formulation excipients govern their impact on the stability properties of proteins in solution. The ability to predict these interactions without performing experiments would enable formulation design to begin early in the development of a new antibody therapeutic. With that in mind, we developed a feature set that numerically describes local regions of an antibody's surface for use in machine learning applications. We then used these features to train machine learning models of local antibody-excipient preferential interactions for six excipients: sorbitol, sucrose, trehalose, proline, arginine.HCl, and NaCl. Our models had accuracies of up to about 85%. We also used linear (elastic net) models to quantify the contribution of antibody surface features to the preferential interaction coefficients, finding that the carbohydrates and proline tend to share similar important features, while the interactions of arginine.HCl and NaCl are governed by charge features. We present several case studies demonstrating how these machine learning models could be used to predict experimental aggregation and viscosity behavior in solution. Finally, we propose an approach to computational formulation design in which a panel of excipients may be considered while designing an antibody sequence.
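
As an illustration of how an elastic net can rank feature contributions, the following is a minimal sketch and not the implementation used in this work. It fits an elastic net by coordinate descent on synthetic data. The feature names, the "true" coefficients, and the hyperparameters `alpha` and `l1_ratio` are all hypothetical, chosen only so that the example runs on its own.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.05, l1_ratio=0.5, n_iter=200):
    """Minimize (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2 by cyclic coordinate descent.
    No intercept is fitted, so inputs are assumed roughly centered."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Correlation of feature j with the residual that excludes j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            z = sum(c * c for c in col) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_w
    return w


# Synthetic stand-in for surface-patch descriptors (hypothetical names).
feature_names = ["net_charge", "hydrophobicity", "solvent_exposure", "noise"]
true_w = [2.0, 0.0, -1.0, 0.0]  # assumed for illustration, not a result

rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in feature_names] for _ in range(200)]
y = [sum(w_k * x_k for w_k, x_k in zip(true_w, row)) + rng.gauss(0.0, 0.1)
     for row in X]

weights = fit_elastic_net(X, y)

# Rank features by absolute coefficient; larger means a larger contribution.
for name, coef in sorted(zip(feature_names, weights), key=lambda t: -abs(t[1])):
    print(f"{name:>18s}  {coef:+.3f}")
```

Because the L1 term drives uninformative coefficients toward zero while the L2 term stabilizes correlated ones, the magnitudes of the surviving coefficients give a simple ranking of feature importance, provided the features are on comparable scales.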