Predicting Physiological Concentrations of Metabolites from Their Molecular Structure

Physiological concentrations of metabolites can be partly explained by their molecular structure. We hypothesize that substances containing certain chemical groups tend to have higher or lower concentrations in cells. Here we define chemical groups as local atomic configurations, each describing an atom, its bonds, and its direct neighbor atoms. To test this hypothesis, we fitted a linear statistical model that relates experimentally determined logarithmic concentrations to feature vectors containing the counts of each chemical group in a molecule. To identify chemical groups that have a clear effect on concentration, we used regularized (lasso) regression. In a dataset of 41 substances of central metabolism in different organisms, we found that physiological concentrations are increased by the occurrence of amino and hydroxyl groups, whereas substances containing aldehyde, ketone, and phosphate groups show decreased concentrations. The model explains about 22% of the variance of the logarithmic mean concentrations.
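The abstract does not state how the lasso model was solved. The following is a minimal sketch of the approach it describes. It assumes the standard lasso objective, (1/2n)·||y − b0 − Xw||² + alpha·||w||₁ with an unpenalized intercept, solved by cyclic coordinate descent. Coordinate descent is one common solver, not necessarily the one the authors used. Each row of X holds the counts of chemical groups in one molecule, and y holds the logarithmic concentrations. The function name `lasso_fit`, the group names, the toy counts, and the generating coefficients are hypothetical. They were invented only to exercise the code and are not the study's data or results.

```python
from typing import List, Tuple


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X: List[List[float]], y: List[float], alpha: float,
              n_iter: int = 1000, tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Hypothetical helper: fit y ~ b0 + X w by minimizing
    (1/2n)||y - b0 - Xw||^2 + alpha*||w||_1 with cyclic coordinate descent.
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center features and target so the intercept drops out of the updates.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    residual = yc[:]  # residual = yc - Xc w, and w starts at zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column, e.g. a group absent from every molecule
            # Correlation of feature j with the partial residual that excludes w_j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                w[j] = new_wj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    # Recover the intercept on the original (uncentered) scale.
    b0 = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return b0, w


# Hypothetical toy data: counts of [amino, hydroxyl, phosphate] groups per molecule.
X_toy = [[1, 0, 0], [0, 2, 0], [0, 0, 1], [1, 1, 1],
         [2, 0, 1], [0, 3, 2], [1, 2, 0], [0, 1, 3]]
# Hypothetical noise-free log concentrations built from made-up coefficients.
y_toy = [-3.0 + 0.5 * a + 0.3 * h - 0.8 * p for a, h, p in X_toy]

b0, w = lasso_fit(X_toy, y_toy, alpha=0.05)
print(b0, w)
```

The sign of each fitted coefficient indicates whether a group is associated with higher or lower concentration. Increasing `alpha` drives more coefficients to exactly zero, which is how the lasso singles out the groups with a clear effect. In practice `alpha` would be chosen by cross-validation rather than fixed by hand.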