Physiological concentrations of metabolites can partly be explained by their molecular structure. We hypothesize that substances containing certain chemical groups show increased or decreased concentrations in cells. As chemical groups, we consider local atomic configurations, each describing an atom, its bonds, and its direct neighbor atoms. To test our hypothesis, we fitted a linear statistical model that relates experimentally determined logarithmic concentrations to feature vectors containing the counts of these chemical groups. To identify the chemical groups that have a clear effect on the concentration, we used regularized (lasso) regression. In a dataset of 41 substances of central metabolism in different organisms, we found that physiological concentrations are increased by the occurrence of amino and hydroxyl groups, whereas aldehyde, ketone, and phosphate groups are associated with decreased concentrations. The model explains about 22% of the variance of the logarithmic mean concentrations.
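The approach described in the abstract, a lasso-regularized linear model relating logarithmic concentrations to chemical-group counts, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three group names, the toy count matrix, the synthetic log-concentrations, the penalty value `lam`, and the plain coordinate-descent solver. None of it reproduces the paper's data, features, or fitting procedure.

```python
# Minimal lasso sketch in pure Python (coordinate descent).
# All data below are made up for illustration; they are NOT from the paper.

def soft_threshold(x, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def predict(row, b0, w):
    """Linear prediction: intercept plus weighted group counts."""
    return b0 + sum(xj * wj for xj, wj in zip(row, w))

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n) * sum (y_i - b0 - x_i.w)^2 + lam * sum |w_j|."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = sum(y) / n  # unpenalized intercept
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                partial = predict(X[i], b0, w) - X[i][j] * w[j]
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
        # Re-fit the intercept as the mean residual without it.
        b0 = sum(y[i] - (predict(X[i], b0, w) - b0) for i in range(n)) / n
    return b0, w

def r_squared(X, y, b0, w):
    """Fraction of variance of y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(xi, b0, w)) ** 2 for xi, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical feature columns: counts of (amino, hydroxyl, phosphate) groups.
groups = ["amino", "hydroxyl", "phosphate"]
X = [[1, 0, 0], [0, 2, 0], [0, 0, 1], [1, 1, 0],
     [0, 1, 2], [2, 0, 1], [0, 3, 0], [1, 0, 2]]
# Synthetic log-concentrations generated as 1.0 + 0.5*amino - 0.8*phosphate.
y = [1.0 + 0.5 * a - 0.8 * ph for a, _, ph in X]

b0, w = lasso_cd(X, y, lam=0.01)
for name, wj in zip(groups, w):
    print(f"{name}: {wj:+.3f}")
print("R^2:", round(r_squared(X, y, b0, w), 3))
```

In this sketch, a positive weight means that each additional copy of the group raises the predicted logarithmic concentration, and a negative weight lowers it. Increasing `lam` drives more weights to exactly zero, which is how the lasso singles out groups with a clear effect.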