Comment on "Fast and accurate modeling of molecular atomization energies with machine learning".

In a recent Letter [1], the authors construct a machine learning (ML) model of molecular atomization energies, which they compare to bond counting (BC) and the PM6 semiempirical method [2]. However, their ML model was trained and tested on density functional theory (DFT) energies, while BC and PM6 are fit to standard enthalpies. For a fair comparison, bond energies are refit to DFT data and PM6 is converted to an electronic energy using per-atom corrections [3]. BC and PM6 both perform better than the ML model and are free of large outliers in their error distributions, as shown in Fig. 1.

As noted in [25] of [1], some ML model error may originate from the choice of coordinate system. The n eigenvalues of the Coulomb matrix correspond to an equienergy 2n-dimensional space of n-atom molecules rather than to one molecule. For n = 3, this corresponds to the 3 translations and 3 rotations that naturally preserve the energy of an isolated molecule. For n > 3, the space includes unphysical molecular deformations that destroy structural rigidity. Figure 2 shows this with a distortion of acetylene (C2H2) that preserves its ML coordinate, (53.058, 21.149, 0.290, 0.219), and therefore its ML energy.

It is suggested in [25] of [1] that the n^2 sorted entries of a Coulomb matrix might be utilized instead of its n eigenvalues as an ML coordinate system. This eliminates the dimensional deficiency, but produces identical coordinates for homometric molecules [5] that do not necessarily have equal energies. A computationally expensive alternative is the equivalence class of permuted Coulomb matrices with distance metric