Radio interferometers do not measure the sky brightness distribution directly, but rather a modified Fourier transform of it. Every imaging algorithm therefore needs a computational representation of the linear measurement operator and its adjoint, regardless of which algorithm is chosen. In this paper, we present a C++ implementation of the radio interferometric measurement operator for wide-field measurements, based on "improved $w$-stacking". The implementation provides high accuracy (down to $\approx 10^{-12}$); is based on a new gridding kernel that allows a smaller kernel support for a given accuracy; dynamically chooses the kernel, kernel support and oversampling factor for maximum performance; uses a piece-wise polynomial approximation for cheap evaluation of the gridding kernel; processes the visibilities in cache-friendly order; uses explicit vectorisation where available; and comes with a parallelisation scheme that also scales well in the adjoint direction, which is a problem for many previous implementations. It has a small memory footprint in the sense that temporary internal data structures are much smaller than the respective input and output data, which allows in-memory processing of data sets that previously had to be read from disk or distributed across several compute nodes.
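To illustrate the idea of replacing an expensive gridding kernel by a piece-wise polynomial, the following minimal C++ sketch splits the kernel support $[-1, 1]$ into equal segments and fits one interpolating polynomial per segment. Everything in it is an assumption made for illustration and not taken from the implementation described here: the kernel shape (an "exponential of semicircle" form with a hypothetical parameter `beta`), the names `kernel_exact` and `PiecewiseKernel`, and the choice of Chebyshev nodes with a Newton-form polynomial evaluated by a Horner-like recurrence.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative kernel on [-1, 1] (an assumption, not the paper's kernel):
// exp(beta * (sqrt(1 - x^2) - 1)), zero outside the support.
inline double kernel_exact(double x, double beta) {
  const double s = 1.0 - x * x;
  if (s < 0.0) return 0.0;
  return std::exp(beta * (std::sqrt(s) - 1.0));
}

// Hypothetical helper: piece-wise polynomial approximation of kernel_exact.
// Each of the nseg segments stores the Newton (divided-difference)
// coefficients of the polynomial interpolating the kernel at Chebyshev nodes.
class PiecewiseKernel {
 public:
  PiecewiseKernel(std::size_t nseg, std::size_t degree, double beta)
      : nseg_(nseg), npts_(degree + 1), nodes_(npts_), coef_(nseg * npts_) {
    assert(nseg > 0 && degree > 0);
    const double pi = std::acos(-1.0);
    // Chebyshev nodes in the local coordinate t in (-1, 1), shared by all segments.
    for (std::size_t j = 0; j < npts_; ++j)
      nodes_[j] = std::cos(pi * (2.0 * j + 1.0) / (2.0 * npts_));
    const double h = 2.0 / nseg_;  // segment width
    for (std::size_t s = 0; s < nseg_; ++s) {
      double* c = &coef_[s * npts_];
      const double mid = -1.0 + (s + 0.5) * h;
      // Sample the exact kernel at the nodes mapped into this segment.
      for (std::size_t j = 0; j < npts_; ++j)
        c[j] = kernel_exact(mid + 0.5 * h * nodes_[j], beta);
      // Convert the samples to divided differences in place.
      for (std::size_t k = 1; k < npts_; ++k)
        for (std::size_t j = npts_ - 1; j >= k; --j)
          c[j] = (c[j] - c[j - 1]) / (nodes_[j] - nodes_[j - k]);
    }
  }

  // Evaluate the approximation: a few multiply-adds, no exp or sqrt.
  double operator()(double x) const {
    if (x < -1.0 || x > 1.0) return 0.0;
    const double h = 2.0 / nseg_;
    std::size_t s = static_cast<std::size_t>((x + 1.0) / h);
    if (s >= nseg_) s = nseg_ - 1;  // x == 1 belongs to the last segment
    const double t = (x - (-1.0 + (s + 0.5) * h)) / (0.5 * h);
    const double* c = &coef_[s * npts_];
    double r = c[npts_ - 1];
    for (std::size_t j = npts_ - 1; j-- > 0;)
      r = r * (t - nodes_[j]) + c[j];
    return r;
  }

 private:
  std::size_t nseg_, npts_;
  std::vector<double> nodes_;  // Chebyshev nodes in local coordinates
  std::vector<double> coef_;   // npts_ Newton coefficients per segment
};
```

Constructing, for example, `PiecewiseKernel k(8, 10, 18.4)` and calling `k(x)` inside the gridding loop then costs only a handful of fused multiply-adds per kernel value. The inner recurrence has a fixed trip count per segment, which is the property that makes this kind of scheme amenable to vectorisation.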