Abstract The solar wind represents the background into which other space weather phenomena are embedded. Interplanetary coronal mass ejections (ICMEs) travel through and interact with this background solar wind. The nature of this interaction depends on the properties of both plasma structures: the solar wind and the typically faster ICME. It is well known that the solar wind continuously emitted from the Sun comes in at least two varieties. These are historically called slow and fast solar wind, although their most distinctive property is not the solar wind speed but their elemental and charge-state composition. So far, solar wind categorization has been approached with different combinations of hand-crafted classifiers based on expert knowledge and heuristic methods. As a result, the number of solar wind types differs between typical approaches and, in particular, the decision boundaries between solar wind regimes differ considerably. In this situation, a purely data-driven approach that divides solar wind plasma into categories based on the similarity of observations can be insightful. In machine learning, such an approach is realized by unsupervised clustering methods such as k-means clustering. Given a sufficiently large data set, k-means clustering can help to validate and improve the available solar wind categorization schemes. Here, we apply k-means clustering to solar wind data measured by instruments on the Advanced Composition Explorer. We compare the resulting solar wind types to existing heuristic categorization schemes, determine the probable number of distinct solar wind types supported by the observations, discuss the physical interpretation of the resulting types, and finally use the k-means approach to investigate feature selection for solar wind categorization.
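As an illustration of the clustering step only, the following minimal sketch implements plain k-means (Lloyd's algorithm) with the Python standard library and applies it to synthetic two-dimensional points. The data, the two standardized feature axes, the names `kmeans`, `inertia`, `slow_like`, and `fast_like`, and the choice of k = 2 are assumptions made for this example; they are not the data set, feature set, or implementation used in this work.

```python
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length tuples.

    Returns (centroids, labels). Initial centroids are k distinct
    data points drawn with a seeded random generator.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids, labels


def inertia(points, centroids, labels):
    """Sum of squared distances of all points to their assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


# Synthetic, already standardized features (hypothetical): two well-separated
# groups standing in for two solar wind types. These are NOT measured data.
rng = random.Random(1)
slow_like = [(rng.gauss(-1.0, 0.2), rng.gauss(1.0, 0.2)) for _ in range(100)]
fast_like = [(rng.gauss(1.0, 0.2), rng.gauss(-1.0, 0.2)) for _ in range(100)]
points = slow_like + fast_like

centroids, labels = kmeans(points, k=2)

# Within-cluster scatter for several k, one common input to heuristics
# for choosing the number of clusters.
scatter = {k: inertia(points, *kmeans(points, k)) for k in range(1, 5)}
```

On real observations each feature would typically be standardized before clustering, because k-means relies on Euclidean distances and is therefore sensitive to the relative scaling of the input parameters.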