Taste Space Versus the World: an Embedding Analysis of Listening Habits and Geography

Probabilistic embedding methods provide a principled way of deriving new spatial representations of discrete objects from human interaction data. The resulting assignment of objects to positions in a continuous, low-dimensional space yields not only a compact and accurate predictive model, but also a flexible representation for understanding the data. In this paper, we demonstrate how probabilistic embedding methods reveal the “taste space” in the recently released Million Musical Tweets Dataset (MMTD), and how this space transcends geographic space. In particular, by embedding cities around the world along with their preferred artists, we distill information about cultural and geographical differences in listening patterns into spatial representations. These representations yield a similarity metric among city pairs, artist pairs, and city-artist pairs, which can then be used to draw conclusions about the similarities and contrasts between taste space and geographic location.
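
As a rough illustration of how a single shared embedding can induce a similarity measure over city pairs, artist pairs, and mixed pairs, the following minimal sketch assumes Euclidean distance in a two-dimensional space. The placeholder names, the coordinates, and the choice of distance are all assumptions made for illustration; they are not the model or the data used in this paper.

```python
import math

# Hypothetical 2-D positions in a shared embedding space.
# These coordinates are made up for illustration and are NOT taken from the MMTD.
embedding = {
    ("city", "A"): (0.0, 0.0),
    ("city", "B"): (3.0, 4.0),
    ("artist", "X"): (0.0, 1.0),
}

def distance(a, b):
    """Euclidean distance between two embedded objects (smaller means more similar)."""
    return math.dist(embedding[a], embedding[b])

# Because cities and artists live in the same space, one function covers
# city-city, artist-artist, and city-artist comparisons.
city_city = distance(("city", "A"), ("city", "B"))
city_artist = distance(("city", "A"), ("artist", "X"))
```

Under these assumed coordinates, city A lies closer to artist X than to city B, which is the kind of comparison a shared space makes possible.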