Identifying Shifts in Collective Attention to Topics on Social Media

The complex, ever-shifting landscape of social media can obscure important changes in conversations involving smaller groups. Algorithms attuned to global topic popularity often miss these subtle shifts in attention to topics. We present a novel unsupervised method for identifying shifts in high-dimensional textual data. The method treats randomly selected timestamps as trial change points in the discourse: for each one, it automatically labels the data as falling before or after that point and trains a classifier to predict these labels. Next, it fits a mathematical model of classification accuracy across all trial change points to infer the true change point, as well as the fraction of data affected (a proxy for detection confidence). Finally, it splits the data at the detected change and repeats recursively until a stopping criterion is reached. The method outperforms state-of-the-art change detection algorithms in accuracy, and often has lower time and space complexity. It identifies meaningful changes in real-world settings, including Twitter conversations about the COVID-19 pandemic and stories posted on Reddit. Its flexibility, accuracy, and robustness in identifying changes in high-dimensional data open new avenues for data-driven discovery.
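
The trial-change-point step can be illustrated with a minimal sketch in plain Python. It is not the implementation evaluated here, and several choices are assumptions made only for illustration: the function names (`trial_accuracy`, `detect_change`), the nearest-centroid classifier, the balanced held-out accuracy score, and the use of the best-scoring trial point as the estimate in place of fitting a model of accuracy across all trial points.

```python
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trial_accuracy(times, feats, t0, rng):
    """Balanced held-out accuracy of a nearest-centroid classifier trained to
    predict whether an item falls before (0) or after (1) the trial point t0.
    The classifier and the score are illustrative assumptions."""
    labels = [int(t >= t0) for t in times]
    idx = list(range(len(times)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]

    groups = {0: [], 1: []}
    for i in train:
        groups[labels[i]].append(feats[i])
    if not groups[0] or not groups[1]:
        return 0.5  # only one class in the training half: chance level

    cents = {c: centroid(rows) for c, rows in groups.items()}
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for i in test:
        d0, d1 = sq_dist(feats[i], cents[0]), sq_dist(feats[i], cents[1])
        pred = 0 if d0 <= d1 else 1
        totals[labels[i]] += 1
        hits[labels[i]] += int(pred == labels[i])

    # Average per-class recall, so a trial point near either end of the
    # timeline is not rewarded for always predicting the majority label.
    recalls = [hits[c] / totals[c] for c in (0, 1) if totals[c]]
    return sum(recalls) / len(recalls) if recalls else 0.5


def detect_change(times, feats, n_trials=40, seed=0):
    """Score random trial change points and return (best_trial, best_score).
    Taking the best-scoring trial is a simplification that stands in for
    fitting a model of accuracy to all trial points."""
    rng = random.Random(seed)
    lo, hi = min(times), max(times)
    trials = sorted(rng.uniform(lo, hi) for _ in range(n_trials))
    scores = [trial_accuracy(times, feats, t0, rng) for t0 in trials]
    best = max(range(len(trials)), key=lambda i: scores[i])
    return trials[best], scores[best]


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo): 5-dimensional Gaussian
    # features whose mean shifts at time 0.6 on a unit-length timeline.
    data_rng = random.Random(1)
    n, true_cp = 600, 0.6
    times = [i / n for i in range(n)]
    feats = [[data_rng.gauss(3.0 if t >= true_cp else 0.0, 1.0) for _ in range(5)]
             for t in times]
    t_hat, acc = detect_change(times, feats)
    print(f"estimated change point: {t_hat:.3f}, balanced accuracy: {acc:.3f}")
```

In this sketch the score should be highest for trial points near the real shift, because only there do the before and after labels match a real difference in the data. A recursive version would call `detect_change` again on the items before and after the estimate until a stopping criterion, such as a minimum segment size or accuracy threshold, is met.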