Topic analysis in news via sparse learning: a case study on the 2016 US presidential elections

Abstract. Textual data such as tweets and news is abundant on the web, but extracting useful information from such a deluge of data is hardly feasible for a human reader. In this paper, we discuss automated text analysis methods based on sparse optimization. In particular, we use sparse PCA and Elastic Net regression to extract intelligible topics from a large text corpus and to obtain time-based signals that quantify the strength of each topic over time. These signals can then be used as regressors for modeling or predicting other related numerical indices. We applied this setup to the topics that arose during the 2016 US presidential elections, and we used the topic strength signals to model their influence on the election polls.
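
As a rough illustration of the pipeline summarized above, the following minimal sketch extracts one sparse topic from a toy corpus and derives a daily topic-strength signal. It is not the authors' implementation: the corpus `DOCS` is invented for illustration, all function names are hypothetical, and the sparse component is computed with a truncated power iteration, a simple stand-in for whichever sparse PCA solver the paper uses. The Elastic Net regression step is omitted.

```python
import math
from collections import Counter

# Toy corpus invented for illustration: (day index, headline text).
DOCS = [
    (0, "email server investigation email"),
    (0, "tax plan debate"),
    (1, "email server investigation fbi email"),
    (1, "email investigation fbi"),
    (2, "tax plan economy jobs"),
    (2, "email server fbi investigation"),
]

def term_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts term j in document i."""
    vocab = sorted({w for _, text in docs for w in text.split()})
    index = {w: j for j, w in enumerate(vocab)}
    rows = []
    for _, text in docs:
        row = [0.0] * len(vocab)
        for w, c in Counter(text.split()).items():
            row[index[w]] = float(c)
        rows.append(row)
    return vocab, rows

def center_columns(rows):
    """Subtract each term's mean count so that X^T X is a scaled covariance."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def sparse_component(rows, k, iters=100):
    """Truncated power iteration: a unit direction with at most k nonzero loadings."""
    p = len(rows[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        # w = X^T (X v): one multiplication by the unnormalised covariance matrix.
        scores = [sum(x * vj for x, vj in zip(row, v)) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(p)]
        # Keep only the k largest-magnitude loadings, then renormalise.
        keep = set(sorted(range(p), key=lambda j: -abs(w[j]))[:k])
        w = [wj if j in keep else 0.0 for j, wj in enumerate(w)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:
            break
        v = [wj / norm for wj in w]
    return v

def daily_strength(docs, rows, v):
    """Average projection of each day's documents onto the sparse component."""
    totals, counts = Counter(), Counter()
    for (day, _), row in zip(docs, rows):
        totals[day] += sum(x * vj for x, vj in zip(row, v))
        counts[day] += 1
    return [totals[d] / counts[d] for d in sorted(totals)]

vocab, counts = term_matrix(DOCS)
centered = center_columns(counts)
loadings = sparse_component(centered, k=3)
topic_words = [w for w, l in zip(vocab, loadings) if l != 0.0]
signal = daily_strength(DOCS, centered, loadings)
print("topic words:", topic_words)
print("daily strength:", signal)
```

The words with nonzero loadings form the human-readable topic, and the per-day signal is the kind of series that could then serve as a regressor against an external index such as poll numbers.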