Mining Reddit as a New Source for Software Requirements

Mining app stores and social media has proven to be a valuable way of collecting user feedback to support requirements engineering and software evolution. Recent literature on mining software-related data from social platforms, such as Twitter and Facebook, shows that such data complements app store mining. However, users also discuss and provide feedback on software applications on many other platforms that have not yet been thoroughly researched and analysed. One such platform is Reddit. In this paper, we introduce Reddit as a new potential data source and explore whether and how requirements engineering and software evolution can benefit from user feedback obtained from it. We present an exploratory study in which we analysed the usage characteristics (i.e., frequency of posts, number of comments, and number of users for each subreddit) of Reddit posts about software applications. Furthermore, we examined the content of the posts, and the results reveal that almost 54% of posts contain useful information. Finally, we investigated the potential of automatic classification by applying machine learning algorithms to unstructured and noisy Reddit data to classify posts into three categories: bug reports, feature-related, and irrelevant. We found that the Support Vector Machine algorithm, with an F1-score of 84%, can be effective in categorizing Reddit posts. Our results show that Reddit posts provide useful feedback on software applications that can support requirements engineering and software evolution.
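For readers unfamiliar with the metric, the following is a minimal sketch of how an F1-score can be computed for a three-way classification such as the one described above. It is an illustration only, not the study's evaluation code: the label names, the toy gold and predicted labels, and the choice of macro averaging are all assumptions, since the abstract does not state which averaging the reported 84% uses.

```python
from collections import Counter

# Hypothetical label names mirroring the three categories named in the abstract.
LABELS = ("bug_report", "feature_related", "irrelevant")


def per_class_f1(gold, predicted, labels=LABELS):
    """Return {label: F1} for parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted, but the post belongs to another class
            fn[g] += 1  # the post belongs to g, but g was not predicted
    scores = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = per_class_f1(gold, predicted, labels)
    return sum(scores.values()) / len(labels)


# Made-up labels purely for illustration; these are not data from the study.
gold = ["bug_report", "bug_report", "feature_related",
        "irrelevant", "irrelevant", "feature_related"]
pred = ["bug_report", "irrelevant", "feature_related",
        "irrelevant", "irrelevant", "bug_report"]

scores = per_class_f1(gold, pred)
overall = macro_f1(gold, pred)
```

Macro averaging weights the three categories equally regardless of how many posts fall into each; a weighted or micro average would give a different number when the classes are imbalanced.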