Answer Quality Prediction in Q/A Social Networks by Leveraging Temporal Features

Community Question Answering (CQA) services (also known as Q/A social networks) have become widespread in the last several years. They are seen as a potential alternative to search, because using a Q/A service avoids sifting through the large number of ranked results returned by a typical search engine to get at the desired information. Currently, \emph{best} answers in CQA services are determined either manually or through a voting process. Many CQA services calculate activity levels for users to approximate the notion of expertise. As large numbers of CQA services become available, it is important and challenging to predict \emph{best} answers (not necessarily answers by an expert) using machine learning techniques. Previous approaches typically extract a set of features (primarily textual and non-textual) from the data set and use them in a classification system to determine the \emph{best} answer. This paper posits that temporal features, different from the ones proposed and used in the literature, are better suited to Q/A data sets and can be quite effective for predicting the quality of answers. The suitability of temporal features is based on the observation that these services are dynamic in nature, both in the number of users participating in a given period and in how many questions they choose to answer over an interval. We propose and analyze a small set of temporal features, and demonstrate that a few of these features work better than the large number of features used in the literature when using the same traditional classification techniques. We also argue that classification approaches evaluated by precision and recall are not well suited to this task, because CQA data is unbalanced and the quality ranking of \emph{all} answers needs to be predicted. We propose the use of learning-to-rank approaches, and show that the features identified in this paper work very well with these approaches too.
We use multiple, diverse data sets to establish the utility and effectiveness of the identified features for predicting the quality of answers. This approach allows us to qualitatively predict the best answer as well as rank \emph{all} answers. The long-term goal is to build a framework for identifying experts in CQA services at different levels of granularity, such as global and concept-specific.