Feature Selection for Emotion Classification

In this paper, we describe a novel supervised approach for extracting a set of features for document representation in the context of Emotion Classification (EC). Our approach uses the coefficients of a logistic regression model to select the most discriminative word unigrams and bigrams for EC. We represent the documents with this set of features and perform the classification with a Support Vector Machine. The proposed method is evaluated on two publicly available and widely used collections. We also evaluate the robustness of the extracted features across domains, using the first collection for feature extraction and the second for EC. We compare the obtained results with similar supervised approaches for document classification (i.e., FastText) and EC (i.e., #Emotional Tweets, SNBC and UMM), and with a Word2Vec-based pipeline.