A Text Classification Framework with a Local Feature Ranking for Learning Social Networks

In this paper, a text classifier framework with a feature ranking scheme is proposed to extract social structures from text data. It is assumed that only a small subset of relations between the individuals in a community is known. With this assumption, the social network extraction is translated into a classification problem. The relations between two individuals are represented by merging their document vectors and the given relations are used as labels of training data. By this transformation, a text classifier such as Rocchio is used for learning the unknown relations. We show that there is a link between the intrinsic sparsity of social networks and class imbalance. Furthermore, we show that feature ranking methods usually fail in problem with unbalanced data. In order to deal with this deficiency and re-balance the unbalanced social data, a local feature ranking method, which is called reverse discrimination, is proposed.