Using machine learning to augment collaborative filtering of community discussions

Collaborative filtering systems have been developed to manage information overload in online communities. In these systems, users rate content provided by other users on its validity or usefulness within their particular context. Slashdot is an example of such a community: peers rate each other's comments based on their relevance to the post. This work extracts a wide variety of features from the Slashdot metadata and from the linguistic content of posts to identify features that can predict the community rating. We find that author reputation, pronoun use, and author sentiment are salient. We achieve 76% accuracy at predicting the community's rating of a post as good, neutral, or bad.
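The abstract does not define its features or name its classifier. The sketch below is a rough illustration only. It computes three features of the kinds the abstract names for a single post: author reputation, pronoun rate, and a lexicon-based sentiment score. It then assigns one of the three labels with a standardized nearest-centroid classifier. All of the following are hypothetical and are not the paper's method: the word lists, the `Post` fields, the `author_reputation` value, and the choice of nearest-centroid. The toy posts are invented.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical word lists, for illustration only (not from the paper).
PRONOUNS = {"i", "me", "my", "mine", "we", "us", "our", "ours", "you", "your",
            "yours", "he", "him", "his", "she", "her", "hers", "it", "its",
            "they", "them", "their", "theirs"}
POSITIVE = {"good", "great", "useful", "thanks", "agree", "helpful"}
NEGATIVE = {"bad", "wrong", "stupid", "useless", "hate", "troll"}


@dataclass
class Post:
    text: str
    author_reputation: float  # hypothetical stand-in for author reputation


def extract_features(post: Post) -> list[float]:
    """Return [reputation, pronoun rate, sentiment] for one post."""
    tokens = re.findall(r"[a-z']+", post.text.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty posts
    pronoun_rate = sum(t in PRONOUNS for t in tokens) / n
    sentiment = (sum(t in POSITIVE for t in tokens)
                 - sum(t in NEGATIVE for t in tokens)) / n
    return [post.author_reputation, pronoun_rate, sentiment]


class NearestCentroid:
    """Z-score each feature, then predict the label with the closest class mean."""

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestCentroid":
        cols = list(zip(*X))
        self.mean = [sum(c) / len(c) for c in cols]
        # A zero standard deviation is replaced by 1.0 so scaling never divides by zero.
        self.std = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
                    for c, m in zip(cols, self.mean)]
        groups: dict[str, list[list[float]]] = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(self._scale(row))
        self.centroids = {label: [sum(c) / len(c) for c in zip(*rows)]
                          for label, rows in groups.items()}
        return self

    def _scale(self, row: list[float]) -> list[float]:
        return [(v - m) / s for v, m, s in zip(row, self.mean, self.std)]

    def predict(self, row: list[float]) -> str:
        z = self._scale(row)
        return min(self.centroids, key=lambda lab: math.dist(z, self.centroids[lab]))


# Invented toy examples, one per label.
train = [
    (Post("Thanks, this is a great and useful explanation.", 50.0), "good"),
    (Post("The patch was merged in the last release.", 10.0), "neutral"),
    (Post("I think you are wrong and this is stupid.", -5.0), "bad"),
]
model = NearestCentroid().fit([extract_features(p) for p, _ in train],
                              [label for _, label in train])
print(model.predict(extract_features(Post("Great, thanks, really useful!", 40.0))))  # -> good
```

Standardizing before measuring distance keeps the unbounded reputation feature from swamping the two rate features, which lie between -1 and 1. A real system would train on many labeled posts and would likely use a stronger classifier. The abstract does not say which one the authors used.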