Properties, Prediction, and Prevalence of Useful User-Generated Comments for Descriptive Annotation of Social Media Objects

User-generated comments in online social media are attracting growing attention as a source of general-purpose descriptive annotations for digital objects such as photos and videos. Because users differ in expertise, however, the quality of their comments ranges from very useful to entirely useless. Our aim is to provide automated support for curating useful user-generated comments from public collections of digital objects. After constructing a crowd-sourced gold standard of USEFUL and NOT USEFUL comments, we use standard machine learning methods to develop a “usefulness” classifier, exploring the impact of surface-level, syntactic, semantic, and topic-based features in addition to extra-linguistic attributes of the author and the author's social media activity. We then adapt an existing model of prevalence detection that uses the learned classifier to investigate patterns in the commenting culture of two popular social media platforms. We find that the prevalence of USEFUL comments is platform-specific and is further influenced by the entity type of the media object being commented on (person, place, event), its time period (e.g., year of an event), and the degree of polarization among commenters.
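
The abstract does not spell out the prevalence model, so the following is only a rough illustration of why a classifier's raw output needs correcting before it is read as a prevalence. It is a minimal sketch of the standard Rogan-Gladen style correction, not the adapted model used in the paper, and it assumes the classifier's sensitivity and specificity on the gold standard are known. The function name and all numbers are hypothetical.

```python
def corrected_prevalence(predictions, sensitivity, specificity):
    """Estimate the true share of USEFUL comments from noisy classifier labels.

    predictions: iterable of 0/1 labels (1 = classified as USEFUL).
    sensitivity: assumed P(classified USEFUL | truly USEFUL).
    specificity: assumed P(classified NOT USEFUL | truly NOT USEFUL).
    """
    predictions = list(predictions)
    if not predictions:
        raise ValueError("need at least one prediction")

    # Share of comments the classifier labels USEFUL (the naive estimate).
    observed = sum(predictions) / len(predictions)

    # The correction is undefined when the classifier is no better than chance.
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("classifier must be better than chance")

    # Invert: observed = p * sensitivity + (1 - p) * (1 - specificity).
    estimate = (observed + specificity - 1.0) / denominator

    # Sampling noise can push the estimate outside [0, 1], so clip it.
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical platform sample: 40 of 100 comments classified USEFUL.
    labels = [1] * 40 + [0] * 60
    print(corrected_prevalence(labels, sensitivity=0.8, specificity=0.9))
```

Comparing such corrected estimates across platforms, entity types, or time periods is one simple way to examine the kind of prevalence patterns the abstract reports, under the assumption that classifier error rates are stable across those groups.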
