Analyzing and Mining Comments and Comment Ratings on the Social Web

An analysis of the social video sharing platform YouTube and the news aggregator Yahoo! News reveals the presence of vast amounts of community feedback through comments for published videos and news stories, as well as through metaratings for these comments. This article presents an in-depth study of commenting and comment rating behavior on a sample of more than 10 million user comments on YouTube and Yahoo! News. In this study, comment ratings are considered first-class citizens. Their dependencies on textual content, the thread structure of comments, and associated content (e.g., videos and their metadata) are analyzed to obtain a comprehensive understanding of community commenting behavior. Furthermore, this article explores the applicability of machine learning and data mining to detect acceptance of comments by the community, comments likely to trigger discussions, controversial and polarizing content, and users exhibiting offensive commenting behavior. Results from this study have potential applications in guiding the design of community-oriented online discussion platforms.
