Analyzing User Comments on YouTube Coding Tutorial Videos

Video coding tutorials enable expert and novice programmers to visually observe real developers write, debug, and execute code. Previous research in this domain has focused on helping programmers find relevant content in coding tutorial videos, as well as on understanding the motivation and needs of content creators. In this paper, we focus on the link between programmers who create coding videos and their audience. More specifically, we analyze user comments on YouTube coding tutorial videos. Our main objective is to help content creators effectively understand the needs and concerns of their viewers, and thus respond faster to these concerns and deliver higher-quality content. We conduct our analysis on a dataset of 6000 comments sampled from 12 YouTube coding videos. Important user questions and concerns are then automatically classified and summarized. The results show that Support Vector Machines can detect useful viewer comments on coding videos with an average accuracy of 77%. The results also show that SumBasic, an extractive frequency-based summarization technique with redundancy control, can sufficiently capture the main concerns present in viewers' comments.
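
The summarization step can be illustrated with a minimal SumBasic sketch in Python. It assumes that the useful comments have already been selected by the classifier and that each comment is treated as one "sentence". The stopword list, the tokenizer, the deterministic tie-breaking, and the names `STOPWORDS`, `tokenize`, and `sumbasic` are hypothetical choices made for illustration, not details taken from the paper. Each round finds the most probable word still present, selects the unselected comment containing it with the highest average word probability, and then squares the probabilities of that comment's words. The squaring is the redundancy control: it lowers the weight of concerns that are already covered.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; the paper's preprocessing may differ.
STOPWORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "in", "on",
             "this", "i", "you", "for", "how", "do"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def sumbasic(comments, k):
    """Return up to k comments chosen with SumBasic, in selection order."""
    tokenized = [tokenize(c) for c in comments]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    if total == 0:
        return []
    # Initial unigram probability of each word across all comments.
    prob = {w: n / total for w, n in counts.items()}

    def score(i):
        # Average current probability of the words in comment i.
        toks = tokenized[i]
        return sum(prob[w] for w in toks) / len(toks)

    remaining = [i for i, toks in enumerate(tokenized) if toks]
    selected = []
    while remaining and len(selected) < k:
        # Most probable word still present in an unselected comment
        # (ties broken by the word itself so the result is deterministic).
        top_word = max({w for i in remaining for w in tokenized[i]},
                       key=lambda w: (prob[w], w))
        # Best-scoring unselected comment containing that word
        # (ties go to the earlier comment).
        best = max((i for i in remaining if top_word in tokenized[i]),
                   key=lambda i: (score(i), -i))
        selected.append(best)
        remaining.remove(best)
        # Redundancy control: square the probabilities of the chosen comment's words.
        for w in set(tokenized[best]):
            prob[w] = prob[w] ** 2
    return [comments[i] for i in selected]


if __name__ == "__main__":
    demo = [
        "How do I fix the null pointer error in the loop?",
        "Null pointer error again, same loop.",
        "Please make a follow-up video on recursion.",
        "Can you explain recursion with a video?",
    ]
    for line in sumbasic(demo, 2):
        print(line)
```

With `k = 2`, the demo prints "Can you explain recursion with a video?" followed by "How do I fix the null pointer error in the loop?". After the first recursion request is chosen, squaring lowers the weight of "video" and "recursion", so the second pick comes from the other concern instead of repeating the first one.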
