Social media research: The application of supervised machine learning in organizational communication research

Despite the online availability of data, analyzing this information in academic research is arduous. This article explores the application of supervised machine learning (SML) to overcome the challenges associated with online data analysis. In SML, classifiers are used to categorize and code binary data. Based on a case study of Dutch employees' work-related tweets, this paper compares the coding performance of three classifiers: Linear Support Vector Machine, Naive Bayes, and logistic regression. The performance of these classifiers is assessed by examining accuracy, precision, recall, the area under the precision-recall curve, and Krippendorff's alpha. These indices are obtained by comparing the coding decisions of the classifier to manual coding decisions. The findings indicate that the Linear Support Vector Machine and Naive Bayes classifiers outperform the logistic regression classifier. This study also compared the performance of these classifiers when trained on stratified random samples versus random samples of training data. The findings indicate that with smaller training sets, stratified random training samples perform better than random training samples, whereas with large training sets (n = 4000), random samples yield better results. Finally, the Linear Support Vector Machine classifier was trained with 4000 tweets and subsequently used to categorize 578,581 tweets obtained from 430 employees.

Supervised machine learning (SML) is suitable for coding social media content. Linear Support Vector Machine and Naive Bayes classifiers can be trained using 4000 training tweets. SML enables researchers to expand the scope of their research without compromising data size or depth. Linear Support Vector Machine and Naive Bayes outperform the logistic regression classifier. Classifiers perform better with stratified random samples than with random samples when training samples are small.
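Several of the indices named in the abstract (accuracy, precision, recall, and Krippendorff's alpha) can be computed directly by comparing a classifier's binary codes with manual codes. The following is a minimal sketch in plain Python, not the authors' implementation: the function names and the ten example labels are hypothetical, and the alpha computation assumes nominal data, exactly two coders (the classifier and the manual coder), and no missing codes.

```python
from collections import Counter


def confusion_counts(manual, predicted, positive=1):
    """Count true/false positives and negatives against the manual codes."""
    pairs = list(zip(manual, predicted))
    tp = sum(1 for m, p in pairs if m == positive and p == positive)
    fp = sum(1 for m, p in pairs if m != positive and p == positive)
    fn = sum(1 for m, p in pairs if m == positive and p != positive)
    tn = sum(1 for m, p in pairs if m != positive and p != positive)
    return tp, fp, fn, tn


def accuracy_precision_recall(manual, predicted, positive=1):
    """Return (accuracy, precision, recall) for binary codes."""
    tp, fp, fn, tn = confusion_counts(manual, predicted, positive)
    accuracy = (tp + tn) / len(manual)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing codes.

    alpha = 1 - D_o / D_e, where D_o is the observed share of units on
    which the coders disagree and D_e is the disagreement expected by
    chance, derived from the pooled code frequencies of both coders.
    """
    n_units = len(coder_a)
    disagreements = sum(1 for a, b in zip(coder_a, coder_b) if a != b)
    observed = disagreements / n_units

    pooled = Counter(coder_a) + Counter(coder_b)  # code frequencies over both coders
    total = 2 * n_units
    expected = sum(
        pooled[c] * pooled[k] for c in pooled for k in pooled if c != k
    ) / (total * (total - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - observed / expected


# Hypothetical codes for ten tweets: 1 = work-related, 0 = not work-related.
manual_codes = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
classifier_codes = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]

if __name__ == "__main__":
    print(accuracy_precision_recall(manual_codes, classifier_codes))
    print(krippendorff_alpha_nominal(manual_codes, classifier_codes))
```

Unlike raw accuracy, alpha corrects for agreement expected by chance, which matters when one category (for example, non-work-related tweets) dominates the data.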
