Understanding User Behavior in Social Networks Using Quantified Moral Foundations

Moral inclinations expressed in user-generated content such as online reviews or tweets can provide useful insights into users’ behavior and activities in social networks, for example, to predict users’ rating behavior, perform customer feedback mining, and study users’ tendency to spread abusive content on these platforms. In this work, we address two research questions. First, can the moral attributes of social network data provide additional useful information about users’ behavior, and how can this information be used to enhance our understanding? To answer this question, we used Moral Foundations Theory and Doc2Vec, a Natural Language Processing technique, to compute quantified moral loadings of user-generated textual content in social networks. We used conditional relative frequency and the correlations between the moral foundations as two measures to study the moral breakdown of the social network data, utilizing a dataset of Yelp reviews and a dataset of tweets on abusive user-generated content. Our findings indicated that these moral features are tightly bound to users’ behavior in social networks. Second, can the quantified moral loadings serve as new boosting features to improve the differentiation, classification, and prediction of social network activities? To test this hypothesis, we adopted our new moral features in a multi-class classification approach to distinguish hateful and offensive tweets in a labeled dataset, and compared the results with a baseline approach that uses only conventional text mining features such as tf-idf features and Part of Speech (PoS) tags. Our findings demonstrated that the moral features improved the performance of the baseline approach in terms of precision, recall, and F-measure.
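
As a rough illustration of the pipeline summarized above, the following Python sketch shows one way quantified moral loadings could be computed and used. It rests on several assumptions that are not taken from the paper: a loading is treated as the cosine similarity between a document vector and a per-foundation reference vector, both presumed to have been inferred beforehand by a Doc2Vec model; the three-dimensional vectors are made-up toy values; and the helper names (`moral_loadings`, `augment`, `pearson`) are hypothetical. The paper's actual scoring procedure may differ.

```python
import math

# Toy reference vectors, one per moral foundation (illustrative values only).
# In practice these would come from a trained Doc2Vec model (assumption).
FOUNDATION_VECS = {
    "care":      [0.9, 0.1, 0.0],
    "fairness":  [0.1, 0.9, 0.0],
    "loyalty":   [0.0, 0.2, 0.9],
    "authority": [0.3, 0.0, 0.8],
    "purity":    [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def moral_loadings(doc_vec, foundation_vecs=FOUNDATION_VECS):
    """Map each foundation name to its similarity with the document vector."""
    return {name: cosine(doc_vec, vec) for name, vec in foundation_vecs.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)


def augment(baseline_features, loadings):
    """Append the loadings (in sorted foundation order) to baseline features."""
    return list(baseline_features) + [loadings[name] for name in sorted(loadings)]


if __name__ == "__main__":
    # Toy document vectors standing in for inferred Doc2Vec embeddings.
    docs = [[0.8, 0.2, 0.1], [0.1, 0.7, 0.3], [0.2, 0.1, 0.9]]
    all_loadings = [moral_loadings(d) for d in docs]

    # Correlation between two foundations across the toy documents.
    care = [l["care"] for l in all_loadings]
    fairness = [l["fairness"] for l in all_loadings]
    print("care/fairness correlation:", pearson(care, fairness))

    # Moral loadings appended to a placeholder tf-idf feature vector.
    print(augment([0.12, 0.0, 0.37], all_loadings[0]))
```

The augmented vector is what a downstream classifier would receive in place of the baseline features alone; which classifier and which baseline features to use are left open here.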
