#DontTweetThis: Scoring Private Information in Social Networks

Abstract: With the growing popularity of online social networks (OSNs), a large amount of private or sensitive information has been posted online. In particular, studies show that users sometimes reveal too much information or unintentionally post messages they later regret, especially when they are careless, emotional, or unaware of privacy risks. There is therefore a strong need to identify potentially sensitive online content so that users can be alerted to it. In this paper, we propose PrivScore, a context-aware, text-based quantitative model for private information assessment, which is expected to serve as the foundation of a privacy leakage alerting mechanism. We first solicit diverse opinions on the sensitivity of private information from crowdsourcing workers, and examine the responses to discover a perceptual model behind the consensus and disagreements. We then develop a computational scheme using deep neural networks to compute a context-free PrivScore (i.e., the “consensus” privacy score among average users). Finally, we integrate tweet histories, topic preferences, and social contexts to generate a personalized, context-aware PrivScore. This privacy scoring mechanism could be employed to identify potentially private messages and alert users to think again before posting them to OSNs.
