Text Analysis for Monitoring Personal Information Leakage on Twitter

Social networking services (SNSs) such as Twitter and Facebook can be considered new forms of media. Information spreads much faster through social media than through traditional news media because people can upload information without time or location constraints. For this reason, people have embraced SNSs and made them an integral part of their everyday lives. People express their emotional state to let others know how they feel about certain information or events. However, in doing so they are likely not only to share information with others but also to unintentionally expose personal information such as their place of residence, phone number, and date of birth. If such information reaches users with malicious intentions, there may be serious consequences such as online and offline stalking. To prevent information leakage and detect spam, many researchers have monitored e-mail systems and web blogs. This paper considers text messages on Twitter, one of the most popular SNSs in the world, and uses several coefficient approaches to reveal hidden patterns in them. It focuses on users who exchange Tweets and examines the types of information they reveal when replying to one another's Tweets, by monitoring samples from 50 million Tweets collected by Stanford University in November 2009. We chose an active Twitter user based on a "happy birthday" rule and detected information related to their place of residence and personal names by using the proposed coefficient method, which we compared with other coefficient approaches. We conclude that the proposed coefficient method can detect non-standard words and recommend the corresponding standard English words under a limited set of conditions. Overall, we detected 88,882 (24.287%) more Tweets containing names and 14,054 (3.84%) more location-related Tweets than with standard word matching alone.
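To make the idea of coefficient-based matching concrete, the following is a minimal sketch of how a similarity coefficient might map a non-standard token to a standard word. It assumes the Dice coefficient over character bigrams, which is only one well-known choice; the abstract does not define the proposed coefficient, so this is not the authors' method. The function names, the sample vocabulary, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter


def bigrams(word):
    """Return the character bigrams of a lowercased word."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def dice(a, b):
    """Dice coefficient on bigram multisets: 2 * |A & B| / (|A| + |B|)."""
    ba, bb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Both words are too short to have bigrams; fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((ba & bb).values())
    return 2 * overlap / total


def recommend(token, vocabulary, threshold=0.5):
    """Return the most similar standard word, or None if below threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    best = max(vocabulary, key=lambda w: dice(token, w), default=None)
    if best is None or dice(token, best) < threshold:
        return None
    return best


if __name__ == "__main__":
    # Hypothetical standard-word list and non-standard tokens.
    standard_words = ["birthday", "holiday", "monday"]
    for token in ["birthdayy", "bday", "xyz"]:
        print(token, "->", recommend(token, standard_words))
```

A plain exact-match lookup would miss a token such as "birthdayy", whereas a coefficient with a threshold can still link it to "birthday". The trade-off is that a lower threshold admits more false matches.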
