Ineluctable background checking on social networks: Linking job seeker's résumé and posts

A growing source of concern is that individuals' privacy can be violated by linking information from multiple sources. For example, linking a person's anonymized information with other information about that person can lead to the person's de-anonymization. To examine the social risks of such linking, we studied the use of social networks for background checking, the process of evaluating job seekers' qualifications. We evaluated the risk posed by linking information the employer already holds with information available on social networks. After clarifying this risk, we developed a system that links information from two different sources: information extracted from a job seeker's résumé, and anonymous posts on social networks. The system automatically calculates the similarity between the information in the résumé and the information in the posts, and it identifies the job seeker's social network accounts even when the profiles have been anonymized. As part of this system, we developed a novel method that uses posts on social networks to quantify the implications of the terms in a résumé. In an evaluation using the résumés of two job seekers and the tweets of 100 users, the system identified the accounts of both job seekers with reasonably good accuracy (true positive rate of 0.941 and true negative rate of 0.999). These findings show that linking information from different sources poses a real social threat. Our research should therefore serve as a basis for further study of the relationship between privacy on social networks and the freedom to express opinions.
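The abstract does not give the similarity formula or the term-quantification method, so the following is only a minimal sketch of how such a linking pipeline could work, under several assumptions. First, the résumé and the posts are assumed to be already tokenized into terms; Japanese text would need a morphological analyzer for this step. Second, each term is weighted IDF-style by how few users ever post it. This is a swapped-in stand-in for the paper's method, not a reproduction of it. Third, an account is flagged when the weighted share of résumé terms found in its posts reaches a hypothetical threshold of 0.5. The helper names (`user_frequency_weights`, `match_score`, `identify_accounts`, `true_positive_and_negative_rates`) and the toy data are illustrative. The sketch also shows how the reported true positive and true negative rates are defined. It computes them over accounts, which is an assumption about the unit of evaluation.

```python
import math
from collections import Counter


def user_frequency_weights(posts_by_user):
    """Hypothetical IDF-style weighting: log(N / n_t), where N is the number of
    users and n_t is how many users ever posted term t. Rare terms weigh more."""
    n_users = len(posts_by_user)
    user_counts = Counter()
    for posts in posts_by_user.values():
        # Count each term at most once per user.
        user_counts.update({term for post in posts for term in post})
    return {term: math.log(n_users / n) for term, n in user_counts.items()}


def match_score(resume_terms, posts, weights):
    """Weighted fraction of the résumé's terms that also occur in a user's posts.
    Terms nobody posted get weight 0 and therefore do not affect the score."""
    used = {term for post in posts for term in post}
    total = sum(weights.get(t, 0.0) for t in resume_terms)
    if total == 0.0:
        return 0.0
    hit = sum(weights.get(t, 0.0) for t in resume_terms if t in used)
    return hit / total


def identify_accounts(resume_terms, posts_by_user, threshold=0.5):
    """Return accounts whose score meets the (hypothetical) threshold."""
    weights = user_frequency_weights(posts_by_user)
    return {
        user
        for user, posts in posts_by_user.items()
        if match_score(resume_terms, posts, weights) >= threshold
    }


def true_positive_and_negative_rates(predicted, actual, all_users):
    """TPR = TP / (TP + FN) over true accounts; TNR = TN / (TN + FP) over the rest."""
    negatives = all_users - actual
    tpr = len(predicted & actual) / len(actual)
    tnr = len(negatives - predicted) / len(negatives)
    return tpr, tnr


# Toy, made-up data: each post is a list of already-tokenized terms.
posts_by_user = {
    "alice": [["kyoto", "university", "robotics"], ["lab", "robotics"]],
    "bob": [["ramen", "tokyo"], ["football"]],
    "carol": [["kyoto", "ramen"], ["university"]],
}
resume = ["kyoto", "university", "robotics", "python"]

if __name__ == "__main__":
    predicted = identify_accounts(resume, posts_by_user)
    print(predicted)  # {'alice'}
    print(true_positive_and_negative_rates(predicted, {"alice"}, set(posts_by_user)))
```

In this toy run, "carol" shares the common terms "kyoto" and "university" but not the rare term "robotics", so her score (about 0.42) stays below the threshold. This illustrates why weighting rare, identifying terms more heavily helps separate the job seeker's account from accounts with superficially similar posts.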
