Detecting Health-Related Privacy Leaks in Social Networks Using Text Mining Tools

Users of social media, and of social networks in particular, routinely share personal information. In doing so, they may inadvertently reveal personal health information (PHI), an essential part of their private information. In this work, we present a tool for detecting PHI on a social network site, MySpace. We analyze the PHI using two well-known medical resources, MedDRA and SNOMED. We introduce a new measure, the Risk Factor of Personal Information, that assesses how likely a term is to disclose personal health information. We synthesize a profile of a potential PHI leak in a social network and demonstrate that this task benefits from an emphasis on MedDRA and SNOMED terms.
