Paper Information - Towards a Multi-Stage Approach to Detect Privacy Breaches in Physician Reviews

Physician Review Websites allow users to evaluate their experiences with health services. As these evaluations are regularly contextualized with facts from users’ private lives, they often accidentally disclose personal information on the Web. This poses a serious threat to users’ privacy. In this paper, we report on early work in progress on “Text Broom”, a tool to detect privacy breaches in user-generated texts. For this purpose, we conceptualize a pipeline that combines Natural Language Processing methods, such as Named Entity Recognition, linguistic patterns, and domain-specific Machine Learning approaches, which have the potential to recognize privacy violations with wide coverage. A prototypical web application is openly accessible.
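To make the idea of a multi-stage pipeline more concrete, the following is a minimal sketch and not the Text Broom implementation. It assumes two simplified stages: a tiny name gazetteer standing in for Named Entity Recognition, and a few regular expressions standing in for linguistic patterns. The domain-specific Machine Learning stage is left out. All names (`Finding`, `KNOWN_NAMES`, `ner_stage`, `pattern_stage`, `detect`) and all patterns are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    stage: str   # which stage produced the finding
    text: str    # the matched span of the review
    start: int   # character offset where the span starts
    end: int     # character offset where the span ends


# Assumption: a tiny gazetteer as a stand-in for a real NER model.
KNOWN_NAMES = ("Anna", "Peter")

# Assumption: illustrative patterns for self-disclosure, not taken from the paper.
PATTERNS = [
    re.compile(r"\bmy (?:son|daughter|husband|wife|mother|father)\b", re.IGNORECASE),
    re.compile(r"\bI am \d{1,3} years old\b", re.IGNORECASE),
]


def ner_stage(review):
    """Flag occurrences of known person names (simplified NER stand-in)."""
    findings = []
    for name in KNOWN_NAMES:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", review):
            findings.append(Finding("ner", m.group(), m.start(), m.end()))
    return findings


def pattern_stage(review):
    """Flag spans that match hand-written linguistic patterns."""
    findings = []
    for pattern in PATTERNS:
        for m in pattern.finditer(review):
            findings.append(Finding("pattern", m.group(), m.start(), m.end()))
    return findings


def detect(review):
    """Run all stages and return their findings ordered by position."""
    findings = ner_stage(review) + pattern_stage(review)
    return sorted(findings, key=lambda f: f.start)


if __name__ == "__main__":
    for f in detect("My daughter Anna was treated well. I am 42 years old."):
        print(f.stage, repr(f.text), f.start, f.end)
```

Keeping each stage as a separate function that returns the same `Finding` type means further stages, such as a trained classifier, could be added to `detect` without changing the existing ones.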

Frederik Simon Bäumer | Joschka Kersting | Michaela Geierhos | Matthias Orlikowski
