Towards a Multi-Stage Approach to Detect Privacy Breaches in Physician Reviews

Physician review websites allow users to evaluate their experiences with health services. Because these evaluations are regularly contextualized with facts from users’ private lives, they often inadvertently disclose personal information on the Web, which poses a serious threat to users’ privacy. In this paper, we report on early work in progress on “Text Broom”, a tool to detect privacy breaches in user-generated texts. For this purpose, we conceptualize a pipeline that combines Natural Language Processing methods such as Named Entity Recognition, linguistic patterns, and domain-specific Machine Learning approaches, which together have the potential to recognize privacy violations with wide coverage. A prototypical web application is openly accessible.
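To make the idea of a multi-stage pipeline more concrete, the following is a minimal sketch and not the Text Broom implementation. Every name in it (`Finding`, `entity_stage`, `pattern_stage`, `domain_stage`, `detect`) is hypothetical, and each stage is a deliberately simple stand-in: a regular expression takes the place of a real Named Entity Recognition model, two hand-written patterns illustrate linguistic patterns, and a keyword lookup substitutes for a trained domain-specific classifier. The word lists and patterns are illustrative assumptions only.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Finding:
    stage: str  # name of the stage that produced the finding
    start: int  # character offset where the flagged span begins
    end: int    # character offset just past the flagged span
    text: str   # the flagged span itself


Stage = Callable[[str], List[Finding]]

# Stage 1: stand-in for Named Entity Recognition (assumption: a real
# system would call an NER model here instead of a title+name regex).
TITLE_NAME = re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.? [A-Z][a-z]+")


def entity_stage(text: str) -> List[Finding]:
    return [Finding("entity", m.start(), m.end(), m.group())
            for m in TITLE_NAME.finditer(text)]


# Stage 2: linguistic patterns that hint at self-disclosure
# (illustrative patterns, not taken from the paper).
PATTERNS = [
    re.compile(r"\bmy (?:wife|husband|son|daughter|mother|father)\b", re.IGNORECASE),
    re.compile(r"\bI am \d{1,3} years old\b", re.IGNORECASE),
]


def pattern_stage(text: str) -> List[Finding]:
    return [Finding("pattern", m.start(), m.end(), m.group())
            for pattern in PATTERNS
            for m in pattern.finditer(text)]


# Stage 3: stand-in for a domain-specific Machine Learning classifier
# (assumption: a keyword lookup replaces a trained model).
HEALTH_TERMS = {"diabetes", "depression", "cancer"}


def domain_stage(text: str) -> List[Finding]:
    return [Finding("domain", m.start(), m.end(), m.group())
            for m in re.finditer(r"[A-Za-z]+", text)
            if m.group().lower() in HEALTH_TERMS]


DEFAULT_STAGES: Sequence[Stage] = (entity_stage, pattern_stage, domain_stage)


def detect(text: str, stages: Sequence[Stage] = DEFAULT_STAGES) -> List[Finding]:
    """Run every stage on the text and merge the findings in text order."""
    findings: List[Finding] = []
    for stage in stages:
        findings.extend(stage(text))
    return sorted(findings, key=lambda f: (f.start, f.end))


review = "Dr. Meier treated my daughter for diabetes."
print([f.text for f in detect(review)])
# → ['Dr. Meier', 'my daughter', 'diabetes']
```

In this sketch the stages are independent functions with a shared signature, so a stage can be swapped for a stronger component without changing `detect`. The flagged spans are only candidates: deciding whether a span actually constitutes a privacy breach would need further logic that this sketch does not attempt.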