Anonimytext: Anonymization of Unstructured Documents

Anonymizing unstructured text is an important task in many text mining applications. Medical records in particular must be anonymized both to protect the privacy of personal health information and to enable further data mining. The ANONYMITEXT system described here is designed to de-identify sensitive data in unstructured documents. It has been applied to Spanish clinical notes to recognize sensitive concepts that would need to be removed if the notes were used beyond their original scope. The system combines several medical knowledge resources with semantic dictionaries induced from the clinical notes. The semi-automatic process has been evaluated on a subset of the clinical notes, covering the most frequent attributes.
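
The dictionary-driven recognition described above can be illustrated with a minimal sketch. It is not the ANONYMITEXT implementation: the dictionary contents, the category labels, and the names `SENSITIVE_TERMS`, `build_pattern`, and `deidentify` are all hypothetical, and the real system's knowledge resources and induced dictionaries are not reproduced here. The sketch only shows the general idea of replacing dictionary matches in a note with a category placeholder.

```python
import re

# Hypothetical dictionaries, invented for illustration only.
SENSITIVE_TERMS = {
    "NAME": ["Juan Pérez", "María García"],
    "HOSPITAL": ["Hospital Ejemplo"],
}


def build_pattern(terms):
    """Compile one case-insensitive pattern matching any term as a whole word."""
    # Longest terms first, so a longer entry wins over one that is its prefix.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def deidentify(text, dictionaries=SENSITIVE_TERMS):
    """Replace every dictionary match with a bracketed category label."""
    for label, terms in dictionaries.items():
        text = build_pattern(terms).sub("[" + label + "]", text)
    return text


note = "Paciente Juan Pérez ingresa en el Hospital Ejemplo."
print(deidentify(note))
```

Pure dictionary lookup of this kind misses spelling variants and terms absent from the dictionaries, which is one reason a semi-automatic process with human review, as described above, remains useful.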
