On the Declassification of Confidential Documents

We introduce the anonymization of unstructured documents to settle the base of automatic declassification of confidential documents. Departing from known ideas and methods of data privacy, we introduce the main issues of unstructured document anonymization and propose the use of named entity recognition techniques from natural language processing and information extraction to identify the entities of the document that need to be protected.

[1]  Vicenç Torra,et al.  Towards Semantic Microaggregation of Categorical Data for Confidential Documents , 2010, MDAI.

[2]  Jeffrey F. Naughton,et al.  Anonymization of Set-Valued Data via Top-Down, Local Generalization , 2009, Proc. VLDB Endow..

[3]  Satoshi Sekine,et al.  A survey of named entity recognition and classification , 2007 .

[4]  Khaled Shaalan,et al.  A Survey of Web Information Extraction Systems , 2006, IEEE Transactions on Knowledge and Data Engineering.

[5]  Vicenç Torra,et al.  Rank Swapping for Partial Orders and Continuous Variables , 2009, 2009 International Conference on Availability, Reliability and Security.

[6]  Josep Domingo-Ferrer,et al.  Inference Control in Statistical Databases , 2002, Lecture Notes in Computer Science.

[7]  Pierangela Samarati,et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression , 1998 .

[8]  Erik F. Tjong Kim Sang,et al.  Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition , 2002, CoNLL.

[9]  Vicenç Torra,et al.  Constrained Microaggregation: Adding Constraints for Data Editing , 2008, Trans. Data Priv..

[10]  L. Willenborg,et al.  Elements of Statistical Disclosure Control , 2000 .

[11]  David J. DeWitt,et al.  Mondrian Multidimensional K-Anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[12]  Josep Domingo-Ferrer,et al.  Privacy in Data Mining , 2005, Data Mining and Knowledge Discovery.

[13]  Lior Rokach,et al.  Data Mining and Knowledge Discovery Handbook, 2nd ed , 2010, Data Mining and Knowledge Discovery Handbook, 2nd ed..

[14]  Satoshi Sekine,et al.  Definition, Dictionaries and Tagger for Extended Named Entity Hierarchy , 2004, LREC.

[15]  Vicenç Torra,et al.  Privacy-preserving data-mining through micro-aggregation for web-based e-commerce , 2010, Internet Res..

[16]  Ninghui Li,et al.  Towards optimal k-anonymization , 2008, Data Knowl. Eng..

[17]  Erik F. Tjong Kim Sang,et al.  Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.

[18]  Rakesh Agrawal,et al.  Privacy-preserving data mining , 2000, SIGMOD 2000.

[19]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[20]  Vicenç Torra,et al.  Microaggregation for Categorical Variables: A Median Based Approach , 2004, Privacy in Statistical Databases.

[21]  Ruth Brand,et al.  Microdata Protection through Noise Addition , 2002, Inference Control in Statistical Databases.

[22]  David Sánchez,et al.  Ontology-Based Anonymization of Categorical Values , 2010, MDAI.

[23]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[24]  Ralph Grishman,et al.  Message Understanding Conference- 6: A Brief History , 1996, COLING.

[25]  Josep Domingo-Ferrer,et al.  Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation , 2005, Data Mining and Knowledge Discovery.