Knowledge discovery in databases (KDD) works with existing data, which are available in all scientific and applied domains. However, some domains with comprehensive and conclusive data face severe data security problems. To rule out the risk of reidentifying individual cases, e.g. persons or companies, access to these data is strictly restricted, and KDD applications are often not allowed at all. In this paper, we discuss data privacy issues based on our experience with applications of the discovery system Explora and other data analysis approaches. First, we present some example applications, classified along two dimensions that are important for the privacy discussion. We then treat the reidentification risk and discuss anonymization methods that address it, covering aggregation and synthetization methods in more detail. There is a tradeoff between reducing the reidentification risk and preserving the statistical content of the data. For some main KDD patterns, we analyse how far the statistical content of anonymized data is still sufficient. In principle, KDD needs aggregate events. Since the event space of a dataset is very large, a static precomputation of all possible events is often not viable. We propose an architectural solution: a modular KDD system with a separate data server that also handles data security requirements and ensures that only dynamically aggregated data leave the server for analysis by the discovery modules of the KDD system. Finally, some other data privacy aspects are addressed.
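The data server idea can be illustrated with a minimal Python sketch. It is not the architecture of Explora or of any system described in the paper. The class `DataServer`, its method `aggregate`, the parameter `min_cell_size`, and the sample records are all hypothetical names chosen for illustration, and the minimum cell size rule is an assumed disclosure control that the text does not specify. The sketch only shows the general principle: the microdata stay inside the server, and a discovery module receives counts that are aggregated on request.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Mapping, Optional, Tuple

Record = Mapping[str, Hashable]
Cell = Tuple[Hashable, ...]


class DataServer:
    """Hypothetical server: keeps microdata private, releases only counts."""

    def __init__(self, records: Iterable[Record], min_cell_size: int = 3) -> None:
        # The individual records are stored privately and never returned.
        self._records: List[Record] = list(records)
        # Assumed disclosure threshold; not taken from the paper.
        self._min_cell_size = min_cell_size

    def aggregate(self, group_by: Tuple[str, ...]) -> Dict[Cell, Optional[int]]:
        """Count records for each observed combination of group_by values.

        Cells with fewer than min_cell_size records are suppressed and
        reported as None, so very small groups are not exposed.
        """
        counts = Counter(
            tuple(record[attr] for attr in group_by) for record in self._records
        )
        return {
            cell: (n if n >= self._min_cell_size else None)
            for cell, n in counts.items()
        }


# Illustrative microdata: 3 north/retail, 1 north/mining, 4 south/retail.
records = (
    [{"region": "north", "sector": "retail"}] * 3
    + [{"region": "north", "sector": "mining"}]
    + [{"region": "south", "sector": "retail"}] * 4
)

server = DataServer(records, min_cell_size=3)

# A discovery module requests an aggregate on demand, not a precomputed table.
table = server.aggregate(("region", "sector"))
```

Because aggregates are computed per request, the server does not need to precompute the full event space. Suppressing small cells alone does not rule out reidentification, for example by differencing overlapping queries, so a real server would need further safeguards.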