Data Mining: The Next Generation (Data Mining: Die nächste Generation)

Summary: Data mining has enjoyed great popularity in recent years, with advances in both research and commercialization. The first generation of data mining research and development has yielded several commercially available systems, both stand-alone and integrated with database systems, produced scalable versions of algorithms for many classical data mining problems, and introduced novel pattern discovery problems. In July 2004, researchers from a variety of backgrounds assembled at the Dagstuhl Conference Center in Germany for a workshop to re-assess the current directions of the field, to identify critical problems that require attention, and to discuss ways to increase the flow of ideas across the different disciplines that data mining has brought together. The workshop did not seek to draw up an agenda for the field of data mining. Rather, this report offers the participants' perspective on two technical directions – compositionality and privacy – and describes some important application challenges that drove the discussion.
