A selective data retention approach in massive databases

Exponentially growing databases have been tackled on two basic fronts: technological and methodological. Technology has offered solutions in storage capacity, processing power, and access speed. Methodologies include indexing, views, data mining, and temporal databases, while data warehousing combines technology and methodology; all are designed to make the most of, and better handle, ever larger and more complex databases. The basic premise underlying these approaches is to store everything. We challenge that premise by suggesting a selective retention approach for operational data, thus curtailing the size of databases and warehouses without losing content and information value. A model and methodology for selective data retention are introduced. The model, based on cost/benefit analysis, allows the data elements currently stored in the database to be assessed and provides a retention policy for current and prospective data. A case study on commercial data illustrates the model and the concepts of the method.
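The abstract does not give the model's formulas, so the following is only a hypothetical illustration of the general idea of a cost/benefit retention rule, not the authors' model. It assumes that each data element can be assigned an estimated benefit of keeping it and an estimated storage/maintenance cost in the same units; the names `DataElement`, `net_value`, and `retention_policy`, and the example figures, are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    """A stored data element with hypothetical benefit and cost estimates."""
    name: str
    benefit: float  # assumed estimated value of retaining the element
    cost: float     # assumed storage/maintenance cost, same units as benefit

    @property
    def net_value(self) -> float:
        # Net value of retention: estimated benefit minus estimated cost.
        return self.benefit - self.cost


def retention_policy(elements):
    """Split elements into (retain, discard) name lists by net value.

    Elements whose benefit exceeds their cost are retained; elements whose
    net value is zero or negative are discarded (ties are discarded here,
    an arbitrary choice made for this sketch).
    """
    retain = [e.name for e in elements if e.net_value > 0]
    discard = [e.name for e in elements if e.net_value <= 0]
    return retain, discard


if __name__ == "__main__":
    # Hypothetical catalog of data elements with made-up estimates.
    catalog = [
        DataElement("customer_id", benefit=100.0, cost=5.0),
        DataElement("fax_number", benefit=1.0, cost=4.0),
        DataElement("last_purchase_date", benefit=30.0, cost=3.0),
    ]
    keep, drop = retention_policy(catalog)
    print("retain:", keep)   # retain: ['customer_id', 'last_purchase_date']
    print("discard:", drop)  # discard: ['fax_number']
```

In a real assessment the benefit and cost estimates would come from the cost/benefit analysis itself; the sketch only shows how such estimates could be turned into a keep/discard decision per element.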
