General Purpose Database Summarization

This paper presents a message-oriented architecture for summarizing large databases. The summarization system takes a database table as input and produces a reduced version of this table through a rewriting process and a generalization process. The tuples of the resulting table are less precise than the original ones, yet they remain highly informative about the actual content of the database. This reduced form can serve as input to advanced data mining processes as well as to specific applications presented in other works. We describe the incremental maintenance of the summarized table, the system's ability to deal directly with XML database systems, and finally its scalability, which allows it to handle very large datasets of a million records.
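
As a rough illustration of the rewriting and generalization steps, the toy sketch below maps raw attribute values to coarse labels and merges records that share the same rewritten form, one record at a time. It is not the system described in this paper: the `VOCABULARY` table, the `rewrite` function, the `Summary` class, the crisp intervals, and the exact-match grouping are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical vocabulary: for each attribute, (label, lower bound, upper bound).
# Intervals are half-open [lower, upper). These values are illustrative only.
VOCABULARY = {
    "age": [("young", 0, 30), ("adult", 30, 60), ("old", 60, 200)],
    "income": [("low", 0, 20000), ("medium", 20000, 60000),
               ("high", 60000, float("inf"))],
}


def rewrite(record):
    """Rewriting step (assumed form): replace each raw value by a coarse label."""
    rewritten = {}
    for attr, value in record.items():
        for label, lower, upper in VOCABULARY[attr]:
            if lower <= value < upper:
                rewritten[attr] = label
                break
        else:
            raise ValueError(f"no label covers {attr}={value!r}")
    return rewritten


class Summary:
    """Generalization step (assumed form): group records with equal labels."""

    def __init__(self):
        self.counts = defaultdict(int)

    def add(self, record):
        # Incremental maintenance: only one counter is touched per new record.
        key = tuple(sorted(rewrite(record).items()))
        self.counts[key] += 1

    def rows(self):
        # Each summary row carries the labels plus the number of covered records.
        return [{**dict(key), "count": n} for key, n in self.counts.items()]


summary = Summary()
for rec in [{"age": 25, "income": 15000},
            {"age": 28, "income": 18000},
            {"age": 45, "income": 70000}]:
    summary.add(rec)

for row in summary.rows():
    print(row)
```

In this sketch the three input records collapse into two summary rows, and adding a record never requires rescanning the earlier ones, which is the property that incremental maintenance relies on.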
