AlphaSum: Size-Constrained Table Summarization using

Consider a scientist who wants to explore multiple data sets to select the relevant ones for further analysis. Since the visualization real estate may put a stringent constraint on how much detail can be presented to this user in a single page, effective table summarization techniques are needed to create summaries that are both sufficiently small and effective in communicating the available content. In this paper, we first argue that table summarization can benefit from knowledge about acceptable alternatives for clustering the values in the database. We formulate the problem of table summarization with the help of value lattices. We then provide a framework to express alternative clustering strategies and to account for various utility measures (such as information loss) in assessing different summarization alternatives. Based on this interpretation, we introduce three preference criteria, max-min-util (cautious), max-sum-util (cumulative), and pareto-util, for the problem of table summarization. To tackle the inherent complexity, we rely on the properties of the fuzzy interpretation to further develop a novel ranked set cover based evaluation mechanism (RSC). These are brought together in AlphaSum, a table summarization system. Experimental evaluations show that RSC improves both execution time and summary quality in AlphaSum by pruning the search space more effectively than existing solutions.
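The three preference criteria can be illustrated with a small sketch. The abstract names the criteria but does not define them, so the reading below is an assumption based on the names alone: each candidate summary is scored by a vector of utilities in [0, 1] (for example, one per attribute), max-min-util prefers the candidate whose worst utility is highest, max-sum-util prefers the highest total, and pareto-util keeps every candidate that no other candidate dominates. The function names, the candidate labels, and the utility values are all hypothetical and are not taken from AlphaSum.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: one utility in [0, 1] per attribute.
Utilities = Sequence[float]


def max_min_util(cands: Dict[str, Utilities]) -> str:
    """Cautious choice: maximize the worst (minimum) utility."""
    return max(cands, key=lambda name: min(cands[name]))


def max_sum_util(cands: Dict[str, Utilities]) -> str:
    """Cumulative choice: maximize the total utility."""
    return max(cands, key=lambda name: sum(cands[name]))


def dominates(a: Utilities, b: Utilities) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def pareto_util(cands: Dict[str, Utilities]) -> List[str]:
    """Keep every candidate that is not dominated by another candidate."""
    return [
        name
        for name, utils in cands.items()
        if not any(
            dominates(other, utils)
            for other_name, other in cands.items()
            if other_name != name
        )
    ]


# Made-up utility vectors for three candidate summaries.
candidates = {
    "A": (0.9, 0.2),
    "B": (0.5, 0.5),
    "C": (0.4, 0.4),
}

cautious = max_min_util(candidates)
cumulative = max_sum_util(candidates)
pareto = pareto_util(candidates)
```

On this toy input the cautious and cumulative criteria pick different candidates, and the Pareto criterion returns a set rather than a single winner, which is one way to see why the three criteria are treated separately.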
