Paper Information - Knowledge-Free Table Summarization

Knowledge-Free Table Summarization

When relational tables are the object of analysis, summarization methods can give the analyst a starting point for exploring the data. Typically, table summarization aims to produce an informative data summary using metadata supplied by attribute taxonomies. However, such hierarchical knowledge is not always available, and may be inadequate even when it exists. To overcome these limitations, we propose a new framework, named cTabSum, that automatically generates attribute value taxonomies and performs table summarization directly from the table's own content. Our approach takes a relational table as input and proceeds in two steps. First, a taxonomy is extracted for each attribute. Second, a new table summarization algorithm exploits the automatically generated taxonomies. An information-theoretic measure guides the summarization process. Alongside the new algorithm, we also develop a prototype. The prototype incorporates additional features to help the user become familiar with the data: (i) the summarized table produced by cTabSum can be used as a recommended starting point for browsing the data; (ii) easy-to-understand charts show how the taxonomies were built; (iii) standard OLAP operators, i.e. drill-down and roll-up, are implemented for navigating the data set. In addition, we provide an objective evaluation of our table summarization strategy on real data.
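
The general idea of taxonomy-driven summarization can be illustrated with a toy sketch. This is not the cTabSum algorithm, whose details the abstract does not give: the taxonomy here is written by hand rather than extracted automatically, Shannon entropy stands in for the unspecified information theory measure, and the names `entropy`, `roll_up`, `summarize`, and the sample table are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def roll_up(rows, attribute, taxonomy):
    """Replace each value of `attribute` by its parent in `taxonomy`.

    `taxonomy` is a child -> parent mapping; values without a parent are kept.
    """
    return [
        {**row, attribute: taxonomy.get(row[attribute], row[attribute])}
        for row in rows
    ]


def summarize(rows):
    """Collapse identical rows into (row, count) pairs."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return [(dict(key), n) for key, n in counts.items()]


# Hypothetical input table and hand-written taxonomy (assumptions for illustration).
rows = [
    {"city": "Paris", "product": "apple"},
    {"city": "Lyon", "product": "apple"},
    {"city": "Rome", "product": "pear"},
    {"city": "Milan", "product": "pear"},
]
city_taxonomy = {"Paris": "France", "Lyon": "France", "Rome": "Italy", "Milan": "Italy"}

# Generalize the "city" attribute one level up, then merge identical rows.
generalized = roll_up(rows, "city", city_taxonomy)
summary = summarize(generalized)

# Entropy lost on the generalized attribute: one possible cost of the roll-up.
loss = entropy([r["city"] for r in rows]) - entropy([r["city"] for r in generalized])
```

In this sketch the four original rows collapse into two summary rows, and the entropy difference gives one way to compare candidate generalizations: a summarizer could prefer the roll-up that shrinks the table most for the least loss. Reversing the mapping corresponds to the drill-down operator mentioned in the abstract.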
