Multi Level Mining of Warehouse Schema

Data mining and data warehousing are two mature disciplines with broadly the same objectives, yet they have developed largely separately, and each uses different techniques. It has been recognized that mining techniques developed for pattern recognition, such as clustering and visualization, can assist in designing a data warehouse schema. However, a suitable methodology is required to integrate mining methods seamlessly into schema design. In previous work, we presented a methodology that employs hierarchical clustering to derive a tree structure that a data warehouse designer can use to build a schema. We believe that, to strengthen the decision-making process, there is a strong need for a method that automatically extracts the knowledge present at different levels of abstraction in a warehouse. We demonstrate with examples how mining at different levels of a hierarchical warehouse schema can give new insights into the underlying data clusters, which not only helps in building more meaningful dimensions and facts for data warehouse design but can also improve the decision-making process.
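As a rough illustration of the idea of a cluster tree that can be inspected at several levels of abstraction, the following minimal Python sketch builds a single-linkage hierarchy over a toy one-dimensional numeric attribute and then cuts it at different levels. It is not the methodology of the paper: the data values, the function names `single_linkage` and `cut`, and the choice of single linkage on numeric data are all assumptions made for illustration, and real warehouse data would typically mix numeric and nominal attributes.

```python
from itertools import combinations


def single_linkage(points):
    """Agglomerative clustering on 1-D numeric points (illustrative only).

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a sorted tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters under single linkage.
        for a, b in combinations(clusters, 2):
            d = min(abs(points[i] - points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


def cut(points, merges, k):
    """Return the k clusters obtained by replaying the first n - k merges."""
    clusters = {(i,) for i in range(len(points))}
    for a, b, _ in merges[: len(points) - k]:
        clusters -= {a, b}
        clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)


# Hypothetical attribute values (for example, order amounts).
points = [10, 12, 11, 50, 52, 100]
merges = single_linkage(points)

# Inspect the same tree at coarse and fine levels of abstraction.
for k in (1, 2, 3):
    level = cut(points, merges, k)
    summary = [(c, sum(points[i] for i in c) / len(c)) for c in level]
    print(k, summary)
```

Each value of `k` corresponds to one level of the tree: a small `k` gives a coarse view of the data, while a larger `k` exposes finer groups whose per-cluster summaries could be examined separately.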
