A Data Mining-Based OLAP Aggregation of Complex Data: Application on XML Documents

Nowadays, most organizations deal with complex data that come in different formats and from different sources. XML is emerging as a promising formalism for modeling and warehousing these data in decision support systems. Nevertheless, classical OLAP tools are still not capable of analyzing such data. In this article, we combine OLAP and data mining to support advanced analysis of complex data. We propose a generalized OLAP aggregation operator, called OpAC, based on Agglomerative Hierarchical Clustering (AHC). OpAC is suited to all types of data, since it operates on data cubes modeled in XML. Our operator builds meaningful aggregates of facts that express semantic similarities. We also propose evaluation criteria for partitions of aggregates to help select the best partition. Furthermore, we developed a Web application implementing our operator. Finally, we report performance experiments and conduct a case study on XML documents from the breast cancer research domain.
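The abstract does not say how OpAC applies AHC to cube facts, so the following is only a minimal sketch under stated assumptions. It assumes each fact has already been encoded as a numeric feature vector (the step that extracts those vectors from an XML cube is omitted), and it uses Ward's minimum-variance linkage, which is a common AHC criterion but an assumption here. The `r_squared` function is a stand-in partition-quality measure (between-cluster inertia divided by total inertia), not necessarily one of the criteria the authors propose. All function names and fact identifiers are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(a: List[Vector], b: List[Vector]) -> float:
    """Increase in within-cluster inertia if clusters a and b are merged (Ward linkage)."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))


def ahc_aggregate(facts: Dict[str, Vector], k: int) -> List[List[str]]:
    """Hypothetical AHC aggregation: merge facts bottom-up until k aggregates remain.

    `facts` maps an illustrative fact identifier to its numeric description.
    """
    if not 1 <= k <= len(facts):
        raise ValueError("k must be between 1 and the number of facts")
    # Start with one singleton cluster per fact.
    clusters: List[List[str]] = [[fid] for fid in sorted(facts)]
    while len(clusters) > k:
        # Pick the pair whose merge increases within-cluster inertia the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(
                [facts[f] for f in clusters[ij[0]]],
                [facts[f] for f in clusters[ij[1]]],
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # combinations yields i < j, so index i is unaffected
    return clusters


def r_squared(facts: Dict[str, Vector], partition: List[List[str]]) -> float:
    """Stand-in quality measure: between-cluster inertia / total inertia."""
    all_points = list(facts.values())
    g = centroid(all_points)
    total = sum(sq_dist(p, g) for p in all_points)
    between = sum(
        len(c) * sq_dist(centroid([facts[f] for f in c]), g) for c in partition
    )
    return between / total if total else 1.0


if __name__ == "__main__":
    # Toy facts: two tight groups and one outlier (illustrative values only).
    facts = {
        "f1": [1.0, 1.0], "f2": [1.2, 0.9],
        "f3": [5.0, 5.1], "f4": [5.2, 4.9],
        "f5": [9.0, 0.5],
    }
    partition = ahc_aggregate(facts, 3)
    print(partition)  # [['f1', 'f2'], ['f3', 'f4'], ['f5']]
    print(round(r_squared(facts, partition), 3))
```

Because every merge in this scheme can only lower the between-cluster inertia, `r_squared` rises as the number of aggregates grows; comparing it across several values of `k` is one simple way to choose among the nested partitions that AHC produces.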
