A Framework for Data Structure-Guided Extraction of XML Association Rules

Because semi-structured data in XML format is so widely used, discovering useful information in such data has become one of the main research topics in association rule extraction. Several encouraging approaches to mining rules from XML data have been proposed. However, efficiency and simplicity remain barriers to further development because of the combinatorial explosion in the number of tree nodes. What is needed is a clear and simple methodology for extracting the knowledge hidden in heterogeneous tree data. In this paper, we show that association rules can be extracted from any XML document using a special data structure, called Simple and Effective Lists Structure (SELS), which avoids the computational intractability caused by the growth in the number of nodes. SELS is flexible and powerful enough to represent both simple and complex structured association relationships in XML data.