Discovering associations in XML data

Knowledge inference from semi-structured data can exploit frequent substructures in addition to the frequencies of individual data items. In fact, the working assumption of the present study is that frequent sub-trees of XML data represent sets of tags (objects) that are meaningfully associated. A method for extracting frequent sub-trees from XML data is presented. It uses thresholds on the frequencies of paths and on the multiplicity of paths in the data. The frequent sub-trees are extracted and counted in a procedure with O(n²) complexity. The data content of the extracted sub-trees, in the form of attribute values, is cast in tabular form, which enables a search for associations in the extracted data. The complete procedure thus uses both structure and content to extract association rules from semi-structured data. A large industrial example is used to demonstrate the operation of the proposed method.
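The pipeline outlined above (threshold paths, tabulate attribute values, search for associations) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: each child of the XML root is treated as one record; the frequency threshold is taken to mean "number of records containing a root-to-node tag path", and the multiplicity threshold "maximum number of times a path may repeat inside one record". The functions `record_paths`, `frequent_paths`, `to_table` and `pair_rules` and the toy `<plant>` data are hypothetical. The sketch counts paths rather than enumerating sub-trees, so it does not reproduce the O(n²) sub-tree counting procedure, and the association search is reduced to single-item rules of the form A -> B.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations


def record_paths(record):
    """Yield (element, tag_path) for every element of one record; the path is
    the tuple of tags from the record element down to that element."""
    stack = [(record, (record.tag,))]
    while stack:
        elem, path = stack.pop()
        yield elem, path
        for child in elem:
            stack.append((child, path + (child.tag,)))


def frequent_paths(records, min_support, max_multiplicity):
    """Assumed threshold semantics (not taken from the paper): keep a path if it
    occurs in at least `min_support` records and never occurs more than
    `max_multiplicity` times within a single record."""
    support = Counter()        # number of records containing each path
    multiplicity = Counter()   # largest per-record occurrence count of each path
    for rec in records:
        counts = Counter(path for _, path in record_paths(rec))
        support.update(counts.keys())
        for path, n in counts.items():
            multiplicity[path] = max(multiplicity[path], n)
    return {p for p, s in support.items()
            if s >= min_support and multiplicity[p] <= max_multiplicity}


def to_table(records, paths):
    """Cast attribute values on the kept paths into one row (dict) per record;
    columns are named 'tag/path@attribute'. If a kept path repeats within a
    record, one occurrence overwrites the other in that row."""
    rows = []
    for rec in records:
        row = {}
        for elem, path in record_paths(rec):
            if path in paths:
                for name, value in elem.attrib.items():
                    row["/".join(path) + "@" + name] = value
        rows.append(row)
    return rows


def pair_rules(rows, min_support, min_confidence):
    """Mine rules lhs -> rhs between single (column, value) items: a reduced,
    pairs-only stand-in for full Apriori-style association-rule mining."""
    item_count = Counter()
    pair_count = Counter()
    for row in rows:
        items = sorted(row.items())
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = []
    for (a, b), n in pair_count.items():
        if n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = n / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, n, confidence))
    return rules


# Hypothetical toy data: each <unit> under the root is one record.
XML = """<plant>
  <unit><pump type="A"/><valve size="S"/></unit>
  <unit><pump type="A"/><valve size="S"/></unit>
  <unit><pump type="B"/><valve size="L"/></unit>
</plant>"""

records = list(ET.fromstring(XML))
paths = frequent_paths(records, min_support=2, max_multiplicity=1)
rows = to_table(records, paths)
rules = pair_rules(rows, min_support=2, min_confidence=0.8)
for lhs, rhs, n, conf in rules:
    print(f"{lhs} -> {rhs} (support={n}, confidence={conf:.2f})")
```

On this toy input, the pairing of pump type A with valve size S passes both thresholds in each direction, while the single B/L unit falls below `min_support`. Restricting the search to item pairs keeps the sketch short; a fuller implementation would grow larger itemsets, as in Apriori.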
