Discovering associations in XML data

Knowledge inference from semi-structured data can use frequent substructures in addition to the frequencies of data items. The working assumption of the present study is that frequent subtrees of XML data represent sets of tags (objects) that are meaningfully associated. A method for extracting frequent subtrees from XML data is presented. It uses thresholds on the frequencies of paths and on the multiplicity of paths in the data. The frequent subtrees are extracted and counted by a procedure of O(n²) complexity. The data content of the extracted subtrees, in the form of attribute values, is cast in tabular form, which enables a search for associations in the extracted data. Thus, the complete procedure uses both structure and content to extract association rules from semi-structured data. A large industrial example demonstrates the operation of the proposed method.
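The abstract describes the pipeline only at a high level. What follows is a minimal Python sketch of one plausible reading; it is not the authors' algorithm and does not reproduce their O(n²) procedure. Several assumptions apply. Frequent structures are approximated by root-to-node tag paths rather than full subtrees. Each child of the document root is treated as one record. The multiplicity threshold is read as a cap on how often a path may repeat within a single record, so every retained path maps to exactly one table column. Rules are mined in the standard support/confidence framework with a single antecedent. All function names, threshold parameters (`min_support`, `max_multiplicity`, `min_confidence`), and the toy order data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

def tag_paths(elem, prefix=()):
    """Yield every root-to-node tag path (a tuple of tags) under elem."""
    path = prefix + (elem.tag,)
    yield path
    for child in elem:
        yield from tag_paths(child, path)

def frequent_paths(records, min_support, max_multiplicity):
    """Hypothetical thresholds: keep paths present in >= min_support records
    that never repeat more than max_multiplicity times inside one record."""
    doc_freq, max_mult = Counter(), Counter()
    for rec in records:
        for path, n in Counter(tag_paths(rec)).items():
            doc_freq[path] += 1
            max_mult[path] = max(max_mult[path], n)
    return {p for p, f in doc_freq.items()
            if f >= min_support and max_mult[p] <= max_multiplicity}

def to_table(records, paths):
    """Cast each record's text values on the selected paths into one row."""
    rows = []
    for rec in records:
        row = {}
        for path in paths:
            rel = "/".join(path[1:])          # path relative to the record root
            node = rec.find(rel) if rel else rec
            if node is not None and node.text and node.text.strip():
                row["/".join(path)] = node.text.strip()
        rows.append(row)
    return rows

def pair_rules(rows, min_support, min_confidence):
    """Single-antecedent rules A -> B over (column, value) items."""
    item_count, pair_count = Counter(), Counter()
    for row in rows:
        items = sorted(row.items())
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    n = len(rows)
    rules = []
    for (a, b), c in pair_count.items():
        if c / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = c / item_count[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, c / n, conf))
    return rules

# Invented toy data, for illustration only.
xml = """<orders>
  <order><customer>acme</customer><item><sku>X1</sku></item><status>late</status></order>
  <order><customer>acme</customer><item><sku>X1</sku></item><status>late</status></order>
  <order><customer>zeta</customer><item><sku>Y2</sku></item><status>ok</status></order>
</orders>"""
records = list(ET.fromstring(xml))
paths = frequent_paths(records, min_support=2, max_multiplicity=1)
rows = to_table(records, paths)
rules = pair_rules(rows, min_support=0.5, min_confidence=0.9)
for lhs, rhs, sup, conf in rules:
    print(lhs, "->", rhs, round(sup, 2), round(conf, 2))
```

In this sketch, structure determines which columns exist (the frequent, non-repeating paths), and content fills the table that the rule search then scans. This mirrors the abstract's point that the procedure combines structure and content.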
