Mining Tree-Based Frequent Patterns from XML

The growing number of very large XML datasets available to casual users poses a challenge for our community and calls for appropriate support to gather knowledge from these data efficiently. Data mining, already widely applied to extract frequent correlations of values from both structured and semi-structured datasets, is a natural field for this kind of knowledge elicitation. In this work we describe an approach to extract tree-based association rules from XML documents. Such rules provide approximate, intensional information on both the structure and the content of XML documents, and can themselves be stored in XML format to be queried later. A prototype system demonstrates the effectiveness of the approach.
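To make the notion of a tree-based association rule concrete, the following is a minimal sketch and not the algorithm of this work. It evaluates a single, very simple rule shape ("an element with tag `body_tag` has a child with tag `head_tag`") on one XML document. The function name `rule_stats`, the sample document, and the particular definitions of support (matches divided by the total number of elements) and confidence (matches divided by the number of body elements) are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def rule_stats(xml_text, body_tag, head_tag):
    """Return (support, confidence) for the toy rule
    'element <body_tag>  =>  it has a direct child <head_tag>'.

    Assumed definitions (illustrative only):
      support    = matching body elements / all elements in the document
      confidence = matching body elements / all body elements
    """
    root = ET.fromstring(xml_text.strip())
    all_nodes = list(root.iter())  # every element, including the root
    body_nodes = [n for n in all_nodes if n.tag == body_tag]
    # find() only looks at direct children when given a plain tag name
    matching = [n for n in body_nodes if n.find(head_tag) is not None]

    support = len(matching) / len(all_nodes)
    confidence = len(matching) / len(body_nodes) if body_nodes else 0.0
    return support, confidence


# Hypothetical sample document: four books, three of which list an author.
SAMPLE = """
<library>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title><author>Y</author><author>Z</author></book>
  <book><title>C</title></book>
  <book><title>D</title><author>X</author></book>
</library>
"""

if __name__ == "__main__":
    support, confidence = rule_stats(SAMPLE, "book", "author")
    print(f"support={support:.3f} confidence={confidence:.2f}")
```

On this sample the rule holds for three of the four `book` elements, which is the sense in which such a rule gives approximate, intensional information about the document's structure. A real miner would enumerate frequent subtrees rather than test one hand-picked parent and child pair.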
