Data Mining for XML Query-Answering Support

Extracting information from semistructured documents is a hard task, and it will become increasingly critical as the amount of digital information available on the Internet grows. Indeed, documents are often so large that the data set returned as the answer to a query may be too big to convey interpretable knowledge. In this paper, we describe an approach based on Tree-Based Association Rules (TARs): mined rules that provide approximate, intensional information on both the structure and the contents of Extensible Markup Language (XML) documents, and that can themselves be stored in XML format. This mined knowledge is later used to provide: 1) a concise idea (the gist) of both the structure and the content of the XML document and 2) quick, approximate answers to queries. We focus here on the second feature. A prototype system and experimental results demonstrate the effectiveness of the approach.
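
To make the idea of an association rule over XML content concrete, the following is a minimal sketch, not the mining algorithm of the paper. It assumes a drastically simplified rule shape (a body and a head that each constrain the child values of one element type) and normalizes support by the number of candidate elements. The sample document and the names `matches` and `rule_stats` are hypothetical and introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """
<conferences>
  <paper><author>Ann</author><venue>VLDB</venue></paper>
  <paper><author>Ann</author><venue>VLDB</venue></paper>
  <paper><author>Ann</author><venue>KDD</venue></paper>
  <paper><author>Bob</author><venue>KDD</venue></paper>
</conferences>
"""


def matches(element, pattern):
    """Return True if, for every (tag, text) pair in pattern,
    element has a direct child with that tag and that text."""
    return all(
        any(child.tag == tag and (child.text or "").strip() == text
            for child in element)
        for tag, text in pattern.items()
    )


def rule_stats(root, node_tag, body, head):
    """Compute (support, confidence) of the simplified rule body => head.

    Assumption: support is the fraction of node_tag elements that match
    both body and head; confidence is that count divided by the number
    of elements matching the body alone.
    """
    nodes = list(root.iter(node_tag))
    body_count = sum(matches(n, body) for n in nodes)
    head_count = sum(matches(n, {**body, **head}) for n in nodes)
    support = head_count / len(nodes) if nodes else 0.0
    confidence = head_count / body_count if body_count else 0.0
    return support, confidence


root = ET.fromstring(DOC)
# Rule: papers authored by Ann tend to appear at VLDB.
support, confidence = rule_stats(
    root, "paper", {"author": "Ann"}, {"venue": "VLDB"}
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Under these assumptions, a rule with high confidence could serve as an approximate, intensional answer to a query such as "where does Ann publish?" without scanning the full document at query time.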
