Process of applying data mining techniques to XML data

XML has gained popularity for information representation, exchange, and retrieval. As XML material becomes more abundant, its heterogeneity and structural irregularity make it increasingly difficult to extract knowledge from XML sources. Data mining techniques therefore become essential for handling XML documents effectively. This paper discusses the capabilities of data mining techniques and the process of applying them to XML sources.
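The sketch below shows one way such a process might look. It assumes a simple structural-mining step and is not the method described in this paper. Each document is parsed and reduced to the set of its root-to-node tag paths. Any path that appears in at least a minimum number of documents is then reported. This reduction lets documents with irregular, differing structures be compared on a common footing. The helper names `label_paths` and `frequent_paths` and the sample documents are hypothetical. Path counting is a deliberately coarse choice. Techniques such as frequent subtree mining capture considerably more of the tree structure.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import chain


def label_paths(xml_text):
    """Return the set of distinct root-to-node tag paths in one XML document.

    Hypothetical helper: structure only, text content and attributes are ignored.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def frequent_paths(docs, min_count):
    """Return {path: document count} for paths found in at least min_count documents.

    Each document contributes a path at most once (document-level support).
    """
    counts = Counter(chain.from_iterable(label_paths(d) for d in docs))
    return {p: c for p, c in counts.items() if c >= min_count}


# Three structurally heterogeneous "book" records (illustrative data only).
docs = [
    "<book><title>A</title><author>X</author></book>",
    "<book><title>B</title><author>Y</author><year>2001</year></book>",
    "<book><title>C</title><editor>Z</editor></book>",
]
print(sorted(frequent_paths(docs, 2)))  # ['/book', '/book/author', '/book/title']
```

With a support threshold of two documents, the optional `year` and `editor` elements drop out. The shared core of the records remains.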
