Adaptive XML Tree Classification on Evolving Data Streams

We propose a new method for classifying patterns that uses closed and maximal frequent patterns as features. Classification generally requires first mapping each pattern to be classified to a vector of features, and frequent patterns have been used as such features in the past. Closed patterns retain the same information as the full set of frequent patterns in less space, while maximal patterns retain only approximate information. We use both to reduce the number of classification features. We present a new framework for XML tree stream classification with two components. For the first component, we use closed tree mining algorithms for evolving data streams. For the second, we use state-of-the-art classification methods for data streams. To the best of our knowledge, this is the first work on tree classification in streaming data that varies with time. We give a first experimental evaluation of the proposed classification method.
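The feature-reduction idea can be illustrated with a minimal sketch. It is not the method evaluated in the paper: it uses itemsets as a simplified stand-in for subtrees, enumerates candidates by brute force on a static batch rather than mining an evolving stream, and every function name and the toy data are hypothetical. It only shows how closed and maximal patterns shrink the feature set and how a new instance is mapped to a binary feature vector.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.
    Brute-force enumeration; only suitable for tiny illustrative inputs."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                result[cand] = support
    return result

def closed_patterns(frequent):
    """Keep patterns that have no proper superset with the same support."""
    return {p: s for p, s in frequent.items()
            if not any(p < q and s == frequent[q] for q in frequent)}

def maximal_patterns(frequent):
    """Keep patterns that have no frequent proper superset at all."""
    return {p: s for p, s in frequent.items()
            if not any(p < q for q in frequent)}

def to_feature_vector(instance, patterns):
    """One binary feature per pattern: 1 if the instance contains it."""
    return [1 if p <= instance else 0 for p in patterns]

# Toy data (assumed for illustration only).
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]

frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_patterns(frequent)
maximal = maximal_patterns(frequent)

# Fix a deterministic feature order: shorter patterns first, then alphabetical.
closed_order = sorted(closed, key=lambda p: (len(p), sorted(p)))
vector = to_feature_vector({"a", "b"}, closed_order)

print(len(frequent), len(closed), len(maximal))  # 7 3 1
print(vector)                                    # [1, 1, 0]
```

On this toy input, seven frequent patterns reduce to three closed patterns, from which every frequent pattern's support can still be recovered, and to a single maximal pattern, which records which patterns are frequent but not their supports.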
