FTMnodes: Fuzzy tree mining based on partial inclusion

Mining frequent patterns from huge databases has been addressed for many years, and the results have been applied in many fields, including banking, marketing, biology, and health. Fuzzy approaches have been proposed to soften the constraints on the patterns found by the algorithms. However, for complex databases such as tree databases (for instance, XML databases), only a few methods have been proposed to handle soft constraints when discovering frequent subtrees from a forest of trees. Such algorithms can hardly deal with real data in a soft manner: they consider a subtree as fully included in the super-tree, meaning that all of its nodes must appear. In this paper, we extend this definition to fuzzy inclusion, based on the idea that a tree is included to a certain degree within another one, this fuzzy degree being correlated with the number of matching nodes. We propose the FTMnodes method together with the associated definitions, and we report experiments conducted on synthetic and real databases that show the interest of our approach.
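
The idea of partial inclusion can be illustrated with a minimal sketch. It is not the FTMnodes algorithm itself, whose definitions are not given in this abstract. It assumes, purely for illustration, that a pattern node "matches" when its label path from the root also exists in the data tree, that the inclusion degree is the fraction of pattern nodes that match, and that the fuzzy support of a pattern is the mean degree over the forest. The names `node_paths`, `inclusion_degree` and `fuzzy_support` are hypothetical.

```python
# Illustrative sketch only: a simplified notion of fuzzy tree inclusion.
# A tree is a (label, children) pair, where children is a list of trees.

def node_paths(tree, prefix=()):
    """Return one root-to-node label path per node of the tree."""
    label, children = tree
    path = prefix + (label,)
    paths = [path]
    for child in children:
        paths.extend(node_paths(child, path))
    return paths

def inclusion_degree(pattern, tree):
    """Fraction of pattern nodes whose label path also occurs in `tree`.

    Assumption: matching is anchored at the roots and ignores sibling
    order and duplicate sibling labels, which a real miner would handle.
    """
    pattern_paths = node_paths(pattern)
    tree_paths = set(node_paths(tree))
    matched = sum(1 for p in pattern_paths if p in tree_paths)
    return matched / len(pattern_paths)

def fuzzy_support(pattern, forest):
    """Mean inclusion degree of the pattern over all trees (assumed measure)."""
    return sum(inclusion_degree(pattern, t) for t in forest) / len(forest)

pattern = ("a", [("b", []), ("c", [])])
forest = [
    ("a", [("b", []), ("c", []), ("d", [])]),  # all 3 pattern nodes match
    ("a", [("b", [])]),                        # 2 of 3 pattern nodes match
    ("x", []),                                 # no pattern node matches
]
degrees = [inclusion_degree(pattern, t) for t in forest]
support = fuzzy_support(pattern, forest)
```

Under crisp inclusion only the first tree would support the pattern. With the assumed degree, the second tree contributes 2/3 instead of 0, which is the kind of softening the abstract describes.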
