RP-Miner: a relaxed prune algorithm for frequent similar pattern mining

Most current algorithms for mining frequent patterns assume that two object subdescriptions are similar only if they are equal, but many real-world problems evaluate similarity in other ways. Recently, three algorithms (ObjectMiner, STreeDC-Miner and STreeNDC-Miner) have been proposed for mining frequent patterns with similarity functions other than equality. To search for frequent patterns, ObjectMiner and STreeDC-Miner rely on a pruning property called the Downward Closure property, which the similarity function must satisfy. STreeNDC-Miner was proposed for similarity functions that do not meet this property; however, it explores all subsets of features, which can be very expensive. In this work, we propose a frequent similar pattern mining algorithm for similarity functions that do not meet the Downward Closure property. It is faster than STreeNDC-Miner and loses fewer frequent similar patterns than ObjectMiner and STreeDC-Miner. We also compare, in a supervised classification context, the quality of the set of frequent similar patterns computed by our algorithm with the quality of the sets computed by the other algorithms.
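
To make the role of the Downward Closure property concrete, the following minimal sketch shows a similarity function for which a pattern over two features is frequent while its restriction to one feature is not. It is an illustration only, not the method of any of the algorithms named above: the similarity function (mean absolute difference under a threshold), the data, the thresholds and the names `similar` and `frequency` are all assumptions made for this example. Downward Closure is taken here in its usual sense, that every sub-pattern of a frequent pattern is itself frequent.

```python
def similar(x, y, features, max_mean_diff=0.5):
    """Hypothetical similarity function (an assumption for this sketch):
    two subdescriptions are similar when the mean absolute difference
    over the selected features does not exceed max_mean_diff."""
    mean_diff = sum(abs(x[f] - y[f]) for f in features) / len(features)
    return mean_diff <= max_mean_diff


def frequency(pattern, features, objects):
    """Fraction of objects whose subdescription over `features`
    is similar to the pattern's subdescription."""
    hits = sum(1 for o in objects if similar(pattern, o, features))
    return hits / len(objects)


# Toy data set (invented): objects agree on feature "b" but differ on "a".
objects = [
    {"a": 0.0, "b": 1.0},
    {"a": 0.8, "b": 1.0},
    {"a": 0.9, "b": 1.0},
    {"a": 0.95, "b": 1.0},
]
pattern = objects[0]
min_freq = 0.75  # assumed minimum frequency threshold

freq_ab = frequency(pattern, ("a", "b"), objects)  # agreement on "b" halves the mean difference
freq_a = frequency(pattern, ("a",), objects)       # only the pattern itself matches on "a"

print(freq_ab >= min_freq, freq_a >= min_freq)
```

With this similarity function the pattern over {a, b} is frequent although its sub-pattern over {a} is not, so pruning every superset of an infrequent feature subset would discard a frequent similar pattern. That is the kind of loss a pruning strategy based on Downward Closure incurs when the similarity function does not satisfy the property.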
