We introduce a new kind of pattern, called emerging patterns (EPs), for knowledge discovery from databases. EPs are defined as itemsets whose supports increase significantly from one dataset to another. EPs can capture emerging trends in timestamped databases, or useful contrasts between data classes. EPs have proven useful: we have used them to build very powerful classifiers, which are more accurate than C4.5 and CBA on many datasets. We believe that EPs with low to medium support, such as 1%-20%, can give useful new insights and guidance to experts, even in “well understood” applications. The efficient mining of EPs is a challenging problem, since (i) the Apriori property no longer holds for EPs, and (ii) there are usually too many candidates for high-dimensional databases or for small support thresholds such as 0.5%. Naive algorithms are too costly. To solve this problem, (a) we promote the description of large collections of itemsets using their concise borders (the pair of sets of the minimal and of the maximal itemsets in the collections), and (b) we design EP mining algorithms which manipulate only borders of collections (especially using our multiborder-differential algorithm), and which represent discovered EPs using borders. All EPs satisfying a constraint can be efficiently discovered by our border-based algorithms, which take as inputs the borders of large itemsets derived by Max-Miner. In our experiments on large and high-dimensional datasets, including the US census and Mushroom datasets, many EPs, including some with large cardinality, are found quickly. We also give other algorithms for discovering general or special types of EPs.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD-99 San Diego CA USA. Copyright ACM 1999 1-58113-143-7/99/08...$5.00
Jinyan Li, Department of CSSE, The University of Melbourne, jyli@cs.mu.oz.au
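
To make the two central notions of the abstract concrete, the following is a minimal sketch and not the border-based mining algorithm itself. It assumes that "increase significantly" is measured as the ratio of an itemset's support in the second dataset to its support in the first, compared against a threshold, and that a border describes every itemset lying between some minimal and some maximal itemset. The function names, the threshold parameter `rho`, and the toy data are illustrative assumptions.

```python
from math import inf


def support(itemset, dataset):
    """Fraction of transactions in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def growth_rate(itemset, d1, d2):
    """Support in d2 divided by support in d1 (assumed measure of increase)."""
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0:
        # Absent from both datasets: no growth; absent only from d1: unbounded growth.
        return 0.0 if s2 == 0 else inf
    return s2 / s1


def is_emerging(itemset, d1, d2, rho):
    """True if the support of `itemset` grows by a factor of at least `rho` from d1 to d2."""
    return growth_rate(itemset, d1, d2) >= rho


def in_border(itemset, minimal, maximal):
    """True if `itemset` contains some minimal itemset and is contained in some maximal one."""
    return (any(lo <= itemset for lo in minimal)
            and any(itemset <= hi for hi in maximal))


# Toy transaction datasets (hypothetical), each transaction a frozenset of items.
d1 = [frozenset("a"), frozenset("bc"), frozenset("c"), frozenset("abc")]
d2 = [frozenset("ab"), frozenset("ab"), frozenset("abc"), frozenset("c")]

ab = frozenset("ab")
print(support(ab, d1), support(ab, d2))   # supports in the two datasets
print(growth_rate(ab, d1, d2))            # ratio of the two supports
print(is_emerging(ab, d1, d2, rho=2.0))   # threshold check

# A border given as (minimal itemsets, maximal itemsets).
minimal = [frozenset("a")]
maximal = [frozenset("abc")]
print(in_border(ab, minimal, maximal))              # between {a} and {a, b, c}
print(in_border(frozenset("b"), minimal, maximal))  # contains no minimal itemset
```

The membership test never enumerates the collection: the two lists of minimal and maximal itemsets stand in for every itemset between them, which is the sense in which a border is a concise description.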