Mining MOUCLAS Patterns and Jumping MOUCLAS Patterns to Construct Classifiers

This paper proposes a mining novel approach which consists of two new data mining algorithms for the classification over quantitative data, based on two new pattern called MOUCLAS (MOUntain function based CLASsification) Patterns and JumpingMOUCLAS Patterns. The motivation of the study is to develop two classifiers for quantitative attributes by the concepts of the association rule and the clustering. An illustration of using petroleum well logging data for oil/gas formation identification is presented in the paper. MPsandJMPs are ideally suitable to derive the implicit relationship between measured values (well logging data) and properties to be predicted (oil/gas formation or not). As a hybrid of classification and clustering and association rules mining, our approach have several advantages which are (1) it has a solid mathematical foundation and compact mathematical description of classifiers, (2) it does not require discretization, (3) it is robust when handling noisy or incomplete data in high dimensional data space.

[1]  Wynne Hsu,et al.  Mining association rules with multiple minimum supports , 1999, KDD '99.

[2]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[3]  Wynne Hsu,et al.  Integrating Classification and Association Rule Mining , 1998, KDD.

[4]  Pat Langley,et al.  Editorial: On Machine Learning , 1986, Machine Learning.

[5]  Dimitar Filev,et al.  Generation of Fuzzy Rules by Mountain Clustering , 1994, J. Intell. Fuzzy Syst..

[6]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[7]  Jack Sklansky,et al.  On Automatic Feature Selection , 1988, Int. J. Pattern Recognit. Artif. Intell..

[8]  D. Madigan,et al.  Proceedings : KDD-99 : the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 15-18, 1999, San Diego, California, USA , 1999 .

[9]  Kotagiri Ramamohanarao,et al.  Making Use of the Most Expressive Jumping Emerging Patterns for Classification , 2000, Knowledge and Information Systems.

[10]  Daniel A. Keim,et al.  An Efficient Approach to Clustering in Large Multimedia Databases with Noise , 1998, KDD.

[11]  Ron Kohavi,et al.  Supervised and Unsupervised Discretization of Continuous Features , 1995, ICML.

[12]  Nagwa M. El-Makky,et al.  A note on "beyond market baskets: generalizing association rules to correlations" , 2000, SKDD.

[13]  Jennifer Widom,et al.  Clustering association rules , 1997, Proceedings 13th International Conference on Data Engineering.

[14]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[15]  Stephen L. Chiu,et al.  Fuzzy Model Identification Based on Cluster Estimation , 1994, J. Intell. Fuzzy Syst..

[16]  Huan Liu,et al.  Feature Selection for Classification , 1997, Intell. Data Anal..

[17]  Dimitris Meretakis,et al.  Extending naïve Bayes classifiers using long itemsets , 1999, KDD '99.

[18]  P. Schönemann On artificial intelligence , 1985, Behavioral and Brain Sciences.

[19]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[20]  Hiroshi Motoda,et al.  Feature Selection for Knowledge Discovery and Data Mining , 1998, The Springer International Series in Engineering and Computer Science.

[21]  Padhraic Smyth,et al.  From Data Mining to Knowledge Discovery: An Overview , 1996, Advances in Knowledge Discovery and Data Mining.

[22]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[23]  Ramakrishnan Srikant,et al.  Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.

[24]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[25]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[26]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.