Mining Class Contrast Functions by Gene Expression Programming

Finding functions whose accuracies change significantly between two classes is an interesting work. In this paper, this kind of functions is defined as class contrast functions. As Gene Expression Programming (GEP) can discover essential relations from data and express them mathematically, it is desirable to apply GEP to mining such class contrast functions from data. The main contributions of this paper include: (1) proposing a new data mining task --- class contrast function mining, (2) designing a GEP based method to find class contrast functions, (3) presenting several strategies for finding multiple class contrast functions in data, (4) giving an extensive performance study on both synthetic and real world datasets. The experimental results show that the proposed methods are effective. Several class contrast functions are discovered from the real world datasets. Some potential works on class contrast function mining are discussed based on the experimental results.

[1]  Ron Kohavi,et al.  Supervised and Unsupervised Discretization of Continuous Features , 1995, ICML.

[2]  Kotagiri Ramamohanarao,et al.  Instance-Based Classification by Emerging Patterns , 2000, PKDD.

[3]  Jinyan Li,et al.  Mining border descriptions of emerging patterns from dataset pairs , 2005, Knowledge and Information Systems.

[4]  Weimin Xiao,et al.  Evolving accurate and compact classification rules with gene expression programming , 2003, IEEE Trans. Evol. Comput..

[5]  James Bailey,et al.  Fast mining of high dimensional expressive contrast patterns using zero-suppressed binary decision diagrams , 2006, KDD '06.

[6]  Cândida Ferreira,et al.  Gene Expression Programming: A New Adaptive Algorithm for Solving Problems , 2001, Complex Syst..

[7]  Meng Li,et al.  Stream Operators for Querying Data Streams , 2005, WAIM.

[8]  Cândida Ferreira,et al.  Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence (Studies in Computational Intelligence) , 2006 .

[9]  Jinyan Li,et al.  Efficient mining of emerging patterns: discovering trends and differences , 1999, KDD '99.

[10]  Kotagiri Ramamohanarao,et al.  An Efficient Single-Scan Algorithm for Mining Essential Jumping Emerging Patterns for Classification , 2002, PAKDD.

[11]  Heitor Silvério Lopes,et al.  EGIPSYS: AN ENHANCED GENE EXPRESSION PROGRAMMING APPROACH FOR SYMBOLIC REGRESSION PROBLEMS † , 2004 .

[12]  Changjie Tang,et al.  Distance Guided Classification with Gene Expression Programming , 2006, ADMA.

[13]  Cândida Ferreira,et al.  Mutation, Transposition, and Recombination: An Analysis of the Evolutionary Dynamics , 2002, JCIS.

[14]  Kotagiri Ramamohanarao,et al.  DeEPs: A New Instance-Based Lazy Discovery and Classification System , 2004, Machine Learning.

[15]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[16]  Candida Ferreira Gene expression programming , 2006 .

[17]  James Bailey,et al.  Fast Algorithms for Mining Emerging Patterns , 2002, PKDD.

[18]  Kotagiri Ramamohanarao,et al.  Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets , 2000, KDD '00.

[19]  Changjie Tang,et al.  Time Series Prediction Based on Gene Expression Programming , 2004, WAIM.

[20]  Jinyan Li,et al.  CAEP: Classification by Aggregating Emerging Patterns , 1999, Discovery Science.

[21]  Jan Komorowski,et al.  Principles of Data Mining and Knowledge Discovery , 2001, Lecture Notes in Computer Science.

[22]  James Bailey,et al.  A fast algorithm for computing hypergraph transversals and its application in mining emerging patterns , 2003, Third IEEE International Conference on Data Mining.

[23]  Jinyan Li,et al.  Mining statistically important equivalence classes and delta-discriminative emerging patterns , 2007, KDD '07.

[24]  Cândida Ferreira,et al.  Discovery of the Boolean Functions to the Best Density-Classification Rules Using Gene Expression Programming , 2002, EuroGP.

[25]  Cândida Ferreira,et al.  Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence , 2014, Studies in Computational Intelligence.

[26]  Wei Jiang,et al.  Data Mining Methods and Applications , 2006 .