Discover dependency pattern among attributes by using a new type of nonlinear multiregression

Multiregression is one of the most common approaches for discovering dependency patterns among attributes in a database. Nonadditive set functions have been applied to handle interaction among the predictive attributes, and nonlinear integrals with respect to nonadditive set functions are employed to establish a nonlinear multiregression model describing the relation between the objective attribute and the predictive attributes. The values of the nonadditive set function play the role of unknown regression coefficients in the model and are determined by an adaptive genetic algorithm from the data on the predictive and objective attributes. Furthermore, the model is improved here by a new numericalization technique so that it can accommodate both categorical and continuous numerical attributes. The traditional dummy binary method for mixed-type data can be regarded as a very special case of our model, in which there is no interaction among the predictive attributes and the Choquet integral is used. To avoid premature convergence during the evolutionary procedure, a technique for maintaining diversity in the population is adopted when running the algorithm. A test example shows that the algorithm and the associated program exhibit good reversibility on the data. © 2001 John Wiley & Sons, Inc. 16: 949–962 (2001)
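To illustrate how a nonadditive set function can act as a set of regression coefficients, the following is a minimal Python sketch of the discrete Choquet integral. It is not the authors' program: the names `choquet_integral` and `additive_measure`, the attribute labels, and all numeric values are assumptions chosen for illustration, and the attribute values are assumed nonnegative. The genetic algorithm and the numericalization step are not shown.

```python
from itertools import combinations


def choquet_integral(values, mu):
    """Discrete Choquet integral of nonnegative values w.r.t. a set function.

    values: dict mapping attribute name -> nonnegative number
    mu:     dict mapping frozenset of attribute names -> number
    """
    # Visit attributes in ascending order of their values.
    order = sorted(values, key=values.get)
    total = 0.0
    previous = 0.0
    for i, name in enumerate(order):
        # Attributes whose value is at least the current one.
        upper = frozenset(order[i:])
        total += (values[name] - previous) * mu[upper]
        previous = values[name]
    return total


def additive_measure(weights):
    """Hypothetical helper: build an additive set function from weights."""
    names = list(weights)
    mu = {}
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            mu[frozenset(subset)] = sum(weights[n] for n in subset)
    return mu


# Illustrative data (assumed, not taken from the paper).
x = {"x1": 2.0, "x2": 5.0, "x3": 1.0}
w = {"x1": 0.5, "x2": 0.25, "x3": 1.0}

# Without interaction, the integral is an ordinary weighted sum.
mu_additive = additive_measure(w)
linear_value = choquet_integral(x, mu_additive)

# With interaction: the pair {x1, x2} is worth more than the sum of its parts.
mu_interactive = dict(mu_additive)
mu_interactive[frozenset({"x1", "x2"})] = 1.5
interactive_value = choquet_integral(x, mu_interactive)
```

With the additive set function the integral equals the weighted sum of the attribute values, which is the linear case mentioned above. Changing the value assigned to a single subset changes the prediction, which is how interaction among predictive attributes enters a model of this kind.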
