Learning from multiple data sets with different missing attributes and privacy policies: Parallel distributed fuzzy genetics-based machine learning approach

This paper discusses parallel distributed genetics-based machine learning (GBML) of fuzzy rule-based classifiers from multiple data sets. We assume that the data sets have similar but not identical sets of attributes; in other words, each data set has its own missing attributes. Our task is to design a fuzzy rule-based classifier from all of these data sets. We first show that fuzzy rules can easily handle missing attributes. Next, we explain how parallel distributed fuzzy GBML can handle multiple data sets with different missing attributes. Then we examine the accuracy of the fuzzy rule-based classifiers obtained under various settings of available training data, ranging from a single data set with no missing attributes to multiple data sets with many missing attributes. Experimental results show that using multiple data sets often increases the accuracy of the obtained classifiers even when those data sets have missing attributes. We also discuss learning from a data set under a strict privacy-preserving policy where only the error rate of each candidate classifier is available. No information about individual patterns is assumed to be available, which means that neither the class label nor the attribute values of any pattern can be used. We explain how such a black-box data set can still be utilized for classifier design.
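
The abstract's two technical claims, that fuzzy rules handle missing attributes easily and that a black-box data set can still guide classifier design through its error rate alone, can be illustrated with a minimal Python sketch. It assumes triangular membership functions on a normalized [0, 1] domain, a product compatibility grade that simply skips missing attribute values (equivalent to treating them as "don't care"), and single-winner reasoning. The names `Rule`, `classify`, `BlackBoxDataSet`, and the fuzzy sets `SMALL`/`MEDIUM`/`LARGE` are hypothetical; this is not the authors' implementation.

```python
from typing import List, Optional, Sequence, Tuple

FuzzySet = Optional[Tuple[float, float, float]]  # None means "don't care"
Pattern = Sequence[Optional[float]]              # None marks a missing attribute

# Hypothetical fuzzy partition of a normalized attribute domain [0, 1].
SMALL = (-0.5, 0.0, 0.5)
MEDIUM = (0.0, 0.5, 1.0)
LARGE = (0.5, 1.0, 1.5)


def membership(x: float, fs: FuzzySet) -> float:
    """Triangular membership grade; a don't-care antecedent always returns 1."""
    if fs is None:
        return 1.0
    a, b, c = fs
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


class Rule:
    def __init__(self, antecedents: List[FuzzySet], label: int, weight: float = 1.0):
        self.antecedents = antecedents
        self.label = label
        self.weight = weight

    def compatibility(self, x: Pattern) -> float:
        # Product of membership grades over the available attributes only:
        # a missing value (None) is skipped, i.e. it contributes a grade of 1.
        grade = 1.0
        for value, fs in zip(x, self.antecedents):
            if value is not None:
                grade *= membership(value, fs)
        return grade


def classify(rules: List[Rule], x: Pattern) -> Optional[int]:
    """Single-winner reasoning; ties keep the earlier rule (a simplifying assumption)."""
    best_rule, best_score = None, 0.0
    for rule in rules:
        score = rule.compatibility(x) * rule.weight
        if score > best_score:
            best_rule, best_score = rule, score
    return best_rule.label if best_rule is not None else None  # None = rejected


class BlackBoxDataSet:
    """Hypothetical wrapper: keeps patterns private and reveals only an error rate."""

    def __init__(self, patterns: List[Pattern], labels: List[int]):
        self._patterns = patterns
        self._labels = labels

    def error_rate(self, rules: List[Rule]) -> float:
        # A rejected pattern (None) never equals a label, so it counts as an error.
        errors = sum(classify(rules, x) != y for x, y in zip(self._patterns, self._labels))
        return errors / len(self._labels)


# Usage: the second attribute is missing in the first pattern.
rules = [Rule([SMALL, None], 0), Rule([LARGE, None], 1)]
data = BlackBoxDataSet([[0.1, None], [0.9, 0.3], [0.2, 0.7]], [0, 1, 1])
print(data.error_rate(rules))  # one of the three patterns is misclassified
```

In a GBML search, `error_rate` would be the only fitness signal such a data set can provide, since the wrapper never exposes individual patterns or labels.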
