Data classification through an evolutionary approach based on multiple criteria

Real-world problems often involve large volumes of imprecise data. Such problems can challenge case-based reasoning systems, which rely on the knowledge extracted from data to identify analogies and solve new problems. Many authors have focused on organizing the case memory in patterns to reduce the computational burden and deal with uncertainty. The organization is usually determined by a single criterion, but in some problems a single criterion is insufficient to find accurate clusters. This work describes an approach that organizes the case memory in patterns based on multiple criteria. The approach uses the search capabilities of multiobjective evolutionary algorithms to build a Pareto set of solutions, where each solution is a possible organization reflecting a different relative relevance of the objectives. The system shows promising capabilities when compared with a successful system based on self-organizing maps. Because the geometry of a data set influences the cluster-building process, the results are analyzed with that geometry taken into account. To this end, several complexity measures are used to categorize the data sets according to their topology.
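
As an illustration of what a Pareto set of candidate organizations means, the following minimal sketch filters a list of candidate clusterings down to the non-dominated ones. It is not the authors' algorithm: the two objectives (an intra-cluster deviation and a connectivity penalty), the assumption that both are minimized, the score values, and the function names `dominates` and `pareto_front` are all hypothetical and chosen only for the example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b, assuming every objective is minimized.

    a dominates b when a is no worse than b on every objective
    and strictly better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of the candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores for four candidate organizations of the case memory:
# (intra-cluster deviation, connectivity penalty), both to be minimized.
candidates = [(0.20, 0.90), (0.35, 0.40), (0.50, 0.45), (0.80, 0.10)]

front = pareto_front(candidates)
print(front)  # indices of the non-dominated candidates
```

In this toy data the third candidate is worse than the second on both objectives, so it is dropped, while the remaining three each represent a different trade-off. A multiobjective evolutionary algorithm would apply the same dominance test repeatedly while evolving the population of candidate organizations.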
