Unsupervised method to ensemble results of multiple clustering solutions for bibliographic data

Multiobjective optimization refers to the simultaneous optimization of multiple conflicting objective functions. Clustering is often formulated as a multiobjective optimization problem in which several cluster quality measures are optimized simultaneously, and Pareto-based approaches are popular for solving it. Pareto-based approaches yield a set of solutions, known as the Pareto front, in which all solutions are non-dominated with respect to one another. A single solution is then selected by the decision maker according to his or her preference. When the number of non-dominated solutions is large, however, it is difficult for the decision maker to choose a single one. The selection of a solution from a given Pareto front is known as post-Pareto optimality analysis. Many approaches have been proposed for this problem, but most of them involve the decision maker. In this paper, we propose an approach that obtains a single solution from a set of non-dominated solutions by combining them, without the intervention of the decision maker. We evaluate our approach on the sets of solutions obtained by applying a newly developed multiobjective clustering technique to bibliographic databases such as DBLP.
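To make the notions of non-dominance and of combining solutions concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes every objective is to be maximized, uses hypothetical validity scores and labelings, and swaps in a simple co-association (pairwise agreement) vote as the ensemble step; the names `dominates`, `pareto_front`, and `consensus` are illustrative only.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b, assuming every objective is maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def consensus(labelings: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Merge two records when more than `threshold` of the labelings
    place them in the same cluster (an assumed, simple ensemble rule)."""
    n, m = len(labelings[0]), len(labelings)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            agree = sum(lab[i] == lab[j] for lab in labelings) / m
            if agree > threshold:
                parent[find(j)] = find(i)

    # Relabel the union-find roots as consecutive cluster ids.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical data: two validity scores and one labeling per clustering solution.
scores = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
solutions = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

front = pareto_front(scores)
combined = consensus([solutions[i] for i in front])
print(front, combined)
```

Here only the non-dominated solutions take part in the vote, so no preference from a decision maker is needed; the 0.5 threshold is an arbitrary choice for illustration.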
