A framework for comparing heterogeneous objects: on the similarity measurements for fuzzy, numerical and categorical attributes

Real-world data collections are often heterogeneous, that is, described by attributes of mixed data types: numerical, categorical and fuzzy. Since most available similarity measures apply to only one type of data, an appropriate similarity measure must be constructed for comparing such complex data. In this paper, a framework of new, unified similarity measures is proposed for comparing heterogeneous objects described by numerical, categorical and fuzzy attributes. Examples are used to illustrate, compare and discuss the applications and efficiency of the proposed approach to heterogeneous data comparison and clustering.
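To make the idea of comparing objects with mixed attribute types concrete, the following is a minimal Python sketch. It is not the measure proposed in the paper, whose formulas are not given in the abstract. It assumes a simple per-attribute average: range-normalised distance for numerical attributes, exact match for categorical attributes, and a min/max (fuzzy Jaccard) ratio for fuzzy attributes represented as label-to-membership mappings. All names here (`numerical_sim`, `categorical_sim`, `fuzzy_sim`, `object_sim`, the `schema` layout) are hypothetical and chosen for illustration only.

```python
from typing import Mapping, Union

# Assumption: a fuzzy value is a mapping from linguistic label to
# membership degree in [0, 1], e.g. {"tall": 0.8, "medium": 0.2}.
Fuzzy = Mapping[str, float]
Value = Union[float, str, Fuzzy]


def numerical_sim(a: float, b: float, value_range: float) -> float:
    """Range-normalised similarity: 1 - |a - b| / range, clipped at 0."""
    if value_range <= 0:
        return 1.0 if a == b else 0.0
    return max(0.0, 1.0 - abs(a - b) / value_range)


def categorical_sim(a: str, b: str) -> float:
    """Simple overlap: 1 if the categories match, otherwise 0."""
    return 1.0 if a == b else 0.0


def fuzzy_sim(a: Fuzzy, b: Fuzzy) -> float:
    """Fuzzy Jaccard ratio: sum of minima over sum of maxima."""
    labels = set(a) | set(b)
    inter = sum(min(a.get(l, 0.0), b.get(l, 0.0)) for l in labels)
    union = sum(max(a.get(l, 0.0), b.get(l, 0.0)) for l in labels)
    return inter / union if union > 0 else 1.0


def object_sim(
    x: Mapping[str, Value],
    y: Mapping[str, Value],
    schema: Mapping[str, Mapping[str, object]],
) -> float:
    """Unweighted mean of per-attribute similarities (an assumption)."""
    if not schema:
        raise ValueError("schema must describe at least one attribute")
    total = 0.0
    for name, spec in schema.items():
        kind = spec["kind"]
        if kind == "numerical":
            total += numerical_sim(x[name], y[name], spec["range"])
        elif kind == "categorical":
            total += categorical_sim(x[name], y[name])
        elif kind == "fuzzy":
            total += fuzzy_sim(x[name], y[name])
        else:
            raise ValueError(f"unknown attribute kind: {kind!r}")
    return total / len(schema)


if __name__ == "__main__":
    # Hypothetical schema and objects, for illustration only.
    schema = {
        "age": {"kind": "numerical", "range": 50.0},
        "colour": {"kind": "categorical"},
        "height": {"kind": "fuzzy"},
    }
    x = {"age": 30.0, "colour": "red", "height": {"tall": 0.8, "medium": 0.2}}
    y = {"age": 40.0, "colour": "red", "height": {"tall": 0.5, "medium": 0.5}}
    print(object_sim(x, y, schema))
```

Because every per-attribute score lies in [0, 1], the combined score does too, so it can be turned into a dissimilarity (1 minus the score) for use with a clustering algorithm. Equal weighting of attributes is a design choice of this sketch and may not match the framework the paper proposes.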
