Similarities in Fuzzy Data Mining: From a Cognitive View to Real-World Applications

Similarity is a key concept in any attempt to build humanlike automated systems or assistants for human task solving, because similarity judgments are natural in human categorization and underlie many capabilities such as language understanding, pattern recognition, and decision-making. In this paper, we study the use of similarities in data mining, grounding our discussion in cognitive approaches to similarity stemming, for instance, from the seminal works of Tversky and Rosch. We present a general framework for measures of comparison that is compatible with these cognitive foundations, and we show that measures of similarity can be involved in all steps of the data mining process. We then focus on fuzzy logic, which provides useful tools for data mining mainly because of its ability to represent imperfect information; this ability is crucial when databases are complex, large, and contain heterogeneous, imprecise, vague, uncertain, or incomplete data. Finally, we illustrate our discussion with examples of similarities used in real-world data mining problems.
