Similarity measures in formal concept analysis

Formal concept analysis (FCA) has been applied successfully in diverse fields such as data mining, conceptual modeling, social networks, software engineering, and the semantic web. One shortcoming of FCA, however, is the large number of concepts that typically arise in dense datasets, which hinders common tasks such as rule generation and visualization. To overcome this shortcoming, it is important to develop formalisms and methods to segment, categorize, and cluster formal concepts. The first step toward these aims is to define suitable similarity and dissimilarity measures for formal concepts. In this paper we propose three similarity measures based on existing set-based measures, in addition to developing the completely novel zeros-induced measure. Moreover, we formally prove that all the proposed measures are indeed similarity measures and investigate the computational complexity of computing them. Finally, we present an extensive empirical evaluation on real-world data in which the utility and character of each similarity measure are tested.
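
To make the idea of a set-based similarity between formal concepts concrete, the following is a minimal sketch only. The abstract does not name the three set-based measures, so the Jaccard index used here is an assumption chosen for illustration, as are the `Concept` type, the `concept_similarity` function, and the `weight` parameter that blends extent and intent similarity. It is not the paper's definition, and it does not attempt the zeros-induced measure.

```python
from typing import FrozenSet, NamedTuple


class Concept(NamedTuple):
    """A formal concept as a pair (extent, intent). Hypothetical helper type."""
    extent: FrozenSet[str]  # objects belonging to the concept
    intent: FrozenSet[str]  # attributes shared by those objects


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard index |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def concept_similarity(c1: Concept, c2: Concept, weight: float = 0.5) -> float:
    """Weighted blend of extent and intent Jaccard similarity.

    Assumption for illustration: `weight` is the share given to the extents,
    and the remainder goes to the intents.
    """
    return (weight * jaccard(c1.extent, c2.extent)
            + (1.0 - weight) * jaccard(c1.intent, c2.intent))


# Two small concepts over hypothetical objects o1..o3 and attributes a, b.
c1 = Concept(frozenset({"o1", "o2"}), frozenset({"a", "b"}))
c2 = Concept(frozenset({"o2", "o3"}), frozenset({"b"}))
similarity = concept_similarity(c1, c2)
```

With the default weight, the two example concepts share one of three objects and one of two attributes, so the blended score is the average of 1/3 and 1/2. A measure of this shape is symmetric and returns 1 for identical concepts, which are typical requirements for a similarity measure.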
