Efficiently Finding Conceptual Clustering Models with Integer Linear Programming

Conceptual clustering combines two long-standing machine learning tasks: the unsupervised grouping of similar instances and their description by symbolic concepts. In this paper, we decouple the problems of finding descriptions and forming clusters by first mining formal concepts (i.e., closed itemsets) and then searching for the best k clusters that can be described with those itemsets. Most existing approaches that perform the two steps separately are heuristic and produce results of varying quality. Instead, we address the problem of finding an optimal constrained conceptual clustering by using integer linear programming techniques. Other generic approaches to this problem tend to scale poorly. Our approach takes advantage of both techniques: the general framework of integer linear programming and the fast, specialized algorithms of data mining. Experiments performed on UCI datasets show that our approach efficiently finds clusterings of consistently high quality.
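The two-step idea can be illustrated with a minimal sketch on a toy dataset. Everything in it is an assumption made for illustration and is not taken from the paper: the data, the function names, the objective (the sum of extent size times description length), and the requirement that the chosen clusters partition the instances. The selection step is solved by exhaustive search as a stand-in for an integer linear programming solver, and the concept enumeration is a naive one rather than a specialized closed-itemset miner, so the sketch only works at toy scale.

```python
from itertools import combinations

# Hypothetical toy data: each instance is a set of items.
DATA = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "b"},
    {"d", "e"},
    {"d", "e", "f"},
    {"d"},
]

def extent(itemset, data):
    """Indices of the instances that contain every item of the itemset."""
    return frozenset(i for i, row in enumerate(data) if itemset <= row)

def closure(ext, data):
    """Items shared by all instances in a non-empty extent (the closed itemset)."""
    return frozenset(set.intersection(*(data[i] for i in ext)))

def formal_concepts(data):
    """Step 1: naively enumerate formal concepts as {extent: closed itemset}."""
    items = sorted(set().union(*data))
    concepts = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            ext = extent(frozenset(combo), data)
            if ext:  # skip itemsets that no instance supports
                concepts[ext] = closure(ext, data)
    return concepts

def best_clustering(concepts, n, k):
    """Step 2: pick k concepts whose extents partition all n instances.

    Exhaustive search stands in for the ILP. The score (assumed here) is
    the sum over clusters of |extent| * |description|.
    """
    everything = frozenset(range(n))
    best, best_score = None, -1
    for chosen in combinations(concepts.items(), k):
        exts = [ext for ext, _ in chosen]
        covers = frozenset().union(*exts) == everything
        disjoint = sum(len(ext) for ext in exts) == n
        if covers and disjoint:
            score = sum(len(ext) * len(desc) for ext, desc in chosen)
            if score > best_score:
                best, best_score = chosen, score
    if best is None:
        return None, None
    clusters = sorted((sorted(ext), sorted(desc)) for ext, desc in best)
    return clusters, best_score

concepts = formal_concepts(DATA)
clusters, score = best_clustering(concepts, len(DATA), k=2)
for members, description in clusters:
    print(members, description)
```

In an ILP version of the same selection, each candidate concept would get a binary variable, each instance a constraint forcing it into exactly one chosen concept, and one further constraint would fix the number of chosen concepts to k.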
