A very large scale neighborhood search algorithm for the q-mode problem

The q-mode problem is a combinatorial optimization problem that arises when partitioning a given collection of data vectors with categorical attributes. A neighborhood search algorithm is proposed for solving the q-mode problem. The algorithm is based on a very large scale neighborhood that is searched implicitly using network flow techniques. It is evaluated in a computational experiment on randomly generated instances. The results show that the algorithm generally obtains high-quality local optima, and that on instances with strong natural clusters it consistently finds optimal or near-optimal solutions.
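The abstract does not state the objective function, so the following is only an illustrative sketch under an assumed k-modes-style formulation: each of the q clusters is represented by its attribute-wise mode, and the cost of a partition is the total number of attribute mismatches between each vector and the mode of its cluster. The names `cluster_mode` and `mismatch_cost` are hypothetical, and the very large scale neighborhood search itself is not reproduced here.

```python
from collections import Counter


def cluster_mode(vectors):
    """Return the attribute-wise most frequent value among the given vectors.

    Ties are broken in favor of the value seen first.
    """
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*vectors))


def mismatch_cost(data, labels, q):
    """Total attribute mismatches between each vector and its cluster's mode.

    Assumed objective (not taken from the paper): lower is better.
    `labels[i]` is the cluster index (0..q-1) assigned to `data[i]`.
    """
    total = 0
    for cluster in range(q):
        members = [v for v, label in zip(data, labels) if label == cluster]
        if not members:
            continue  # an empty cluster contributes no cost
        mode = cluster_mode(members)
        total += sum(a != b for v in members for a, b in zip(v, mode))
    return total


if __name__ == "__main__":
    # Four vectors with two categorical attributes each.
    data = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "y")]
    print(mismatch_cost(data, [0, 0, 1, 1], q=2))
    print(mismatch_cost(data, [0, 0, 0, 0], q=1))
```

A neighborhood search of any kind would call an evaluation like this (or an incremental version of it) to compare a candidate partition against the current one.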
