Quantum annealing for combinatorial clustering

Clustering is a powerful machine learning technique that groups “similar” data points based on their characteristics. Many clustering algorithms work by approximately minimizing an objective function, namely the sum of within-cluster distances between points. The straightforward approach examines every possible assignment of points to clusters. This guarantees that the solution is a global minimum; however, the number of possible assignments grows exponentially with the number of data points (on the order of K^N for N points and K clusters), so the approach becomes computationally intractable even for very small datasets. To circumvent this issue, minima of the cost function are usually sought with popular local search-based heuristics such as k-means and hierarchical clustering. Because of their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better-quality results but can be too time-consuming in practice. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms, which are then implemented on commercially available quantum annealing hardware as well as on the purely classical solver “qbsolv”. The first algorithm assigns N data points to K clusters, and the second performs binary clustering in a hierarchical manner. We benchmark our results against the well-known k-means algorithm and discuss the advantages and disadvantages of the proposed techniques.
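
To make the mapping to a quadratic binary optimization problem concrete, the following is a minimal sketch and not necessarily the formulation used in the paper. It assumes a one-hot encoding in which binary variable x[i, c] equals 1 when point i belongs to cluster c, and it enforces "each point belongs to exactly one cluster" with a quadratic penalty. The function names (build_clustering_qubo, brute_force_minimize, decode_labels) and the penalty weight are hypothetical choices made for illustration. The exhaustive minimizer only stands in for a quantum annealer or qbsolv, and is feasible only for a handful of variables.

```python
import itertools
import numpy as np


def build_clustering_qubo(points, k, penalty):
    """Build an upper-triangular QUBO matrix for one-hot clustering.

    Variable index i * k + c is 1 when point i is assigned to cluster c.
    (Illustrative encoding; the penalty weight is an assumption.)
    """
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    q = np.zeros((n * k, n * k))
    # Objective: sum of pairwise distances between points sharing a cluster.
    for c in range(k):
        for i in range(n):
            for j in range(i + 1, n):
                q[i * k + c, j * k + c] += dist[i, j]
    # Constraint: penalty * (sum_c x[i, c] - 1)^2, constant offset dropped.
    for i in range(n):
        for c in range(k):
            q[i * k + c, i * k + c] -= penalty
            for c2 in range(c + 1, k):
                q[i * k + c, i * k + c2] += 2.0 * penalty
    return q


def brute_force_minimize(q):
    """Exhaustively minimize x^T Q x over binary x (stand-in for an annealer)."""
    size = q.shape[0]
    best_x, best_energy = None, np.inf
    for bits in itertools.product((0, 1), repeat=size):
        x = np.array(bits)
        energy = x @ q @ x
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy


def decode_labels(x, n, k):
    """Turn the one-hot bit vector back into one cluster label per point."""
    return x.reshape(n, k).argmax(axis=1)


if __name__ == "__main__":
    # Two well-separated pairs of points, to be split into K = 2 clusters.
    points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    # Assumed penalty: large enough that violating one-hot never pays off.
    penalty = float(
        np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1).sum()
    )
    q = build_clustering_qubo(points, k=2, penalty=penalty)
    x, energy = brute_force_minimize(q)
    print(decode_labels(x, n=len(points), k=2), energy)
```

The brute-force loop visits 2^(N*K) bit strings, which illustrates why exhaustive search is intractable beyond toy sizes and why a heuristic or annealing-based solver is used instead. The cluster labels are only defined up to permutation, so the two pairs may come back as either label.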
