Improved Variable Neighbourhood Search Heuristic for Quartet Clustering

Given a set of n data objects and their pairwise dissimilarities, the goal of quartet clustering is to construct an optimal tree from among all possible combinations of quartet topologies on the n objects, where optimality means that the sum of the dissimilarities of the embedded (or consistent) quartet topologies is minimal. This is an NP-hard combinatorial optimization problem, also referred to as the minimum quartet tree cost (MQTC) problem. We provide details and a formulation of this challenging problem, and propose a basic greedy heuristic that is very fast and has some interesting implementation details. The solution approach, though simple, substantially improves the performance of a Reduced Variable Neighborhood Search, one of the most popular heuristic algorithms for tackling the MQTC problem.
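To make the objective concrete, the following is a minimal sketch of how the cost of one candidate tree could be evaluated. It is not the heuristic proposed in the paper. It assumes the definition common in the quartet-method literature: the cost of a quartet topology uv|wx is d(u,v) + d(w,x), and the tree is unrooted with every internal node of degree 3, so that exactly one of the three topologies of each quartet is embedded. The function names, the adjacency-dictionary tree representation, and the example data are all hypothetical.

```python
from collections import deque
from itertools import combinations
from math import comb


def num_quartet_topologies(n):
    # Each 4-subset of the n objects admits 3 topologies: uv|wx, uw|vx, ux|vw.
    return 3 * comb(n, 4)


def hop_distances(adj, source):
    # Breadth-first search: number of edges from `source` to every node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def tree_cost(adj, leaves, d):
    # Sum, over all quartets of leaves, the cost of the topology embedded in the tree.
    # Assumption: cost(uv|wx) = d(u,v) + d(w,x), with d keyed by frozenset pairs.
    dt = {leaf: hop_distances(adj, leaf) for leaf in leaves}
    total = 0.0
    for a, b, c, e in combinations(leaves, 4):
        pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
        # In a tree whose internal nodes all have degree 3, the embedded topology
        # is the pairing with the strictly smallest sum of tree path lengths.
        p, q = min(pairings, key=lambda pq: dt[pq[0][0]][pq[0][1]] + dt[pq[1][0]][pq[1][1]])
        total += d[frozenset(p)] + d[frozenset(q)]
    return total


# Hypothetical example: four objects, tree embedding the topology ab|cd.
leaves = ["a", "b", "c", "d"]
adj = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["y"],
    "x": ["a", "b", "y"], "y": ["c", "d", "x"],
}
d = {frozenset(pair): 10.0 for pair in combinations(leaves, 2)}
d[frozenset(("a", "b"))] = 1.0
d[frozenset(("c", "d"))] = 2.0

print(num_quartet_topologies(4))  # 3
print(tree_cost(adj, leaves, d))  # 3.0
```

Evaluating a tree this way enumerates all 4-subsets of leaves, so the cost grows as O(n^4). A search heuristic repeatedly compares candidate trees on this objective, which is why the speed of each evaluation and of each move matters.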
