Adaptive Hierarchical Clustering Using Ordinal Queries

In many applications of clustering (for example, ontologies or clusterings of animal or plant species), hierarchical clusterings are more descriptive than a flat clustering. A hierarchical clustering over n elements is represented by a rooted binary tree with n leaves, each corresponding to one element. The subtrees rooted at interior nodes capture the clusters. In this paper, we study active learning of a hierarchical clustering using only ordinal queries. An ordinal query consists of a set of three elements, and the response reveals which two of the three are "closer" to each other than to the third. We say that elements x and x' are closer to each other than to x'' if there exists a cluster containing x and x', but not x''. When all query responses are correct, there is a deterministic algorithm that learns the underlying hierarchical clustering using at most n log₂ n adaptive ordinal queries. We generalize this algorithm to be robust in a model in which each query response is correct independently with probability p > 1/2, and adversarially incorrect with probability 1 − p. We show that in the presence of noise, our algorithm outputs the correct hierarchical clustering with probability at least 1 − Δ, using O(n log n + n log(1/Δ)) adaptive ordinal queries. For our results, adaptivity is crucial: we prove that even in the absence of noise, every non-adaptive algorithm requires Ω(n³) ordinal queries in the worst case.
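
To make the query model concrete, the following is a minimal Python sketch of a noise-free ordinal-query oracle, together with a naive adaptive reconstruction that inserts elements one at a time by walking down from the root. It is not the n log₂ n algorithm referred to above: this simple walk can use a number of queries proportional to the tree depth per inserted element. The nested-tuple tree representation and the names `leaves`, `make_oracle`, `insert`, `reconstruct`, and `clusters` are assumptions made only for illustration.

```python
def leaves(tree):
    """Set of elements below a node; a leaf is any non-tuple value."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def clusters(tree):
    """All clusters (leaf sets of subtrees) of a hierarchical clustering."""
    if not isinstance(tree, tuple):
        return {leaves(tree)}
    return {leaves(tree)} | clusters(tree[0]) | clusters(tree[1])


def make_oracle(tree):
    """Noise-free oracle: query(x, y, z) returns the pair that shares a
    cluster excluding the third element (x, y, z distinct, all in tree)."""
    def query(x, y, z):
        node = tree
        while True:
            left = leaves(node[0])
            in_left = [e for e in (x, y, z) if e in left]
            if len(in_left) == 3:
                node = node[0]          # all three on the left: descend
            elif len(in_left) == 0:
                node = node[1]          # all three on the right: descend
            elif len(in_left) == 2:
                return frozenset(in_left)
            else:                       # exactly one on the left
                return frozenset({x, y, z} - set(in_left))
    return query


def insert(tree, x, query):
    """Insert x into a partial tree using one query per level visited."""
    if not isinstance(tree, tuple):
        return (tree, x)
    a = next(iter(leaves(tree[0])))     # one representative per child
    b = next(iter(leaves(tree[1])))
    closest = query(x, a, b)
    if closest == frozenset({a, b}):
        return (tree, x)                # x lies outside this subtree
    if a in closest:
        return (insert(tree[0], x, query), tree[1])
    return (tree[0], insert(tree[1], x, query))


def reconstruct(elements, query):
    """Naive adaptive reconstruction (assumes at least two elements)."""
    tree = (elements[0], elements[1])
    for x in elements[2:]:
        tree = insert(tree, x, query)
    return tree


if __name__ == "__main__":
    truth = ((("cat", "dog"), "bird"), ("oak", "fern"))
    query = make_oracle(truth)
    learned = reconstruct(["oak", "cat", "fern", "bird", "dog"], query)
    print(clusters(learned) == clusters(truth))
```

Comparing cluster sets rather than the tuples themselves ignores the left/right order of children, which carries no meaning in a hierarchical clustering. Each step of `insert` chooses its next query based on earlier responses, which is the adaptivity the abstract refers to.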
