Mixture Modeling with Pairwise, Instance-Level Class Constraints

The goal of semisupervised clustering/mixture modeling is to learn the underlying groups that make up a given data set when some form of instance-level supervision is also available, usually labels or pairwise sample constraints. Most prior work with constraints assumes the number of classes is known, with each learned cluster assumed to represent a single class and hence subject to the given class constraints. When the number of classes is unknown, or when the one-cluster-per-class assumption is invalid, the use of constraints may actually be deleterious to learning the ground-truth data groups. We address this by (1) allowing multiple mixture components to be allocated to individual classes and (2) estimating both the number of components and the number of classes. We also address new class discovery, with components void of constraints treated as putative unknown classes. For both real-world and synthetic data, our method is shown to accurately estimate the number of classes and to compare favorably with the recent approach of Shental, Bar-Hillel, Hertz, and Weinshall (2003).
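
The abstract describes the approach only at a high level. As a concrete reading aid, here is a minimal Python sketch, assuming scikit-learn and NumPy, of the two central ideas: selecting the number of mixture components by a model-selection criterion (BIC here) and allocating several components to one class, with unconstrained components flagged as putative new classes. This is not the authors' EM algorithm: it applies must-link constraints post hoc, groups them into "chunklets" (connected components of the must-link graph, in the style of Shental et al., 2003), and omits cannot-link constraints for brevity; all names (`fit_constrained_mixture`, `chunklets`, `UnionFind`) are our own.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


class UnionFind:
    """Minimal union-find over integer node ids."""
    def __init__(self, n):
        self.p = list(range(n))

    def find(self, i):
        while self.p[i] != i:
            self.p[i] = self.p[self.p[i]]  # path halving
            i = self.p[i]
        return i

    def union(self, i, j):
        self.p[self.find(i)] = self.find(j)


def chunklets(n, must_links):
    """Chunklet id per sample: connected components of the must-link graph
    (cf. Shental et al., 2003); -1 marks unconstrained samples."""
    uf = UnionFind(n)
    for i, j in must_links:
        uf.union(i, j)
    ids, roots = np.full(n, -1), {}
    for i in sorted({k for pair in must_links for k in pair}):
        ids[i] = roots.setdefault(uf.find(i), len(roots))
    return ids


def fit_constrained_mixture(X, must_links, k_range=range(1, 9), seed=0):
    # (1) Estimate the number of components by BIC over candidate orders.
    models = [GaussianMixture(n_components=k, n_init=3, random_state=seed).fit(X)
              for k in k_range]
    best = min(models, key=lambda m: m.bic(X))
    z = best.predict(X)  # hard component assignments

    # (2) Allow several components per class: merge any two components that
    # contain samples from the same chunklet (transitive closure over the
    # bipartite component/chunklet graph).
    ch = chunklets(len(X), must_links)
    K, C = best.n_components, ch.max() + 1
    uf = UnionFind(K + C)  # nodes 0..K-1 = components, K..K+C-1 = chunklets
    for i in np.where(ch >= 0)[0]:
        uf.union(z[i], K + ch[i])

    # (3) Constrained components group into known classes; components void
    # of constraints are treated as putative new classes.
    constrained = set(z[ch >= 0])
    n_classes = len({uf.find(k) for k in constrained})
    new_classes = [k for k in range(K) if k not in constrained]
    return best, n_classes, new_classes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three ground-truth classes; class 0 is bimodal: 4 components vs. 3 classes.
    X = np.vstack([rng.normal(m, 0.3, (100, 2))
                   for m in ([0, 0], [3, 3], [0, 3], [3, 0])])
    y = np.repeat([0, 0, 1, 2], 100)
    # Simulate supervision: keep randomly drawn pairs that share a class.
    pairs = rng.integers(0, len(X), (60, 2))
    must = [(int(i), int(j)) for i, j in pairs if y[i] == y[j] and i != j]
    model, n_classes, new = fit_constrained_mixture(X, must)
    print(f"components={model.n_components}, classes={n_classes}, "
          f"putative-new-class components={new}")
```

On this toy set the selected mixture order (four components for three classes) is exactly the regime where one-cluster-per-class methods misuse constraints. Note that the paper's method incorporates the constraints into the mixture fitting itself, whereas this sketch applies them only afterward to group and count components.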

[1]  Avrim Blum, et al. The Bottleneck, 2021, Monopsony Capitalism.

[2]  David J. Miller, et al. A Mixture Model and EM-Based Algorithm for Class Discovery, Robust Classification, and Outlier Rejection in Mixed Labeled/Unlabeled Data Sets, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[3]  David J. Miller, et al. A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data, 1996, NIPS.

[4]  Yasutoshi Yajima, et al. Optimization Approaches for Semi-Supervised Learning, 2005, Fourth International Conference on Machine Learning and Applications (ICMLA'05).

[5]  J. Besag. On the Statistical Analysis of Dirty Pictures, 1986, Journal of the Royal Statistical Society, Series B.

[6]  A. P. Dempster, et al. Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion), 1977, Journal of the Royal Statistical Society, Series B.

[7]  David A. Landgrebe, et al. The Effect of Unlabeled Samples in Reducing the Small Sample Size Problem and Mitigating the Hughes Phenomenon, 1994, IEEE Trans. Geosci. Remote Sens.

[8]  Alan L. Yuille, et al. Statistical Physics, Mixtures of Distributions, and the EM Algorithm, 1994, Neural Computation.

[9]  Claire Cardie, et al. Intelligent Clustering with Instance-Level Constraints, 2002.

[10]  Tomer Hertz, et al. Learning Distance Functions Using Equivalence Relations, 2003, ICML.

[11]  Claire Cardie, et al. Constrained K-means Clustering with Background Knowledge, 2001, Proceedings of the Eighteenth International Conference on Machine Learning, pp. 577–584.

[12]  Jianbo Shi, et al. Segmentation Given Partial Grouping Constraints, 2004, IEEE Trans. Pattern Anal. Mach. Intell.

[13]  J. Rissanen. Modeling by Shortest Data Description, 1978, Automatica.

[14]  David G. Stork. Toward a Computational Theory of Data Acquisition and Truthing, 2001, COLT/EuroCOLT.

[15]  Joachim M. Buhmann, et al. Pairwise Data Clustering by Deterministic Annealing, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[16]  Tommi S. Jaakkola, et al. Maximum Entropy Discrimination, 1999, NIPS.

[17]  Geoffrey J. McLachlan, et al. Finite Mixture Models, 2000, Wiley.

[18]  J. Laurie Snell, et al. Markov Random Fields and Their Applications, 1980.

[19]  Arindam Banerjee, et al. Semi-supervised Clustering by Seeding, 2002, ICML.

[20]  K. Bennett, et al. Optimization Approaches to Semi-Supervised Learning, 2001.

[21]  K. Rose. Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems, 1998, Proc. IEEE.

[22]  H. Akaike. Information Theory and an Extension of the Maximum Likelihood Principle, 1973, Second International Symposium on Information Theory.

[23]  G. Schwarz. Estimating the Dimension of a Model, 1978, Annals of Statistics.

[24]  Dan Klein, et al. From Instance-Level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering, 2002, ICML.

[25]  Geoffrey E. Hinton, et al. A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants, 1998, Learning in Graphical Models.

[26]  Tomer Hertz, et al. Computing Gaussian Mixture Models with EM Using Equivalence Constraints, 2003, NIPS.

[27]  R. Mooney, et al. Comparing and Unifying Search-Based and Similarity-Based Approaches to Semi-Supervised Clustering, 2003.

[28]  Michael I. Jordan, et al. Distance Metric Learning with Application to Clustering with Side-Information, 2002, NIPS.

[29]  Anil K. Jain, et al. Clustering with Soft and Group Constraints, 2004, SSPR/SPR.

[30]  Trevor Hastie, et al. The Elements of Statistical Learning, 2001, Springer.

[31]  Thorsten Joachims. Transductive Inference for Text Classification Using Support Vector Machines, 1999, ICML.

[32]  Sebastian Thrun, et al. Text Classification from Labeled and Unlabeled Documents Using EM, 2000, Machine Learning.

[33]  Aharon Bar-Hillel, Tomer Hertz, et al. Learning via Equivalence Constraints, with Applications to the Enhancement of Image and Video Retrieval, 2002.