Semi-supervised Clustering: Learning with Limited User Feedback


[1]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[2]  Shigeo Abe Pattern Classification , 2001, Springer London.

[3]  Susumu Goto,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 2000, Nucleic Acids Res..

[4]  M. Goldszmidt,et al.  A Probabilistic Approach to Full-Text Document Clustering , 1998 .

[5]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[6]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1991 .

[7]  Tomer Hertz,et al.  Learning Distance Functions using Equivalence Relations , 2003, ICML.

[8]  Inderjit S. Dhillon,et al.  Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.

[9]  Ayhan Demiriz,et al.  Semi-Supervised Clustering Using Genetic Algorithms , 1999 .

[10]  Russ Bubley,et al.  Randomized algorithms , 1995, CSUR.

[11]  Claire Cardie,et al.  Intelligent Clustering with Instance-Level Constraints , 2002 .

[12]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[13]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[14]  J. Rissanen,et al.  Modeling by Shortest Data Description , 1978, Autom..

[15]  Sudipto Guha,et al.  ROCK: a robust clustering algorithm for categorical attributes , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[16]  Claire Cardie,et al.  Constrained K-means Clustering with Background Knowledge , 2001, ICML.

[17]  David B. Shmoys,et al.  A Best Possible Heuristic for the k-Center Problem , 1985, Math. Oper. Res..

[18]  H. Sebastian Seung,et al.  Selective Sampling Using the Query by Committee Algorithm , 1997, Machine Learning.

[19]  Olga Veksler,et al.  Markov random fields with efficient approximations , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[20]  David Chiu,et al.  BOOK REVIEW: "PATTERN CLASSIFICATION", R. O. DUDA, P. E. HART and D. G. STORK, Second Edition , 2001 .

[21]  Ming-Syan Chen,et al.  On the Techniques for Data Clustering with Numerical Constraints , 2003, SDM.

[22]  Inderjit S. Dhillon,et al.  Generative model-based clustering of directional data , 2003, KDD '03.

[23]  Byron Dom,et al.  An Information-Theoretic External Cluster-Validity Measure , 2002, UAI.

[24]  R. Mooney,et al.  Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases , 2002 .

[25]  Paul S. Bradley,et al.  Initialization of Iterative Refinement Clustering Algorithms , 1998, KDD.

[26]  Michael I. Jordan,et al.  On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.

[27]  R. Mooney,et al.  Comparing and Unifying Search-Based and Similarity-Based Approaches to Semi-Supervised Clustering , 2003 .

[28]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[29]  R. Mooney,et al.  Learnable similarity functions and their application to record linkage and clustering , 2006 .

[30]  Aidong Zhang,et al.  WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases , 1998, VLDB.

[31]  Robert Tibshirani,et al.  Discriminant Adaptive Nearest Neighbor Classification , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[32]  Éva Tardos,et al.  Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields , 2002, JACM.

[33]  Shokri Z. Selim,et al.  K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  David L. Dowe,et al.  Minimum Message Length and Kolmogorov Complexity , 1999, Comput. J..

[35]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[36]  Jeff A. Bilmes,et al.  A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models , 1998 .

[37]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[38]  Douglas H. Fisher,et al.  Knowledge Acquisition Via Incremental Conceptual Clustering , 1987, Machine Learning.

[39]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[40]  Samuel Kaski,et al.  Clustering Based on Conditional Distributions in an Auxiliary Space , 2002, Neural Computation.

[41]  Philip M. Long,et al.  Performance guarantees for hierarchical clustering , 2002, J. Comput. Syst. Sci..

[42]  Paul S. Bradley,et al.  Clustering via Concave Minimization , 1996, NIPS.

[43]  Marina Meilă Comparing Clusterings , 2002 .

[44]  Arindam Banerjee,et al.  Active Semi-Supervision for Pairwise Constrained Clustering , 2004, SDM.

[45]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[46]  Inderjit S. Dhillon,et al.  Concept Decompositions for Large Sparse Text Data Using Clustering , 2004, Machine Learning.

[47]  Arindam Banerjee,et al.  Semi-supervised Clustering by Seeding , 2002, ICML.

[48]  Inderjit S. Dhillon,et al.  Efficient Clustering of Very Large Document Collections , 2001 .

[49]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[50]  R. Mooney,et al.  Impact of Similarity Measures on Web-page Clustering , 2000 .

[51]  Ivor W. Tsang,et al.  Learning with Idealized Kernels , 2003, ICML.

[52]  Joydeep Ghosh,et al.  A Scalable Approach to Balanced, High-dimensional Clustering of Market-baskets , 2000 .

[53]  Daniel A. Keim,et al.  An Efficient Approach to Clustering in Large Multimedia Databases with Noise , 1998, KDD.

[54]  William A. Gale,et al.  A sequential algorithm for training text classifiers , 1994, SIGIR '94.

[55]  Michael Randolph Garey,et al.  The complexity of the generalized Lloyd-Max problem , 1982, IEEE Trans. Inf. Theory.

[56]  Vijay V. Vazirani,et al.  Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation , 2001, JACM.

[57]  Michael I. Jordan,et al.  Supervised learning from incomplete data via an EM approach , 1993, NIPS.

[58]  Joachim M. Buhmann,et al.  Active Data Clustering , 1997, NIPS.

[59]  Dan Klein,et al.  From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering , 2002, ICML.

[60]  David J. Miller,et al.  A mixture model and EM algorithm for robust classification, outlier rejection, and class discovery , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[61]  M. F.,et al.  Bibliography , 1985, Experimental Gerontology.

[62]  Dana Ron,et al.  On Finding Large Conjunctive Clusters , 2003, COLT.

[63]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..

[64]  Andrew McCallum,et al.  Employing EM and Pool-Based Active Learning for Text Classification , 1998, ICML.

[65]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[66]  Greg Hamerly,et al.  Learning the k in k-means , .

[67]  George Karypis,et al.  CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling , 1999 .

[68]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[69]  Tom M. Mitchell,et al.  Using unlabeled data to improve text classification , 2001 .

[70]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[71]  David A. Cohn,et al.  Active Learning with Statistical Models , 1996, NIPS.

[72]  Vipin Kumar,et al.  Chameleon: Hierarchical Clustering Using Dynamic Modeling , 1999, Computer.

[73]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[74]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[75]  Naoki Abe,et al.  Query Learning Strategies Using Boosting and Bagging , 1998, ICML.

[76]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[77]  Ran El-Yaniv,et al.  A New Nonparametric Pairwise Clustering Algorithm Based on Iterative Estimation of Distance Profiles , 2004, Machine Learning.

[78]  Yishay Mansour,et al.  An Information-Theoretic Analysis of Hard and Soft Assignment Methods for Clustering , 1997, UAI.

[79]  Greg Hamerly,et al.  Learning the k in k-means , 2003, NIPS.

[80]  Hiroyuki Ogata,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 1999, Nucleic Acids Res..

[81]  Tomer Hertz,et al.  Learning Distance Functions with Product Space Boosting , 2004 .

[82]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[83]  D. Eisenberg,et al.  Localizing proteins in the cell from their phylogenetic profiles. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[84]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[85]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[86]  Raymond J. Mooney,et al.  Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.

[87]  Dan Klein,et al.  Interpreting and Extending Classical Agglomerative Clustering Algorithms using a Model-Based approach , 2002, ICML.

[88]  Carla E. Brodley,et al.  Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach , 2003, ICML.

[89]  Andrew McCallum,et al.  Semi-Supervised Clustering with User Feedback , 2003 .

[90]  C. Greg Plaxton,et al.  The Online Median Problem , 1999, SIAM J. Comput..

[91]  Joydeep Ghosh,et al.  On Scaling Up Balanced Clustering Algorithms , 2002, SDM.