Paper information - Towards subjectifying text clustering

Towards subjectifying text clustering

Although it is common practice to produce only a single clustering of a dataset, in many cases text documents can be clustered along different dimensions. Unfortunately, not only do traditional text clustering algorithms fail to produce multiple clusterings of a dataset, the only clustering they produce may not be the one that the user desires. In this paper, we propose a simple active clustering algorithm that is capable of producing multiple clusterings of the same data according to user interest. In comparison to previous work on feedback-oriented clustering, the amount of user feedback required by our algorithm is minimal. In fact, the feedback turns out to be as simple as a cursory look at a list of words. Experimental results are very promising: our system is able to generate clusterings along the user-specified dimensions with reasonable accuracies on several challenging text classification tasks, thus providing suggestive evidence that our approach is viable.

Vincent Ng | Sajib Dasgupta
