A Survey of Text Clustering Algorithms
[1] C. J. van Rijsbergen,et al. The use of hierarchic clustering in information retrieval , 1971, Inf. Storage Retr..
[2] W. Bruce Croft,et al. Document clustering: An evaluation of some experiments with the cranfield 1400 collection , 1975, Inf. Process. Manag..
[3] W. Bruce Croft. Clustering large files of documents using the single-link method , 1977, J. Am. Soc. Inf. Sci..
[4] Peter Willett,et al. Document clustering using an inverted file approach , 1980 .
[5] Michael McGill,et al. Introduction to Modern Information Retrieval , 1983 .
[6] Fionn Murtagh,et al. A Survey of Recent Advances in Hierarchical Clustering Algorithms , 1983, Comput. J..
[7] Peter Willett,et al. Recent trends in hierarchic document clustering: A critical review , 1988, Inf. Process. Manag..
[8] Anil K. Jain,et al. Algorithms for Clustering Data , 1988 .
[9] Gerard Salton,et al. Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..
[10] Pat Langley,et al. Models of Incremental Concept Formation , 1990, Artif. Intell..
[11] Richard A. Harshman,et al. Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..
[12] Peter J. Rousseeuw,et al. Finding Groups in Data: An Introduction to Cluster Analysis , 1990 .
[13] Robert L. Mercer,et al. Class-Based n-gram Models of Natural Language , 1992, CL.
[14] W. John Wilbur,et al. The automatic identification of stop words , 1992, J. Inf. Sci..
[15] David R. Karger,et al. Scatter/Gather: a cluster-based approach to browsing large document collections , 1992, SIGIR '92.
[16] David R. Karger,et al. Constant interaction-time scatter/gather browsing of very large document collections , 1993, SIGIR.
[17] Jiawei Han,et al. Efficient and Effective Clustering Methods for Spatial Data Mining , 1994, VLDB.
[18] Ramakrishnan Srikant,et al. Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.
[19] Stephen E. Robertson,et al. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval , 1994, SIGIR '94.
[20] Yiming Yang,et al. Noise reduction in a statistical approach to text categorization , 1995, SIGIR '95.
[21] Chris Buckley,et al. Pivoted Document Length Normalization , 1996, SIGIR Forum.
[22] Tian Zhang,et al. BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.
[23] Hang Li,et al. Document Classification Using a Finite Mixture Model , 1997, ACL.
[24] Dan Gusfield,et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .
[25] Oren Etzioni,et al. Fast and Intuitive Clustering of Web Documents , 1997, KDD.
[26] Jan O. Pedersen,et al. Almost-constant-time clustering of arbitrary corpus subsets , 1997, SIGIR '97.
[27] Hinrich Schütze,et al. Projections for efficient document clustering , 1997, SIGIR '97.
[28] Jitendra Malik,et al. Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[29] Yiming Yang,et al. A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.
[30] Shivakumar Vaithyanathan,et al. Exploiting clustering and phrases for context-based information retrieval , 1997, SIGIR '97.
[31] Ramakrishnan Srikant,et al. Fast algorithms for mining association rules , 1998, VLDB 1998.
[32] Oren Etzioni,et al. Web document clustering: a feasibility demonstration , 1998, SIGIR '98.
[33] Sebastian Thrun,et al. Learning to Classify Text from Labeled and Unlabeled Documents , 1998, AAAI/IAAI.
[34] Dimitrios Gunopulos,et al. Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.
[35] Sudipto Guha,et al. CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.
[36] Andrew McCallum,et al. Distributional clustering of words for text classification , 1998, SIGIR '98.
[37] H. Sebastian Seung,et al. Learning the parts of objects by non-negative matrix factorization , 1999, Nature.
[38] Sudipto Guha,et al. ROCK: a robust clustering algorithm for categorical attributes , 1999, Proceedings 15th International Conference on Data Engineering.
[39] Philip S. Yu,et al. Fast algorithms for projected clustering , 1999, SIGMOD '99.
[40] Sang-goo Lee,et al. A semi-supervised document clustering technique for information organization , 2000, CIKM '00.
[41] Naftali Tishby,et al. Document clustering using word clusters via the information bottleneck method , 2000, SIGIR '00.
[42] Jon M. Kleinberg,et al. Clustering categorical data: an approach based on dynamical systems , 2000, The VLDB Journal.
[43] Philip S. Yu,et al. Finding generalized projected clusters in high dimensional spaces , 2000, SIGMOD 2000.
[44] Philip S. Yu,et al. Finding generalized projected clusters in high dimensional spaces , 2000, SIGMOD '00.
[45] Huan Liu,et al. Feature Selection for Clustering , 2000, Encyclopedia of Database Systems.
[46] Thomas de Quincey. [C] , 2000, The Works of Thomas De Quincey, Vol. 1: Writings, 1799–1820.
[47] George Karypis,et al. A Comparison of Document Clustering Techniques , 2000 .
[48] Sharad Mehrotra,et al. Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces , 2000, VLDB.
[49] Ran El-Yaniv,et al. On feature distributional clustering for text categorization , 2001, SIGIR '01.
[50] Martin Franz,et al. Unsupervised and supervised clustering for topic tracking , 2001, SIGIR '01.
[51] Ran El-Yaniv,et al. Iterative Double Clustering for Unsupervised and Semi-supervised Learning , 2001, ECML.
[52] Inderjit S. Dhillon,et al. Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.
[53] Philip S. Yu,et al. On effective conceptual indexing and similarity search in text data , 2001, Proceedings 2001 IEEE International Conference on Data Mining.
[54] Naftali Tishby,et al. Unsupervised document classification using sequential information maximization , 2002, SIGIR '02.
[55] George Karypis,et al. Evaluation of hierarchical clustering algorithms for document datasets , 2002, CIKM '02.
[56] I. Jolliffe. Principal Component Analysis , 2002 .
[57] Chris H. Q. Ding,et al. Adaptive dimension reduction for clustering high dimensional data , 2002, 2002 IEEE International Conference on Data Mining.
[58] Patrick Pantel,et al. Document clustering with committees , 2002, SIGIR '02.
[59] Martin Ester,et al. Frequent term-based text clustering , 2002, KDD.
[60] Arindam Banerjee,et al. Semi-supervised Clustering by Seeding , 2002, ICML.
[61] Xin Liu,et al. Document clustering based on non-negative matrix factorization , 2003, SIGIR.
[62] Wei-Ying Ma,et al. An Evaluation on Feature Selection for Text Clustering , 2003, ICML.
[63] Dominic Widdows,et al. Discovering Corpus-Specific Word Senses , 2003, EACL.
[64] Inderjit S. Dhillon,et al. Information-theoretic co-clustering , 2003, KDD '03.
[65] Ata Kabán,et al. On an equivalence between PLSI and LDA , 2003, SIGIR.
[66] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2003, J. Mach. Learn. Res..
[67] Raymond J. Mooney,et al. A probabilistic framework for semi-supervised clustering , 2004, KDD.
[68] Inderjit S. Dhillon,et al. Concept Decompositions for Large Sparse Text Data Using Clustering , 2004, Machine Learning.
[69] Yihong Gong,et al. Document clustering by concept factorization , 2004, SIGIR '04.
[70] Tao Li,et al. Document clustering via adaptive subspace iteration , 2004, SIGIR '04.
[71] Jon M. Kleinberg,et al. Bursty and Hierarchical Structure in Streams , 2002, Data Mining and Knowledge Discovery.
[72] George Karypis,et al. Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering , 2004, Machine Learning.
[73] Philip S. Yu,et al. On using partial supervision for text categorization , 2004, IEEE Transactions on Knowledge and Data Engineering.
[74] Tom Michael Mitchell,et al. The Role of Unlabeled Data in Supervised Learning , 2004 .
[75] Tao Tao,et al. A formal study of information retrieval heuristics , 2004, SIGIR '04.
[76] Renée J. Miller,et al. LIMBO: Scalable Clustering of Categorical Data , 2004, EDBT.
[77] Yiming Yang,et al. A Probabilistic Model for Online Document Clustering with Application to Novelty Detection , 2004, NIPS.
[78] Hector Garcia-Molina,et al. Overview of multidatabase transaction management , 2005, The VLDB Journal.
[79] Shi Zhong,et al. Efficient streaming text clustering , 2005, Neural Networks.
[80] Chris H. Q. Ding,et al. On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering , 2005, SDM.
[81] Philip S. Yu,et al. Parameter Free Bursty Events Detection in Text Streams , 2005, VLDB.
[82] Douglas H. Fisher,et al. Knowledge Acquisition Via Incremental Conceptual Clustering , 1987, Machine Learning.
[83] Farshad Fotouhi,et al. Co-clustering Documents and Words Using Bipartite Isoperimetric Graph Partitioning , 2006, Sixth International Conference on Data Mining (ICDM'06).
[84] Naftali Tishby,et al. The Power of Word Clusters for Text Classification , 2006 .
[85] Stefan Siersdorfer,et al. A neighborhood-based approach for clustering of linked document collections , 2006, CIKM '06.
[86] Mehran Sahami,et al. A web-based kernel function for measuring the similarity of short text snippets , 2006, WWW '06.
[87] Ramayya Krishnan,et al. Incremental hierarchical clustering of text documents , 2006, CIKM '06.
[88] Philip S. Yu,et al. A Framework for Clustering Massive Text and Categorical Data Streams , 2006, SDM.
[89] Xiang Ji,et al. Document clustering with prior knowledge , 2006, SIGIR.
[90] John D. Lafferty,et al. Dynamic topic models , 2006, ICML.
[91] Tom M. Mitchell,et al. Text clustering with extended user feedback , 2006, SIGIR.
[92] Fei Wang,et al. Regularized clustering for documents , 2007, SIGIR.
[93] Maxime Crochemore,et al. Algorithms on strings , 2007 .
[94] Susan T. Dumais,et al. Similarity Measures for Short Segments of Text , 2007, ECIR.
[95] ChengXiang Zhai,et al. Statistical Language Models for Information Retrieval , 2008, NAACL.
[96] Qi He,et al. Bursty Feature Representation for Clustering Text Streams , 2007, SDM.
[97] Xiaohua Hu,et al. A comparative evaluation of different link types on enhancing document clustering , 2008, SIGIR '08.
[98] Chris H. Q. Ding,et al. Knowledge transformation from word space to document space , 2008, SIGIR '08.
[99] Deng Cai,et al. Topic modeling with network regularization , 2008, WWW.
[100] Jian Yin,et al. Clustering Text Data Streams , 2008, Journal of Computer Science and Technology.
[101] Weimao Ke,et al. Dynamicity vs. effectiveness: studying online clustering for scatter/gather , 2009, SIGIR.
[102] Yizhou Sun,et al. iTopicModel: Information Network-Integrated Topic Modeling , 2009, 2009 Ninth IEEE International Conference on Data Mining.
[103] Hong Cheng,et al. Graph Clustering Based on Structural/Attribute Similarities , 2009, Proc. VLDB Endow..
[104] Andrew McCallum,et al. Efficient methods for topic model inference on streaming document collections , 2009, KDD.
[105] Heng Tao Shen,et al. Principal Component Analysis , 2009, Encyclopedia of Biometrics.
[106] Yun Chi,et al. Combining link and content for community detection: a discriminative approach , 2009, KDD.
[107] Kai Wang,et al. Prototype hierarchy based clustering for the categorization and navigation of web collections , 2010, SIGIR.
[108] Yue Lu,et al. Investigating task performance of probabilistic topic models: an empirical study of PLSA and LDA , 2011, Information Retrieval.
[109] Ricardo Baeza-Yates,et al. Modern Information Retrieval - the concepts and technology behind search, Second edition , 2011 .
[110] Dan Zhang,et al. Document clustering with universum , 2011, SIGIR.
[111] Philip S. Yu,et al. On Text Clustering with Side Information , 2012, 2012 IEEE 28th International Conference on Data Engineering.
[112] Charu C. Aggarwal,et al. Community Detection with Edge Content in Social Media Networks , 2012, 2012 IEEE 28th International Conference on Data Engineering.