Document clustering algorithms, representations and evaluation for information retrieval
[1] Anil K. Jain,et al. Data clustering: a review , 1999, CSUR.
[2] Akhil Kumar. G-Tree: A New Data Structure for Organizing Multidimensional Data , 1994, IEEE Trans. Knowl. Data Eng.
[3] Masayasu Atsumi. Attention-Guided Organized Perception and Learning of Object Categories Based on Probabilistic Latent Variable Models , 2013 .
[4] C. J. van Rijsbergen,et al. Probabilistic models of information retrieval based on measuring the divergence from randomness , 2002, TOIS.
[5] Yuen Ren Chao,et al. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology , 1950 .
[6] James P. Callan,et al. Document allocation policies for selective searching of distributed indexes , 2010, CIKM '10.
[7] Gerhard Weikum,et al. YAGO: A Core of Semantic Knowledge , 2007, WWW '07.
[8] Sreenivas Gollapudi,et al. Indexing strategies for graceful degradation of search quality , 2011, SIGIR.
[9] Shlomo Geva,et al. Pairwise similarity of TopSig document signatures , 2012, ADCS.
[10] Xiaoli Li,et al. Eliminating noisy information in Web pages for data mining , 2003, KDD '03.
[11] Hector Garcia-Molina,et al. Clustering the tagged web , 2009, WSDM '09.
[12] Sergio Greco,et al. Toward Semantic XML Clustering , 2006, SDM.
[13] Alistair Moffat,et al. Against recall: is it persistence, cardinality, density, coverage, or totality? , 2009, SIGF.
[14] Adrian E. Raftery,et al. How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis , 1998, Comput. J.
[15] Boris Chidlovskii. Multi-label Wikipedia Classification with Textual and Link Features , 2009, INEX.
[16] Rudolf Bayer,et al. Organization and maintenance of large ordered indexes , 1972, Acta Informatica.
[17] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res.
[18] G. Karypis,et al. Criterion Functions for Document Clustering: Experiments and Analysis , 2001 .
[19] K. Sparck Jones,et al. A Test for the Separation of Relevant and Non-Relevant Documents in Experimental Retrieval Collections , 1973 .
[20] Azucena Montes Rendón,et al. An Iterative Clustering Method for the XML-Mining Task of the INEX 2010 , 2010, INEX.
[22] George Karypis,et al. A Comparison of Document Clustering Techniques , 2000 .
[23] Richard C. Dubes,et al. Experiments in projection and clustering by simulated annealing , 1989, Pattern Recognit.
[24] Falk Scholer,et al. User performance versus precision measures for simple search tasks , 2006, SIGIR.
[25] Takeo Kanade,et al. Finding natural clusters having minimum description length , 1990, [1990] Proceedings. 10th International Conference on Pattern Recognition.
[26] Key-Sun Choi,et al. Re-ranking model based on document clusters , 2001, Inf. Process. Manag.
[27] Feng Liang,et al. PKU at INEX 2010 XML Mining Track , 2010, INEX.
[28] Inderjit S. Dhillon,et al. Clustering with Bregman Divergences , 2005, J. Mach. Learn. Res.
[29] Daphne Koller,et al. Support Vector Machine Active Learning with Applications to Text Classification , 2000, J. Mach. Learn. Res.
[30] Anna-Lan Huang,et al. Similarity Measures for Text Document Clustering , 2008 .
[31] Sergei Vassilvitskii,et al. k-means++: the advantages of careful seeding , 2007, SODA '07.
[32] Edward A. Fox,et al. Research Contributions , 2014 .
[33] Thomas Hofmann,et al. Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res.
[34] James P. Callan,et al. Collection selection and results merging with topically organized U.S. patents and TREC data , 2000, CIKM '00.
[35] Dik Lun Lee,et al. Server Ranking for Distributed Text Retrieval Systems on the Internet , 1997, DASFAA.
[36] Alan Wee-Chung Liew,et al. Fuzzy image clustering incorporating spatial continuity , 2000 .
[37] Bruce R. Schatz,et al. Document clustering using small world communities , 2007, JCDL '07.
[38] Boris Chidlovskii,et al. Semi-supervised Categorization of Wikipedia Collection by Label Expansion , 2009, INEX.
[39] K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems , 1998, Proc. IEEE.
[40] Steffen Staab,et al. Ontologies improve text document clustering , 2003, Third IEEE International Conference on Data Mining.
[41] David D. Lewis,et al. Representation and Learning in Information Retrieval , 1991 .
[42] Dimitris Achlioptas,et al. Database-friendly random projections: Johnson-Lindenstrauss with binary coins , 2003, J. Comput. Syst. Sci.
[43] Duncan J. Watts,et al. Collective dynamics of ‘small-world’ networks , 1998, Nature.
[44] A. Zimek,et al. On Using Class-Labels in Evaluation of Clusterings , 2010 .
[45] Albert,et al. Emergence of scaling in random networks , 1999, Science.
[46] Jim Woodcock,et al. Using Z: Specification, Refinement, and Proof , 1996, Prentice Hall international series in computer science.
[47] Charles L. A. Clarke,et al. Improving document clustering using Okapi BM25 feature weighting , 2011, Information Retrieval.
[48] Christos Faloutsos,et al. Signature files: an access method for documents and its analytical performance evaluation , 1984, TOIS.
[49] Cyril Cleverdon,et al. The Cranfield tests on index language devices , 1997 .
[50] Richi Nayak,et al. Data Mining and XML Documents , 2002, International Conference on Internet Computing.
[51] Ah-Hwee Tan,et al. Text Mining: The state of the art and the challenges , 2000 .
[52] E. Voorhees. The Effectiveness & Efficiency of Agglomerative Hierarchic Clustering in Document Retrieval , 1985 .
[53] Wolfgang Nejdl,et al. Exploiting Distribution Skew for Scalable P2P Text Clustering , 2008, DBISP2P.
[54] Kyo Kageura,et al. Implicit ambiguity resolution using incremental clustering in cross-language information retrieval , 2004, Inf. Process. Manag.
[55] Frank M. Shipman,et al. Adaptive clustering and interactive visualizations to support the selection of video clips , 2011, ICMR '11.
[56] Aidong Zhang,et al. WaveCluster: a wavelet-based clustering approach for spatial data in very large databases , 2000, The VLDB Journal.
[57] Gene H. Golub,et al. Singular value decomposition and least squares solutions , 1970, Milestones in Matrix Computation.
[58] Victoria J. Hodge,et al. A hardware-accelerated novel IR system , 2002, Proceedings 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing.
[59] C. J. van Rijsbergen,et al. Report on the need for and provision of an 'ideal' information retrieval test collection , 1975 .
[60] Charles L. A. Clarke,et al. Effective measures for inter-document similarity , 2013, CIKM.
[61] Jiawei Han,et al. CLARANS: A Method for Clustering Objects for Spatial Data Mining , 2002, IEEE Trans. Knowl. Data Eng.
[62] Yoram Singer,et al. Context-sensitive learning methods for text categorization , 1996, SIGIR '96.
[63] Charu C. Aggarwal,et al. An Introduction to Cluster Analysis , 2018, Data Clustering: Algorithms and Applications.
[64] Michele Banko,et al. Scaling to Very Very Large Corpora for Natural Language Disambiguation , 2001, ACL.
[65] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[66] Richi Nayak,et al. Clustering XML Documents Using Frequent Subtrees , 2008, INEX.
[67] Inderjit S. Dhillon,et al. Information-theoretic co-clustering , 2003, KDD '03.
[68] Andrew W. Moore,et al. X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.
[69] Chih-Jen Lin,et al. Projected Gradient Methods for Nonnegative Matrix Factorization , 2007, Neural Computation.
[70] Christopher J. Fox,et al. A stop list for general text , 1989, SIGF.
[71] R. Bayer,et al. Organization and maintenance of large ordered indices , 1970, SIGFIDET '70.
[72] William H. Press,et al. Numerical Recipes 3rd Edition: The Art of Scientific Computing , 2007 .
[73] Gjergji Kasneci,et al. YAWN: A Semantically Annotated Wikipedia XML Corpus , 2007, BTW.
[74] Ran Jin,et al. Efficient parallel spectral clustering algorithm design for large data sets under cloud computing environment , 2013, Journal of Cloud Computing: Advances, Systems and Applications.
[75] Ludovic Denoyer,et al. Report on the XML mining track at INEX 2005 and INEX 2006: categorization and clustering of XML documents , 2007, SIGF.
[76] Peter Willett,et al. Recent trends in hierarchic document clustering: A critical review , 1988, Inf. Process. Manag.
[77] Alan F. Smeaton,et al. Multilingual and Multimodal Information Access Evaluation, International Conference of the Cross-Language Evaluation Forum, CLEF 2010, Padua, Italy, September 20-23, 2010. Proceedings , 2010, CLEF.
[78] Benno Stein,et al. The optimum clustering framework: implementing the cluster hypothesis , 2011, Information Retrieval.
[79] Patrick F. Reidy. An Introduction to Latent Semantic Analysis , 2009 .
[80] Allen Gersho,et al. Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.
[81] Kaspar Riesen,et al. Graph Embedding in Vector Spaces by Means of Prototype Selection , 2007, GbRPR.
[82] Charles L. A. Clarke,et al. Overview of the TREC 2011 Web Track , 2011, TREC.
[83] James Allan,et al. A New Measure of the Cluster Hypothesis , 2009, ICTIR.
[84] Richi Nayak,et al. HCX: an efficient hybrid clustering approach for XML documents , 2009, DocEng '09.
[85] Ali S. Hadi,et al. Finding Groups in Data: An Introduction to Cluster Analysis , 1991 .
[86] C. J. van Rijsbergen,et al. The use of hierarchic clustering in information retrieval , 1971, Inf. Storage Retr.
[87] Geoffrey E. Hinton,et al. Semantic hashing , 2009, Int. J. Approx. Reason.
[88] Jianwu Yang,et al. Extended VSM for XML Document Classification Using Frequent Subtrees , 2009, INEX.
[89] Geoffrey E. Hinton,et al. Distributed representations and nested compositional structure , 1994 .
[90] Andrew Trotman,et al. Overview of the INEX 2010 Ad Hoc Track , 2010, INEX.
[91] Beng Chin Ooi,et al. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search , 2005, TODS.
[92] Santosh S. Vempala,et al. A divide-and-merge methodology for clustering , 2005, PODS '05.
[93] Fabrizio Silvestri,et al. Query-driven document partitioning and collection selection , 2006, InfoScale '06.
[94] CHENGXIANG ZHAI,et al. A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.
[95] Ellen M. Voorhees,et al. The cluster hypothesis revisited , 1985, SIGIR '85.
[96] Mingwei Leng,et al. An Efficient K-means Clustering Algorithm Based on Influence Factors , 2007, Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007).
[97] Charles L. A. Clarke,et al. Overview of the TREC 2010 Web Track , 2010, TREC.
[98] Justin Zobel,et al. How reliable are the results of large-scale information retrieval experiments? , 1998, SIGIR '98.
[99] Fazli Can,et al. Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases , 1990, TODS.
[100] R. Ladner. Entropy-constrained Vector Quantization , 2000 .
[101] Christophe Moulin,et al. UJM at INEX 2009 XML Mining Track , 2009, INEX.
[102] Robert Villa,et al. The effectiveness of query-specific hierarchic clustering in information retrieval , 2002, Inf. Process. Manag.
[103] Stephen E. Robertson,et al. Okapi at TREC-3 , 1994, TREC.
[104] Andrew Trotman,et al. Compressing Inverted Files , 2004, Information Retrieval.
[105] Martin F. Porter,et al. An algorithm for suffix stripping , 1997, Program.
[106] Masayasu Atsumi. Visual Categorization Based on Learning Contextual Probabilistic Latent Component Tree , 2012, ICANN.
[107] Stephen E. Robertson,et al. Gatford, Centre for Interactive Systems Research, Department of Information , 1996 .
[108] Yun Chi,et al. Evolutionary spectral clustering by incorporating temporal smoothness , 2007, KDD '07.
[109] Boon-Lock Yeo,et al. Segmentation of Video by Clustering and Graph Analysis , 1998, Comput. Vis. Image Underst.
[110] Anton Leuski,et al. Evaluating document clustering for interactive information retrieval , 2001, CIKM '01.
[111] Yoshua Bengio,et al. Convergence Properties of the K-Means Algorithms , 1994, NIPS.
[112] Gene H. Golub,et al. Calculating the singular values and pseudo-inverse of a matrix , 2007, Milestones in Matrix Computation.
[113] Richard A. Harshman,et al. Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci.
[114] Christophe Moulin,et al. UJM at INEX 2008 XML Mining Track , 2008, INEX.
[115] Marcos M. Campos,et al. O-Cluster: scalable clustering of large high dimensional data sets , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings.
[116] S. P. Lloyd,et al. Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.
[117] Aoying Zhou,et al. An adaptive and dynamic dimensionality reduction method for high-dimensional indexing , 2007, The VLDB Journal.
[118] T. Landauer,et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .
[119] Bodo Manthey,et al. k-Means Has Polynomial Smoothed Complexity , 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.
[120] Peter Norvig,et al. The Unreasonable Effectiveness of Data , 2009, IEEE Intelligent Systems.
[121] Moran Feldman,et al. On the Impact of Random Index-Partitioning on Index Compression , 2011, ArXiv.
[122] Liu Rui,et al. Fuzzy c-Means Clustering Algorithm , 2008 .
[123] Ophir Frieder,et al. Exploiting parallelism to support scalable hierarchical clustering , 2007, J. Assoc. Inf. Sci. Technol.
[124] Philip S. Yu,et al. Top 10 algorithms in data mining , 2007, Knowledge and Information Systems.
[125] P. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .
[126] Pavel Berkhin,et al. A Survey of Clustering Data Mining Techniques , 2006, Grouping Multidimensional Data.
[127] Gabriella Kazai. Initiative for the Evaluation of XML Retrieval , 2009, Encyclopedia of Database Systems.
[128] Andrew Trotman,et al. Comparative Evaluation of Focused Retrieval , 2010, Lecture Notes in Computer Science.
[129] Sang Joon Kim,et al. A Mathematical Theory of Communication , 2006 .
[130] J. Bezdek,et al. FCM: The fuzzy c-means clustering algorithm , 1984 .
[131] Tian Zhang,et al. BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.
[132] Douglas Comer,et al. Ubiquitous B-Tree , 1979, CSUR.
[133] R. Alhajj,et al. Achieving Natural Clustering by Validating Results of Iterative Evolutionary Clustering Approach , 2006, 2006 3rd International IEEE Conference Intelligent Systems.
[134] Thomas Hofmann,et al. Probabilistic latent semantic indexing , 1999, SIGIR '99.
[135] V Latora,et al. Efficient behavior of small-world networks. , 2001, Physical review letters.
[136] Shengli Wu,et al. Testing the cluster hypothesis in distributed information retrieval , 2006, Inf. Process. Manag.
[137] Robert M. Losee,et al. Are two document clusters better than one? The Cluster Performance Question for information retrieval , 2005, J. Assoc. Inf. Sci. Technol.
[138] Edwin R. Hancock,et al. Spectral embedding of graphs , 2003, Pattern Recognit.
[139] Ellen M. Voorhees,et al. The Philosophy of Information Retrieval Evaluation , 2001, CLEF.
[140] Alessandra Lumini,et al. MKL-tree: an index structure for high-dimensional vector spaces , 2007, Multimedia Systems.
[141] Fabrizio Sebastiani,et al. Machine learning in automated text categorization , 2001, CSUR.
[142] Bing Zhou,et al. PARCLE: a parallel clustering algorithm for cluster system , 2003, Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.03EX693).
[143] Craig MacDonald,et al. Voting for candidates: adapting data fusion techniques for an expert search task , 2006, CIKM '06.
[144] Christopher D. Manning,et al. Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol.
[145] Shlomo Geva,et al. Clustering with Random Indexing K-tree and XML Structure , 2009, INEX.
[146] Richi Nayak,et al. XML Documents Clustering Using a Tensor Space Model , 2011, PAKDD.
[147] Charles L. A. Clarke,et al. Efficient and effective spam filtering and re-ranking for large web datasets , 2010, Information Retrieval.
[148] Behrang Q. Zadeh,et al. Random Manhattan Indexing , 2014, 2014 25th International Workshop on Database and Expert Systems Applications.
[149] Gabriella Kazai. Initiative for the Evaluation of XML Retrieval , 2009 .
[150] Lie Lu,et al. Content analysis for audio classification and segmentation , 2002, IEEE Trans. Speech Audio Process.
[151] Jun Wang,et al. Self-taught hashing for fast similarity search , 2010, SIGIR.
[152] Sylvain Lamprier,et al. Using Text Segmentation to Enhance the Cluster Hypothesis , 2008, AIMSA.
[153] Hans-Peter Kriegel,et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.
[154] Marti A. Hearst,et al. Reexamining the cluster hypothesis: scatter/gather on retrieval results , 1996, SIGIR '96.
[155] Vipin Kumar,et al. Chameleon: Hierarchical Clustering Using Dynamic Modeling , 1999, Computer.
[156] Ludovic Denoyer,et al. The Wikipedia XML Corpus , 2006, INEX.
[157] Magnus Sahlgren,et al. An Introduction to Random Indexing , 2005 .
[158] Curt Burgess,et al. Producing high-dimensional semantic spaces from lexical co-occurrence , 1996 .
[159] Fernando Diaz,et al. Regularizing ad hoc retrieval scores , 2005, CIKM '05.
[160] Laura A. Mather,et al. A linear algebra measure of cluster quality , 2000, J. Am. Soc. Inf. Sci.
[161] Yiming Yang,et al. RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res.
[162] Chunming Rong,et al. Using Mahout for Clustering Wikipedia's Latest Articles: A Comparison between K-means and Fuzzy C-means in the Cloud , 2011, 2011 IEEE Third International Conference on Cloud Computing Technology and Science.
[163] Nir Ailon,et al. Streaming k-means approximation , 2009, NIPS.
[164] Shlomo Geva,et al. TOPSIG: topology preserving document signatures , 2011, CIKM '11.
[165] Richi Nayak,et al. Overview of the INEX 2009 XML Mining Track: Clustering and Classification of XML Documents , 2009, INEX.
[166] Andrew Trotman,et al. Document Clustering Evaluation: Divergence from a Random Baseline , 2012, ArXiv.
[167] Burton H. Bloom,et al. Space/time trade-offs in hash coding with allowable errors , 1970, CACM.
[168] Alistair Moffat,et al. Rank-biased precision for measurement of retrieval effectiveness , 2008, TOIS.
[169] B. AfeArd. Calculating the Singular Values and Pseudoinverse of a Matrix , 2022 .
[170] Mitsuru Ishizuka,et al. Graph-based Word Clustering using a Web Search Engine , 2006, EMNLP.
[171] Noam Chomsky,et al. Modular Approaches to the Study of the Mind , 1984 .
[172] S. Kotsiantis,et al. Recent Advances in Clustering : A Brief Survey , 2004 .
[173] Shlomo Geva,et al. K-tree: large scale document clustering , 2009, SIGIR.
[174] Xin Liu,et al. Document clustering based on non-negative matrix factorization , 2003, SIGIR.
[175] Rajeev Motwani,et al. The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.
[176] Liang-Gee Chen,et al. Vector quantization using tree-structured self-organizing feature maps , 1994, IEEE J. Sel. Areas Commun.
[177] Sudipto Guha,et al. CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.
[178] Ramon Ferrer i Cancho,et al. The small world of human language , 2001, Proceedings of the Royal Society of London. Series B: Biological Sciences.
[179] Mohamed S. Kamel,et al. Efficient phrase-based document indexing for Web document clustering , 2004, IEEE Transactions on Knowledge and Data Engineering.
[180] Piotr Indyk,et al. Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.
[181] Xiaohua Hu,et al. Exploiting Wikipedia as external knowledge for document clustering , 2009, KDD.
[182] Peter J. Rousseeuw,et al. Clustering by means of medoids , 1987 .
[183] Djoerd Hiemstra,et al. Shard ranking and cutoff estimation for topically partitioned collections , 2012, CIKM.
[184] Andrew Y. Ng,et al. Emergence of Object-Selective Features in Unsupervised Feature Learning , 2012, NIPS.
[185] Heikki Mannila,et al. Random projection in dimensionality reduction: applications to image and text data , 2001, KDD '01.
[186] George Karypis,et al. CLUTO - A Clustering Toolkit , 2002 .
[187] James Allan,et al. A cluster-based resampling method for pseudo-relevance feedback , 2008, SIGIR '08.
[188] Ting Liu,et al. Clustering Billions of Images with Large Scale Nearest Neighbor Search , 2007, 2007 IEEE Workshop on Applications of Computer Vision (WACV '07).
[189] Thorsten Joachims,et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.
[190] Xiaohua Hu,et al. A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).
[191] Theodore S. Rappaport,et al. Wireless communications - principles and practice , 1996 .
[192] Sid Lamrous,et al. Divisive Hierarchical K-Means , 2006, 2006 International Conference on Computational Inteligence for Modelling Control and Automation and International Conference on Intelligent Agents Web Technologies and International Commerce (CIMCA'06).
[193] Isabelle Guyon,et al. Clustering: Science or Art? , 2009, ICML Unsupervised and Transfer Learning.
[194] Shlomo Geva,et al. Document Clustering with K-tree , 2008, INEX.
[195] Ludovic Denoyer,et al. Report on the XML Mining Track at INEX 2005 and INEX 2006 , 2006, INEX.
[196] G. Karypis,et al. Criterion functions for document clustering , 2005 .
[197] Henri Maître,et al. Kernel MDL to Determine the Number of Clusters , 2007, MLDM.
[198] Zellig S. Harris,et al. Distributional Structure , 1954 .
[199] Alistair Moffat,et al. Vector-space ranking with effective early termination , 2001, SIGIR '01.
[200] Sanjoy Dasgupta,et al. Random projection trees and low dimensional manifolds , 2008, STOC.
[201] Gurmeet Singh Manku,et al. Detecting near-duplicates for web crawling , 2007, WWW '07.
[202] Moses Charikar,et al. Similarity estimation techniques from rounding algorithms , 2002, STOC '02.
[203] Mihai Surdeanu,et al. A hybrid unsupervised approach for document clustering , 2005, KDD '05.
[204] W. Bruce Croft. A model of cluster searching based on classification , 1980, Inf. Syst.
[205] Sargur N. Srihari,et al. Fast k-nearest neighbor classification using cluster-based trees , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[206] Anupam Gupta,et al. An elementary proof of the Johnson-Lindenstrauss Lemma , 1999 .
[207] K. Sparck Jones,et al. Information Retrieval Test Collections , 1976 .
[208] Luis M. de Campos,et al. Probabilistic Methods for Link-Based Classification at INEX 2008 , 2009, INEX.
[209] Fionn Murtagh,et al. Overcoming the Curse of Dimensionality in Clustering by Means of the Wavelet Transform , 2000, Comput. J.
[210] Ludovic Denoyer,et al. Overview of the INEX 2008 XML Mining Track , 2008, INEX.
[211] Gerard Salton,et al. Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag.
[212] Sudipto Guha,et al. Clustering Data Streams: Theory and Practice , 2003, IEEE Trans. Knowl. Data Eng.
[213] Stefan Kopp,et al. Learning hierarchical prototypes of motion time series for interactive systems , 2012, ECAI 2012.
[214] Chris H. Q. Ding,et al. K-means clustering via principal component analysis , 2004, ICML.
[215] M. Kendall. A New Measure of Rank Correlation , 1938 .
[216] Kenneth Rose,et al. Entropy-constrained tree-structured vector quantizer design , 1996, IEEE Trans. Image Process.
[217] Ludovic Denoyer,et al. Report on the XML mining track at INEX 2007 categorization and clustering of XML documents , 2008, SIGF.
[218] W. Bruce Croft,et al. An Evaluation of Techniques for Clustering Search Results , 2005 .
[219] Ah Chung Tsoi,et al. Self Organizing Maps for the Clustering of Large Sets of Labeled Graphs , 2008, INEX.
[220] Frederic Maire,et al. ENTS - a fast and adaptive indexing system for codebooks , 2002, Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02.
[221] W. Bruce Croft,et al. Cluster-based retrieval using language models , 2004, SIGIR '04.
[222] Tefko Saracevic,et al. Effects of Inconsistent Relevance Judgments on Information Retrieval Test Results: A Historical Perspective , 2008, Libr. Trends.
[223] D.M. Mount,et al. An Efficient k-Means Clustering Algorithm: Analysis and Implementation , 2002, IEEE Trans. Pattern Anal. Mach. Intell.
[224] Charles L. A. Clarke,et al. Overview of the TREC 2012 Web Track , 2012, TREC.
[225] Amresh Kumar,et al. Verification and validation of MapReduce program model for parallel K-means algorithm on Hadoop cluster , 2013, 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT).
[226] Christopher Michael,et al. Application of K-tree to document clustering , 2010 .
[227] R. DeVore,et al. A Simple Proof of the Restricted Isometry Property for Random Matrices , 2008 .
[228] Özgür Ulusoy,et al. Exploiting Index Pruning Methods for Clustering XML Collections , 2009, INEX.
[229] Huan Liu,et al. Subspace clustering for high dimensional data: a review , 2004, SKDD.
[230] Jaap Kamps,et al. Using Links to Classify Wikipedia Pages , 2008, INEX.
[231] Kotagiri Ramamohanarao,et al. Inverted files versus signature files for text indexing , 1998, TODS.
[232] Sylvain Lamprier,et al. SegGen: A Genetic Algorithm for Linear Text Segmentation , 2007, IJCAI.
[233] Dong-Hong Ji,et al. Document clustering based on cluster validation , 2004, CIKM '04.
[234] C. E. SHANNON,et al. A mathematical theory of communication , 1948, MOCO.
[235] Stijn van Dongen,et al. Graph Clustering Via a Discrete Uncoupling Process , 2008, SIAM J. Matrix Anal. Appl.
[236] Andrew Trotman,et al. Focused Retrieval and Evaluation, 8th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2009, Brisbane, Australia, December 7-9, 2009, Revised and Selected Papers , 2010, INEX.
[237] Ah Chung Tsoi,et al. Supervised Encoding of Graph-of-Graphs for Classification and Regression Problems , 2009, INEX.
[238] Darnes Vilariño Ayala,et al. BUAP: Performance of K-Star at the INEX'09 Clustering Task , 2009, INEX.
[239] Emanuele Della Valle,et al. An Introduction to Information Retrieval , 2013 .
[240] David A. Forsyth,et al. Learning the semantics of words and pictures , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.
[241] W. Bruce Croft,et al. Cluster-based language models for distributed retrieval , 1999, SIGIR '99.
[242] Laurie J. Heyer,et al. Exploring expression data: identification and analysis of coexpressed genes. , 1999, Genome research.
[243] Ludovic Denoyer,et al. Report on the XML mining track at INEX 2005 and INEX 2006 , 2007 .
[244] Vlado Keselj,et al. Document clustering using character N-grams: a comparative evaluation with term-based and word-based clustering , 2005, CIKM '05.
[245] K. Sparck Jones,et al. Simple, proven approaches to text retrieval , 1994 .
[246] Andrew Trotman,et al. Fast and Effective Focused Retrieval , 2009, INEX.
[247] Gerard Salton,et al. A vector space model for automatic indexing , 1975, CACM.
[248] Richi Nayak,et al. Utilizing the Structure and Content Information for XML Document Clustering , 2008, INEX.
[249] Andrew Trotman,et al. Advances in Focused Retrieval, 7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008, Dagstuhl Castle, Germany, December 15-18, 2008. Revised and Selected Papers , 2009, INEX.
[250] Masayasu Atsumi. Probabilistic Learning of Visual Object Composition from Attended Segments , 2010, ISVC.
[251] Philip Chan,et al. Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms , 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.
[252] Shlomo Geva,et al. Random Indexing K-tree , 2009, HiPC 2010.
[253] T.W. Fox. Document vector compression and its application in document clustering , 2005, Canadian Conference on Electrical and Computer Engineering, 2005.
[254] Pentti Kanerva,et al. The Spatter Code for Encoding Concepts at Many Levels , 1994 .
[255] James P. Callan,et al. Topic-based Index Partitions for Efficient and Effective Selective Search , 2010, LSDS-IR@SIGIR.
[256] Ellen M. Voorhees. The cluster hypothesis revisited , 1985, SIGIR 1985.
[257] Pierre Hansen,et al. NP-hardness of Euclidean sum-of-squares clustering , 2008, Machine Learning.
[258] Shokri Z. Selim,et al. K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[259] Theodore Kalamboukis,et al. Using clustering to enhance text classification , 2007, SIGIR.
[260] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[261] Justin Zobel,et al. Effective ranking with arbitrary passages , 2001, J. Assoc. Inf. Sci. Technol.
[262] Robert M. Gray,et al. Clustering and Finding the Number of Clusters by Unsupervised Learning of Mixture Models using Vector Quantization , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[263] Milad Shokouhi,et al. Advances in Information Retrieval Theory, Second International Conference on the Theory of Information Retrieval, ICTIR 2009, Cambridge, UK, September 10-12, 2009, Proceedings , 2009, ICTIR.
[264] Luis Gravano,et al. Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies , 1995, VLDB.
[265] Salvatore T. March,et al. Design and natural science research on information technology , 1995, Decis. Support Syst.
[266] Inderjit S. Dhillon,et al. Kernel k-means: spectral clustering and normalized cuts , 2004, KDD.
[267] James Allan,et al. Using part-of-speech patterns to reduce query ambiguity , 2002, SIGIR '02.
[268] Josef Stoer,et al. Numerische Mathematik 1 , 1989 .
[269] Ian H. Witten,et al. Managing Gigabytes: Compressing and Indexing Documents and Images , 1999 .
[270] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[271] Fabian M. Suchanek,et al. Yago: A Core of Semantic Knowledge Unifying WordNet and Wikipedia , 2007 .
[272] Gérard Govaert,et al. Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood , 2000, IEEE Trans. Pattern Anal. Mach. Intell.
[273] Pamela Forner. Multilingual and Multimodal Information Access Evaluation - Second International Conference of the Cross-Language Evaluation Forum, CLEF 2011, Amsterdam, The Netherlands, September 19-22, 2011. Proceedings , 2011, CLEF.
[274] Shlomo Geva. K-tree: a height balanced tree structured vector quantizer , 2000, Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501).
[275] Luis M. de Campos,et al. Link-Based Text Classification Using Bayesian Networks , 2009, INEX.
[276] W. B. Johnson,et al. Extensions of Lipschitz mappings into Hilbert space , 1984 .
[277] Vladimir Estivill-Castro,et al. Why so many clustering algorithms: a position paper , 2002, SKDD.
[278] Edward Y. Chang,et al. Parallel Spectral Clustering , 2008, ECML/PKDD.