Clustering web documents: a phrase-based method for grouping search engine results

[1]  Gary Marchionini,et al.  A self-organizing semantic map for information retrieval , 1991, SIGIR '91.

[2]  Marti A. Hearst Using Categories to Provide Context for Full-Text Retrieval Results , 1994, RIAO.

[3]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[4]  Marti A. Hearst,et al.  Reexamining the cluster hypothesis: scatter/gather on retrieval results , 1996, SIGIR '96.

[5]  James Allan,et al.  Recent Experiments with INQUERY , 1995, TREC.

[6]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[7]  James Allan,et al.  Aspect windows, 3-D visualizations, and indirect comparisons of information retrieval systems , 1998, SIGIR '98.

[8]  Nicholas J. Belkin,et al.  Evaluation of a tool for visualization of information retrieval results , 1996, SIGIR '96.

[9]  Jon Kleinberg,et al.  Authoritative sources in a hyperlinked environment , 1999, SODA '98.

[10]  Gerard Salton,et al.  Automatic text processing , 1988 .

[11]  James Kelly,et al.  AutoClass: A Bayesian Classification System , 1993, ML.

[12]  Teuvo Kohonen,et al.  The self-organizing map , 1990 .

[13]  Gad M. Landau,et al.  Fast Parallel and Serial Approximate String Matching , 1989, J. Algorithms.

[14]  David D. Lewis,et al.  An evaluation of phrasal and clustered representations on a text categorization task , 1992, SIGIR '92.

[15]  Eugene W. Myers,et al.  Suffix arrays: a new method for on-line string searches , 1993, SODA '90.

[17]  Hinrich Schütze,et al.  Xerox TREC-5 Site Report: Routing, Filtering, NLP, and Spanish Tracks , 1996, TREC.

[18]  Stephen P. Harter Variations in relevance assessments and the measurement of retrieval effectiveness , 1996 .

[19]  Kai A. Olsen,et al.  Visualization of a document collection , 1993 .

[20]  Jock D. Mackinlay,et al.  The information visualizer, an information workspace , 1991, CHI.

[21]  Mark D. Dunlop Time, relevance and interaction modelling for information retrieval , 1997, SIGIR '97.

[22]  David R. Karger,et al.  Constant interaction-time scatter/gather browsing of very large document collections , 1993, SIGIR.

[23]  Michael Rodeh,et al.  Linear Algorithm for Data Compression via String Matching , 1981, JACM.

[24]  Ellen Riloff,et al.  A Case Study in Using Linguistic Phrases for Text Categorization on the WWW , 1998 .

[25]  Xia Lin Self-organizing semantic maps as graphical interfaces for information retrieval , 1993 .

[26]  C. J. van Rijsbergen,et al.  The use of hierarchic clustering in information retrieval , 1971, Inf. Storage Retr..

[27]  Robert R. Korfhage,et al.  To see, or not to see - is that the query? , 1991, SIGIR '91.

[28]  Marti A. Hearst TileBars: visualization of term distribution information in full text information access , 1995, CHI '95.

[29]  Dan Gusfield,et al.  Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .

[30]  Peter Willett,et al.  Using interdocument similarity information in document retrieval systems , 1997 .

[31]  Martin Dillon,et al.  FASIT: A fully automatic syntactically based indexing system , 1983, J. Am. Soc. Inf. Sci..

[32]  Tomek Strzalkowski,et al.  Natural Language Information Retrieval: TREC-8 Report , 1994, TREC.

[33]  Oren Etzioni,et al.  Multi-Engine Search and Comparison Using the MetaCrawler , 1995, World Wide Web J..

[34]  Matthew Chalmers,et al.  Bead: explorations in information visualization , 1992, SIGIR '92.

[35]  Zvi M. Kedem,et al.  Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set , 1998, EDBT.

[36]  Marti A. Hearst,et al.  Cat-a-Cone: an interactive interface for specifying searches and viewing retrieval results using a large category hierarchy , 1997, SIGIR '97.

[37]  William S. Cooper,et al.  Getting beyond Boole , 1988, Inf. Process. Manag..

[38]  Chris Buckley,et al.  OHSUMED: an interactive retrieval evaluation and new large test collection for research , 1994, SIGIR '94.

[39]  John Cocke,et al.  A Statistical Approach to Machine Translation , 1990, CL.

[40]  Hinrich Schütze,et al.  Projections for efficient document clustering , 1997, SIGIR '97.

[41]  David A. Hull Stemming algorithms: a case study for detailed evaluation , 1996 .

[42]  Clement T. Yu,et al.  A theory of term importance in automatic text analysis , 1974, J. Am. Soc. Inf. Sci..

[43]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[44]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[46]  Alan J. Wecker,et al.  The Librarian's Assistant: Automatically Organizing On-line Books into Dynamic Bookshelves , 1994, RIAO.

[47]  Peter Weiner,et al.  Linear Pattern Matching Algorithms , 1973, SWAT.

[48]  Sougata Mukherjea,et al.  Visualizing complex hypermedia networks through multiple hierarchical views , 1995, CHI '95.

[49]  Peter C. Cheeseman,et al.  Bayesian Classification (AutoClass): Theory and Results , 1996, Advances in Knowledge Discovery and Data Mining.

[50]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[51]  Ellen M. Voorhees,et al.  Implementing agglomerative hierarchic clustering algorithms for use in document retrieval , 1986, Inf. Process. Manag..

[52]  Jan O. Pedersen,et al.  Almost-constant-time clustering of arbitrary corpus subsets , 1997, SIGIR '97.

[53]  Rajeev Motwani,et al.  Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[54]  Wendy A. Lawrence-Fowler,et al.  Integrating query, thesaurus, and documents through a common visual representation , 1991, SIGIR '91.

[55]  Teuvo Kohonen,et al.  Exploration of very large databases by self-organizing maps , 1997, Proceedings of International Conference on Neural Networks (ICNN'97).

[56]  Edward M. McCreight,et al.  A Space-Economical Suffix Tree Construction Algorithm , 1976, JACM.

[57]  Edward A. Fox,et al.  Visualizing search results: some alternatives to query-document similarity , 1996, SIGIR '96.

[58]  Roberto J. Bayardo Brute-Force Mining of High-Confidence Classification Rules , 1997, KDD.

[59]  Brian D. Davison,et al.  Human Performance on Clustering Web Pages: A Preliminary Study , 1998, KDD.

[60]  Edie M. Rasmussen,et al.  Clustering Algorithms , 1992, Information Retrieval: Data Structures & Algorithms.

[61]  Srinivasan Parthasarathy,et al.  New Algorithms for Fast Discovery of Association Rules , 1997, KDD.

[62]  Xia Lin Map displays for information retrieval , 1997 .

[63]  Johannes Fürnkranz,et al.  A Study Using n-gram Features for Text Categorization , 1998 .

[64]  Ellen M. Voorhees,et al.  The fifth text REtrieval conference (TREC-5) , 1997 .

[65]  Aravindan Veerasamy,et al.  Effectiveness of a graphical display of retrieval results , 1997, SIGIR '97.

[66]  W. Bruce Croft,et al.  Support for Browsing in an Intelligent Text Retrieval System , 1989, Int. J. Man Mach. Stud..

[67]  G. W. Milligan,et al.  An examination of procedures for determining the number of clusters in a data set , 1985 .

[68]  Robert R. Korfhage,et al.  Visualization of a Document Collection: The VIBE System , 1993, Inf. Process. Manag..

[69]  E. Voorhees The Effectiveness & Efficiency of Agglomerative Hierarchic Clustering in Document Retrieval , 1985 .

[70]  Oren Etzioni,et al.  Fast and Intuitive Clustering of Web Documents , 1997, KDD.

[71]  Robert R. Korfhage,et al.  GUIDO: visualizing document retrieval , 1997, Proceedings. 1997 IEEE Symposium on Visual Languages (Cat. No.97TB100180).

[72]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[73]  Robert B. Allen,et al.  An interface for navigating clustered document sets returned by queries , 1993, COCS '93.

[74]  Robert R. Korfhage,et al.  BIRD: browsing interface for the retrieval of documents , 1994, Proceedings of 1994 IEEE Symposium on Visual Languages.

[75]  Anselm Spoerri,et al.  InfoCrystal: a visual tool for information retrieval & management , 1993, CIKM '93.

[76]  David Haussler,et al.  A new distance metric on strings computable in linear time , 1988, Discret. Appl. Math..

[77]  Matthias Hemmje,et al.  LyberWorld—a visualization user interface supporting fulltext retrieval , 1994, SIGIR '94.

[78]  Geoffrey Zweig,et al.  Syntactic Clustering of the Web , 1997, Comput. Networks.

[79]  Joel L Fagan,et al.  Experiments in Automatic Phrase Indexing For Document Retrieval: A Comparison of Syntactic and Non-Syntactic Methods , 1987 .

[80]  G. W. Milligan,et al.  The Effect of Cluster Size, Dimensionality, and the Number of Clusters on Recovery of True Cluster Structure , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[81]  Natasa Milic-Frayling,et al.  Evaluation of Syntactic Phrase Indexing -- CLARIT NLP Track Report , 1996, TREC.

[82]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[83]  Dieter Merkl,et al.  Exploration of text collections with hierarchical feature maps , 1997, SIGIR '97.

[84]  Oren Etzioni,et al.  Web document clustering: a feasibility demonstration , 1998, SIGIR '98.

[85]  Ronen Feldman,et al.  Visualization Techniques to Explore Data Mining Results for Document Collections , 1997, KDD.