Paper Information - Introduction to Information Retrieval

Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.

Christopher D. Manning | Hinrich Schütze | Prabhakar Raghavan

[1] Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval , 2021, J. Documentation.

[2] William W. Cohen,et al. Learning to Order Things , 2011, NIPS.

[3] Jun Wang,et al. Mean-Variance Analysis: A New Document Ranking Theory in Information Retrieval , 2009, ECIR.

[4] Thomas Gärtner,et al. Kernels for structured data , 2008, Series in Machine Perception and Artificial Intelligence.

[5] Aris Anagnostopoulos,et al. Effective and efficient classification on a search-engine model , 2008 .

[6] Özgür Ulusoy,et al. Incremental cluster-based retrieval using compressed cluster-skipping inverted files , 2008, TOIS.

[7] Torsten Suel,et al. Performance of compressed inverted list caching in search engines , 2008, WWW.

[8] Yong Yu,et al. Viewing Term Proximity from a Different Perspective , 2008, ECIR.

[9] Wen-tau Yih,et al. Raising the baseline for high-precision text classifiers , 2007, KDD '07.

[10] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.

[11] Özgür Ulusoy,et al. Large-scale cluster-based retrieval experiments on Turkish texts , 2007, SIGIR.

[12] Filip Radlinski,et al. A support vector method for optimizing average precision , 2007, SIGIR.

[13] Roi Blanco,et al. Boosting static pruning of inverted files , 2007, SIGIR.

[14] Alexandros Ntoulas,et al. Pruning policies for two-tiered inverted index with correctness guarantee , 2007, SIGIR.

[15] W. Bruce Croft,et al. Efficient document retrieval in main memory , 2007, SIGIR.

[16] Tao Qin,et al. Ranking with multiple hyperplanes , 2007, SIGIR.

[17] Tao Qin,et al. Feature selection for ranking , 2007, SIGIR.

[18] Hugh E. Williams,et al. Fast generation of result snippets in web search , 2007, SIGIR.

[19] Peter Gerrand,et al. Estimating Linguistic Diversity on the Internet: A Taxonomy to Avoid Pitfalls and Paradoxes , 2007, J. Comput. Mediat. Commun..

[20] Eli Upfal,et al. Finding near neighbors through cluster pruning , 2007, PODS '07.

[21] Mounia Lalmas,et al. Evaluating XML retrieval effectiveness at INEX , 2007, SIGF.

[22] Yi Liu,et al. Statistical Machine Translation for Query Expansion in Answer Retrieval , 2007, ACL.

[23] P. Pirolli. Information Foraging Theory: Adaptive Interaction with Information , 2007 .

[24] Fabrizio Silvestri,et al. Sorting Out the Document Identifier Assignment Problem , 2007, ECIR.

[25] José Luis Vicedo González,et al. TREC: Experiment and evaluation in information retrieval , 2007, J. Assoc. Inf. Sci. Technol..

[26] J. V. Rauff,et al. Finite State Morphology , 2007 .

[27] David Bawden,et al. The Turn: Integration of Information Seeking and Information Retrieval in Context , 2007, J. Documentation.

[28] Songbo Tan,et al. Using hypothesis margin to boost centroid text classifier , 2007, SAC '07.

[29] Tetsuya Sakai,et al. On the reliability of information retrieval metrics based on graded relevance , 2007, Inf. Process. Manag..

[30] Andrew Trotman,et al. XML-IR Users and Use Cases , 2006, INEX.

[31] Stephen E. Robertson,et al. CISR at INEX 2006 , 2006, INEX.

[32] Mounia Lalmas,et al. Advances in XML retrieval: the INEX initiative , 2006, IWRIDL '06.

[33] Sihem Amer-Yahia,et al. XML search: languages, INEX and scoring , 2006, SGMD.

[34] Michael I. Jordan,et al. Hierarchical Dirichlet Processes , 2006 .

[35] J. Stephen Downie,et al. The Music Information Retrieval Evaluation eXchange (MIREX) , 2006 .

[36] Charles L. A. Clarke,et al. A document-centric approach to static index pruning in text retrieval systems , 2006, CIKM '06.

[37] Ramayya Krishnan,et al. Incremental hierarchical clustering of text documents , 2006, CIKM '06.

[38] Stephen E. Robertson,et al. Optimisation methods for ranking functions with multiple parameters , 2006, CIKM '06.

[39] Andrei Z. Broder,et al. Estimating corpus size via queries , 2006, CIKM '06.

[40] Anja Feldmann,et al. Web search clickstreams , 2006, IMC '06.

[41] Alistair Moffat,et al. Structured Index Organizations for High-Throughput Text Querying , 2006, SPIRE.

[42] M. de Rijke,et al. Articulating information needs in XML query languages , 2006, TOIS.

[43] Gabriella Kazai,et al. eXtended cumulated gain measures for the evaluation of content-oriented XML retrieval , 2006, TOIS.

[44] E. Garfield,et al. Citation indexes for science; a new dimension in documentation through association of ideas , 2006, Science.

[45] Francisco Azuaje,et al. Witten IH, Frank E: Data Mining: Practical Machine Learning Tools and Techniques 2nd edition , 2006 .

[46] Roi Blanco,et al. TSP and cluster-based solutions to the reassignment of document identifiers , 2006, Information Retrieval.

[47] Gerhard Weikum,et al. Probabilistic information retrieval approach for ranking of database query results , 2006, TODS.

[48] Thorsten Joachims,et al. Training linear SVMs in linear time , 2006, KDD '06.

[49] W. Bruce Croft,et al. LDA-based document models for ad-hoc retrieval , 2006, SIGIR.

[50] Tie-Yan Liu,et al. Adapting ranking SVM to document retrieval , 2006, SIGIR.

[51] Jennifer Chu-Carroll,et al. Semantic search via XML fragments: a high-precision approach to IR , 2006, SIGIR.

[52] Charles L. A. Clarke,et al. Hybrid index maintenance for growing text collections , 2006, SIGIR.

[53] Xiang Ji,et al. Document clustering with prior knowledge , 2006, SIGIR.

[54] Tom M. Mitchell,et al. Text clustering with extended user feedback , 2006, SIGIR.

[55] Grace Hui Yang,et al. Near-duplicate detection by instance-level constrained clustering , 2006, SIGIR.

[56] George Forman,et al. Tackling concept drift by temporal inductive transfer , 2006, SIGIR.

[57] James P. Callan,et al. An experimental study on automatically labeling hierarchical clusters using statistical features , 2006, SIGIR.

[58] Mounia Lalmas,et al. User expectations from XML element retrieval , 2006, SIGIR.

[59] Alistair Moffat,et al. Pruned query evaluation using pre-computed impacts , 2006, SIGIR.

[60] S. Sathiya Keerthi,et al. Large scale semi-supervised linear SVMs , 2006, SIGIR.

[61] Monika Henzinger,et al. Finding near-duplicate web pages: a large-scale evaluation of algorithms , 2006, SIGIR.

[62] Justin Zobel,et al. Inverted files for text search engines , 2006, CSUR.

[63] Amy Nicole Langville,et al. Google's PageRank and beyond - the science of search engine rankings , 2006 .

[64] Hugh E. Williams,et al. Efficient online index maintenance for contiguous inverted lists , 2006, Inf. Process. Manag..

[65] Rich Caruana,et al. An empirical comparison of supervised learning algorithms , 2006, ICML.

[66] Sergei Vassilvitskii,et al. How slow is the k-means method? , 2006, SCG '06.

[67] Tao Tao,et al. Language Model Information Retrieval with Document Expansion , 2006, NAACL.

[68] Alistair Moffat,et al. Improved word-aligned binary compression for text indexing , 2006, IEEE Transactions on Knowledge and Data Engineering.

[69] Omar Alonso,et al. GIO: a semantic web application using the information grid framework , 2006, WWW '06.

[70] Ziv Bar-Yossef,et al. Random sampling from a search engine's index , 2006, WWW '06.

[71] Eric Brill,et al. Beyond PageRank: machine learning for static ranking , 2006, WWW '06.

[72] Timothy Baldwin,et al. Reconsidering Language Identification for Written Language Resources , 2006, LREC.

[73] Gabriella Kazai,et al. Advances in XML Information Retrieval and Evaluation, 4th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2005, Dagstuhl Castle, Germany, November 28-30, 2005, Revised Selected Papers , 2006, INEX.

[74] Frans Wiering,et al. Bricks: The Building Blocks to Tackle Query Formulation in Structured Document Retrieval , 2006, ECIR.

[75] Patrick Gallinari,et al. Machine Learning Ranking for Structured Information Retrieval , 2006, ECIR.

[76] Marcin Zukowski,et al. Super-Scalar RAM-CPU Cache Compression , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[77] Marti A. Hearst. Clustering versus faceted categories for information exploration , 2006, Commun. ACM.

[78] David J. Hand,et al. Classifier Technology and the Illusion of Progress , 2006, math/0606441.

[79] Nathalie Jacobs. Springer , 2006 .

[80] Pavel Berkhin,et al. Bookmark-Coloring Algorithm for Personalized PageRank Computing , 2006, Internet Math..

[81] Charles L. A. Clarke,et al. A security model for full-text file system search in multi-user environments , 2005, FAST'05.

[82] Changning Huang,et al. Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach , 2005, CL.

[83] Thomas Hofmann,et al. Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[84] Djoerd Hiemstra,et al. TIJAH: Embracing IR Methods in XML Databases , 2005, Information Retrieval.

[85] Ray R. Larson,et al. A Fusion Approach to XML Structured Document Retrieval , 2005, Information Retrieval.

[86] Sihem Amer-Yahia,et al. Report on the DB/IR panel at SIGMOD 2005 , 2005, SGMD.

[87] James P. Callan,et al. Parameter Estimation for a Simple Hierarchical Generative Model for XML Retrieval , 2005, INEX.

[88] Ryoji Kataoka,et al. A search result clustering method using informatively named entities , 2005, WIDM '05.

[89] Shlomo Geva,et al. XML Retrieval with a Natural Language Interface , 2005, SPIRE.

[90] Sebastiano Vigna,et al. Compressed Perfect Embedded Skip Lists for Quick Inverted-Index Lookups , 2005, SPIRE.

[91] Alistair Moffat,et al. Fast on-line index construction by geometric partitioning , 2005, CIKM '05.

[92] A. McCallum,et al. Collective multi-label classification , 2005, CIKM '05.

[93] Emine Yilmaz,et al. A geometric interpretation and analysis of R-precision , 2005, CIKM '05.

[94] Djoerd Hiemstra,et al. Score region algebra: building a transparent XML-R database , 2005, CIKM '05.

[95] Jaana Kekäläinen,et al. Generalized contextualization method for XML information retrieval , 2005, CIKM '05.

[96] Charles L. A. Clarke,et al. Indexing time vs. query time: trade-offs in dynamic information retrieval systems , 2005, CIKM '05.

[97] Peter Ingwersen,et al. The Turn - Integration of Information Seeking and Retrieval in Context , 2005, The Kluwer International Series on Information Retrieval.

[98] Jaana Kekäläinen,et al. Binary and graded relevance in IR evaluations--Comparison of the effects on ranking of IR systems , 2005, Inf. Process. Manag..

[99] Djoerd Hiemstra,et al. A Language Modeling Approach to TREC , 2005 .

[100] Gerhard Weikum,et al. An Efficient and Versatile Query Engine for TopX Search , 2005, VLDB.

[101] Jian-Yun Nie,et al. Integrating word relationships into language models , 2005, SIGIR '05.

[102] Thorsten Joachims,et al. Accurately interpreting clickthrough data as implicit feedback , 2005, SIGIR '05.

[103] Debapriyo Majumdar,et al. Why spectral retrieval works , 2005, SIGIR '05.

[104] Gregory N. Hullender,et al. Learning to rank using gradient descent , 2005, ICML.

[105] Marina Meila,et al. Comparing clusterings: an axiomatic view , 2005, ICML.

[106] G. M. Allan,et al. Kappa statistic , 2005, Canadian Medical Association Journal.

[107] Yiming Yang,et al. Support vector machines classification with a very large-scale taxonomy , 2005, SKDD.

[108] Sebastiano Vigna,et al. PageRank as a function of the damping factor , 2005, WWW '05.

[109] Dawid Weiss,et al. A concept-driven algorithm for clustering search results , 2005, IEEE Intelligent Systems.

[110] Hugh E. Williams,et al. Searchable words on the Web , 2005, International Journal on Digital Libraries.

[111] Chih-Jen Lin,et al. A tutorial on ν-support vector machines , 2005 .

[112] Judit Bar-Ilan,et al. How do search engines respond to some non-English queries? , 2005, J. Inf. Sci..

[113] Sebastiano Vigna,et al. Codes for the World Wide Web , 2005, Internet Math..

[114] Pavel Berkhin,et al. A Survey on PageRank Computing , 2005, Internet Math..

[115] Sanjay Ghemawat,et al. MapReduce: simplified data processing on large clusters , 2008, CACM.

[116] Andrew Trotman,et al. Narrowed Extended XPath I (NEXI) , 2004, INEX.

[117] M. de Rijke,et al. Mixture Models, Overlap, and Structural Hints in XML Element Retrieval , 2004, INEX.

[118] Ismail Sengör Altingövde,et al. Efficiency and effectiveness of query processing in cluster-based retrieval , 2004, Inf. Syst..

[119] Yiming Yang,et al. RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[120] Stephen E. Robertson,et al. Simple BM25 extension to multiple weighted fields , 2004, CIKM '04.

[121] Hugh E. Williams,et al. Fast phrase querying with combined indexes , 2004, TOIS.

[122] George Forman,et al. Learning from Little: Comparison of Classifiers Given Little Training , 2004, PKDD.

[123] Sergio M. Savaresi,et al. A comparative analysis on the bisecting K-means and the PDDP clustering algorithms , 2004, Intell. Data Anal..

[124] Prabhakar Raghavan,et al. Efficiency-Quality Tradeoffs for Vector Score Aggregation , 2004, VLDB.

[125] Byron Dom,et al. Document preprocessing for naive Bayes classification and clustering with mixture of multinomials , 2004, KDD.

[126] Robert D. Nowak,et al. Likelihood based hierarchical clustering , 2004, IEEE Transactions on Signal Processing.

[127] David J. Harper,et al. Topic modeling for mediated access to very large document collections , 2004, J. Assoc. Inf. Sci. Technol..

[128] W. Bruce Croft,et al. Cluster-based retrieval using language models , 2004, SIGIR '04.

[129] Jianfeng Gao,et al. Dependence language model for information retrieval , 2004, SIGIR '04.

[130] M. de Rijke,et al. Length normalization in XML retrieval , 2004, SIGIR '04.

[131] Sebastiano Vigna,et al. UbiCrawler: a scalable fully distributed Web crawler , 2004, Softw. Pract. Exp..

[132] Thomas L. Griffiths,et al. The Author-Topic Model for Authors and Documents , 2004, UAI.

[133] George Forman,et al. A pitfall and solution in multi-class feature selection for text classification , 2004, ICML.

[134] Jason Baldridge,et al. Active Learning and the Total Cost of Annotation , 2004, EMNLP.

[135] Eric Brill,et al. Spelling Correction as an Iterative Process that Exploits the Collective Knowledge of Web Users , 2004, EMNLP.

[136] Arindam Banerjee,et al. Active Semi-Supervision for Pairwise Constrained Clustering , 2004, SDM.

[137] R. Tibshirani,et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[138] Sebastiano Vigna,et al. The webgraph framework I: compression techniques , 2004, WWW '04.

[139] Oren Kurland,et al. Corpus structure, language models, and ad hoc information retrieval , 2004, SIGIR '04.

[140] Piotr Indyk,et al. Nearest Neighbors in High-Dimensional Spaces , 2004, Handbook of Discrete and Computational Geometry, 2nd Ed..

[141] Roberto Basili,et al. Complex Linguistic Features for Text Classification: A Comprehensive Study , 2004, ECIR.

[142] Norbert Fuhr,et al. XIRQL: An XML query language based on information retrieval concepts , 2004, TOIS.

[143] ChengXiang Zhai,et al. A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[144] Fabrizio Silvestri,et al. Assigning document identifiers to enhance compressibility of Web Search Engines indexes , 2004, SAC '04.

[145] Barbara Di Eugenio,et al. Squibs and Discussions: The Kappa Statistic: A Second Look , 2004, CL.

[146] Greg Hamerly,et al. Learning the k in k-means , 2003, NIPS.

[147] Yiming Yang,et al. Margin-based local regression for adaptive filtering , 2003, CIKM '03.

[148] Torsten Suel,et al. Optimized Query Execution in Large Search Engines with Global Page Ordering , 2003, VLDB.

[149] David R. Karger,et al. Tackling the Poor Assumptions of Naive Bayes Text Classifiers , 2003, ICML.

[150] Yiming Yang,et al. A Loss Function Analysis for Classification Methods in Text Categorization , 2003, ICML.

[151] Stephen Tomlinson,et al. Lexical and Algorithmic Stemming Compared for 9 European Languages with Hummingbird SearchServer™ at CLEF 2003 , 2003, CLEF.

[152] David Carmel,et al. Searching XML documents via XML fragments , 2003, SIGIR.

[153] Djoerd Hiemstra,et al. Bayesian extension to the language model for ad hoc information retrieval , 2003, SIGIR.

[154] Richard Sproat,et al. The First International Chinese Word Segmentation Bakeoff , 2003, SIGHAN.

[155] Taher H. Haveliwala. Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search , 2003, IEEE Trans. Knowl. Data Eng..

[156] Lucian Vlad Lita,et al. tRuEcasIng , 2003, ACL.

[157] Mounia Lalmas,et al. A survey on the use of relevance feedback for information access systems , 2003, The Knowledge Engineering Review.

[158] Justin Zobel,et al. Efficient single-pass index construction for text databases , 2003, J. Assoc. Inf. Sci. Technol..

[159] Jennifer Widom,et al. Scaling personalized web search , 2003, WWW '03.

[160] Alessandro Moschitti,et al. A Study on Optimal Parameter Tuning for Rocchio Text Classifier , 2003, ECIR.

[161] Victor Carneiro,et al. Optimization of Restricted Searches in Web Directories Using Hybrid Data Structures , 2003, ECIR.

[162] James Theiler,et al. Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space , 2003, J. Mach. Learn. Res..

[163] Luiz André Barroso,et al. Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.

[164] David M. Pennock,et al. Inferring hierarchical descriptions , 2002, CIKM '02.

[165] George Karypis,et al. Evaluation of hierarchical clustering algorithms for document datasets , 2002, CIKM '02.

[166] Jaana Kekäläinen,et al. Using graded relevance assessments in IR evaluation , 2002, J. Assoc. Inf. Sci. Technol..

[167] Jaana Kekäläinen,et al. Cumulated gain-based evaluation of IR techniques , 2002, TOIS.

[168] M. Lombard,et al. Content Analysis in Mass Communication: Assessment and Reporting of Intercoder Reliability , 2002 .

[169] Kui-Lam Kwok,et al. A comparison of Chinese document indexing strategies and retrieval models , 2002, TALIP.

[170] Hugh E. Williams,et al. Compression of inverted indexes For fast query evaluation , 2002, SIGIR '02.

[171] Hugh E. Williams,et al. Efficient phrase querying with an auxiliary index , 2002, SIGIR '02.

[172] John D. Lafferty,et al. Two-stage language models for information retrieval , 2002, SIGIR '02.

[173] Andrew Turpin,et al. User interface effects in past batch versus user experiments , 2002, SIGIR '02.

[174] Djoerd Hiemstra,et al. The Importance of Prior Probabilities for Entry Page Search , 2002, SIGIR '02.

[175] Torsten Suel,et al. Design and implementation of a high-performance distributed Web crawler , 2002, Proceedings 18th International Conference on Data Engineering.

[176] Byron Dom,et al. An Information-Theoretic External Cluster-Validity Measure , 2002, UAI.

[177] Thorsten Joachims,et al. Optimizing search engines using clickthrough data , 2002, KDD.

[178] Robert Villa,et al. The effectiveness of query-specific hierarchic clustering in information retrieval , 2002, Inf. Process. Manag..

[179] Dan Klein,et al. Interpreting and Extending Classical Agglomerative Clustering Algorithms using a Model-Based approach , 2002, ICML.

[180] Dan Klein,et al. Conditional Structure versus Conditional Estimation in NLP Models , 2002, EMNLP.

[181] Kristina Toutanova,et al. Pronunciation Modeling for Improved Spelling Correction , 2002, ACL.

[182] Peter Jackson,et al. Natural language processing for online applications : text retrieval, extraction and categorization , 2002 .

[183] Taher H. Haveliwala. Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..

[184] David M. Pennock,et al. Using web structure for classifying and describing web pages , 2002, WWW.

[185] Hector Garcia-Molina,et al. Parallel crawlers , 2002, WWW.

[186] Torsten Schlieder,et al. Querying and ranking XML documents , 2002, J. Assoc. Inf. Sci. Technol..

[187] Guy E. Blelloch,et al. Index compression through document reordering , 2002, Proceedings DCC 2002. Data Compression Conference.

[188] Yiming Yang,et al. Information Filtering in TREC-9 and TDT-3: A Comparative Analysis , 2002, Information Retrieval.

[189] Hugo Zaragoza,et al. Information Retrieval: Algorithms and Heuristics , 2002, Information Retrieval.

[190] Hugh E. Williams,et al. Burst tries: a fast, efficient data structure for string keys , 2002, TOIS.

[191] David Evans,et al. Tracking and summarizing news on a daily basis with Columbia's Newsblaster , 2002 .

[192] Nello Cristianini,et al. Classification using String Kernels , 2000 .

[193] Daphne Koller,et al. Support Vector Machine Active Learning with Applications to Text Classification , 2000, J. Mach. Learn. Res..

[194] Erhard Rahm,et al. A survey of approaches to automatic schema matching , 2001, The VLDB Journal.

[195] D. Hand,et al. Idiot's Bayes—Not So Stupid After All? , 2001 .

[196] N. Ziviani,et al. Distributed query processing using partitioned inverted files , 2001, Proceedings Eighth Symposium on String Processing and Information Retrieval.

[197] Fabrizio Sebastiani,et al. Machine learning in automated text categorization , 2001, CSUR.

[198] B. S. Manjunath,et al. Category-based image retrieval , 2001, Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205).

[199] Jugal K. Kalita,et al. Summarization as feature selection for text categorization , 2001, CIKM '01.

[200] John D. Lafferty,et al. Model-based feedback in the language modeling approach to information retrieval , 2001, CIKM '01.

[201] Karl Aberer,et al. P-Grid: A Self-Organizing Access Structure for P2P Information Systems , 2001, CoopIS.

[202] Zhu Zhang,et al. Interactive, Domain-Independent Identification and Summarization of Topically Related News Articles , 2001, ECDL.

[203] Alistair Moffat,et al. Vector-space ranking with effective early termination , 2001, SIGIR '01.

[204] Ronald Fagin,et al. Static index pruning for information retrieval systems , 2001, SIGIR '01.

[205] W. Bruce Croft,et al. Relevance-Based Language Models , 2001, SIGIR '01.

[206] Yiming Yang,et al. A study of thresholding strategies for text categorization , 2001, SIGIR '01.

[207] Andrew Turpin,et al. Why batch and user evaluations do not give the same results , 2001, SIGIR '01.

[208] Inderjit S. Dhillon,et al. Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.

[209] Michael I. Jordan,et al. Link Analysis, Eigenvectors and Stability , 2001, IJCAI.

[210] Michele Banko,et al. Scaling to Very Very Large Corpora for Natural Language Disambiguation , 2001, ACL.

[211] Kishore Papineni,et al. Why Inverse Document Frequency? , 2001, NAACL.

[212] Chris H. Q. Ding,et al. Bipartite graph partitioning and data clustering , 2001, CIKM '01.

[213] Andrew Turpin,et al. Challenging conventional assumptions of automated information retrieval with real users: Boolean searching and batch retrieval evaluations , 2001, Inf. Process. Manag..

[214] Allan Borodin,et al. Finding authorities and hubs from link structures on the World Wide Web , 2001, WWW '01.

[215] P. Kantor. Foundations of Statistical Natural Language Processing , 2001, Information Retrieval.

[216] Tong Zhang,et al. Text Categorization Based on Regularized Linear Classification Methods , 2001, Information Retrieval.

[217] William R. Hersh,et al. Managing Gigabytes—Compressing and Indexing Documents and Images (Second Edition) , 2001, Information Retrieval.

[218] Michael I. Jordan,et al. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.

[219] Michael I. Jordan,et al. On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[220] Patrick J. Flynn,et al. A 20th Anniversary Survey: Introduction to 'Content-Based Image Retrieval at the End of the Early Years' , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[221] Santosh S. Vempala,et al. On clusterings-good, bad and spectral , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[222] Stephen E. Robertson,et al. A probabilistic model of information retrieval: development and comparative experiments - Part 2 , 2000, Inf. Process. Manag..

[223] Eric Brill,et al. An Improved Error Model for Noisy Channel Spelling Correction , 2000, ACL.

[224] David G. Stork,et al. Pattern classification, 2nd Edition , 2000 .

[225] Amanda Spink,et al. Use of query reformulation and relevance feedback by Excite users , 2000, Internet Res..

[226] Stephen E. Robertson,et al. Parallel search using partitioned inverted files , 2000, Proceedings Seventh International Symposium on String Processing and Information Retrieval. SPIRE 2000.

[227] George Karypis,et al. Centroid-Based Document Classification: Analysis and Experimental Results , 2000, PKDD.

[228] Paul N. Bennett. Assessing the Calibration of Naive Bayes Posterior Estimates , 2000 .

[229] Christiane Fellbaum,et al. Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[230] Masaki Murata,et al. Japanese probabilistic information retrieval using location and category information , 2000, IRAL '00.

[231] D. Hiemstra. A probabilistic justification for using tf×idf term weighting in information retrieval , 2000, International Journal on Digital Libraries.

[232] Hsin-Hsi Chen,et al. A Multilingual News Summarizer , 2000, COLING.

[233] Pedro M. Domingos. A Unified Bias-Variance Decomposition for Zero-One and Squared Loss , 2000, AAAI/IAAI.

[234] Luis Gravano,et al. An investigation of linguistic features and clustering algorithms for topical document clustering , 2000, SIGIR '00.

[235] Andrew Turpin,et al. Do batch and user evaluations give the same results? , 2000, SIGIR '00.

[236] Susan T. Dumais,et al. Hierarchical classification of Web content , 2000, SIGIR '00.

[237] Shivakumar Vaithyanathan,et al. Model-Based Hierarchical Clustering , 2000, UAI.

[238] Andrew W. Moore,et al. X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.