Introduction to Information Retrieval

Class-tested and coherent, this textbook teaches classical and web information retrieval from basic concepts, including web search and the related areas of text classification and text clustering. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents, covers methods for evaluating those systems, and introduces the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.


[298]  Chris Buckley,et al.  Learning routing queries in a query zone , 1997, SIGIR '97.

[299]  Charles L. A. Clarke,et al.  Relevance ranking for one to three term queries , 1997, Inf. Process. Manag..

[300]  D. Gusfield Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology , 1997 .

[301]  Jon M. Kleinberg,et al.  Two algorithms for nearest-neighbor search in high dimensions , 1997, STOC '97.

[302]  Marti A. Hearst TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages , 1997, CL.

[303]  Jeffrey E. F. Friedl Mastering Regular Expressions , 1997 .

[304]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[305]  Robert Krovetz,et al.  Word sense disambiguation for large text databases , 1996 .

[306]  Alistair Moffat,et al.  Self-indexing inverted files for fast text retrieval , 1996, TOIS.

[307]  Justin Zobel,et al.  Filtered Document Retrieval with Frequency-Sorted Indexes , 1996, J. Am. Soc. Inf. Sci..

[308]  Chris Buckley,et al.  Pivoted Document Length Normalization , 1996, SIGIR Forum.

[309]  Justin Zobel,et al.  Phonetic string matching: lessons from information retrieval , 1996, SIGIR '96.

[310]  Yoram Singer,et al.  Context-sensitive learning methods for text categorization , 1996, SIGIR '96.

[311]  James P. Callan,et al.  Training algorithms for linear text classifiers , 1996, SIGIR '96.

[312]  Nir Friedman,et al.  Building Classifiers Using Bayesian Networks , 1996, AAAI/IAAI, Vol. 2.

[313]  Nicholas J. Belkin,et al.  A case for interaction: a study of interactive information retrieval behavior and effectiveness , 1996, CHI.

[314]  Douglas W. Oard,et al.  A survey of multilingual text retrieval , 1996 .

[315]  Alistair Moffat,et al.  Exploiting clustering in inverted file compression , 1996, Proceedings of Data Compression Conference - DCC '96.

[316]  Jean Carletta,et al.  Assessing Agreement on Classification Tasks: The Kappa Statistic , 1996, CL.

[317]  Thomas Bäck An Empirical Comparison , 1996 .

[318]  Peter C. Cheeseman,et al.  Bayesian Classification (AutoClass): Theory and Results , 1996, Advances in Knowledge Discovery and Data Mining.

[319]  Susan T. Dumais,et al.  Using Linear Algebra for Intelligent Information Retrieval , 1995, SIAM Rev..

[320]  Michael W. Berry,et al.  Using latent semantic indexing for multilanguage information retrieval , 1995, Comput. Humanit..

[321]  Eric W. Brown,et al.  Execution performance issues in full-text information retrieval , 1995 .

[322]  S.J.J. Smith,et al.  Empirical Methods for Artificial Intelligence , 1995 .

[323]  Alistair Moffat,et al.  In Situ Generation of Compressed Inverted Files , 1995, J. Am. Soc. Inf. Sci..

[324]  Francine Chen,et al.  A trainable document summarizer , 1995, SIGIR '95.

[325]  Dragomir R. Radev,et al.  Generating summaries of multiple news articles , 1995, SIGIR '95.

[326]  Hinrich Schütze,et al.  A comparison of classifiers and document representations for the routing problem , 1995, SIGIR '95.

[327]  Gerard Salton,et al.  Optimization of relevance feedback weights , 1995, SIGIR '95.

[328]  David D. Lewis,et al.  Evaluating and optimizing autonomous text classification systems , 1995, SIGIR '95.

[329]  Takenobu Tokunaga,et al.  Cluster-based text categorization: a comparison of category search strategies , 1995, SIGIR '95.

[330]  Alistair Moffat,et al.  Efficient Retrieval of Partial Documents , 1995, Inf. Process. Manag..

[331]  Gerard Salton,et al.  Length Normalization in Degraded Text Collections , 1995 .

[332]  James Blustein,et al.  A Statistical Analysis of the TREC-3 Data , 1995, TREC.

[333]  Justin Zobel,et al.  Finding approximate matches in large lexicons , 1995, Softw. Pract. Exp..

[334]  Byeong-Soo Jeong,et al.  Inverted File Partitioning Schemes in Multiple Disk Systems , 1995, IEEE Trans. Parallel Distributed Syst..

[335]  W. B. Cavnar,et al.  N-gram-based text categorization , 1994 .

[336]  Oliver A. McBryan,et al.  GENVL and WWWW: Tools for taming the Web , 1994, WWW Spring 1994.

[337]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[338]  Chris Buckley,et al.  OHSUMED: an interactive retrieval evaluation and new large test collection for research , 1994, SIGIR '94.

[339]  Howard R. Turtle Natural language vs. Boolean query evaluation: a comparison of retrieval performance , 1994, SIGIR '94.

[340]  Yiming Yang,et al.  Expert network: effective and efficient learning from human decisions in text categorization and retrieval , 1994, SIGIR '94.

[341]  Michael Persin,et al.  Document filtering for fast ranking , 1994, SIGIR '94.

[342]  Fredric C. Gey,et al.  Inferring probability of relevance using the method of logistic regression , 1994, SIGIR '94.

[343]  James Allan,et al.  The effect of adding relevance information in a relevance feedback environment , 1994, SIGIR '94.

[344]  Sholom M. Weiss,et al.  Automated learning of decision rules for text categorization , 1994, TOIS.

[345]  Chilin Shih,et al.  A Stochastic Finite-State Word-Segmentation Algorithm for Chinese , 1994, ACL.

[346]  Norbert Fuhr,et al.  Probabilistic information retrieval as a combination of abstraction, inductive learning, and probabilistic assumptions , 1994, TOIS.

[347]  Hector Garcia-Molina,et al.  Query processing and inverted indices in shared-nothing text document information retrieval systems , 1993, The VLDB Journal.

[348]  David R. Karger,et al.  Constant interaction-time scatter/gather browsing of very large document collections , 1993, SIGIR.

[349]  James Allan,et al.  Approaches to passage retrieval in full text information systems , 1993, SIGIR.

[350]  Ted Dunning,et al.  Accurate Methods for the Statistics of Surprise and Coincidence , 1993, CL.

[351]  Karen Kukich,et al.  Techniques for automatically correcting words in text , 1992, CSUR.

[352]  David L. Waltz,et al.  Trading MIPS and memory for knowledge engineering , 1992, CACM.

[353]  Donna K. Harman,et al.  Relevance feedback revisited , 1992, SIGIR '92.

[354]  David R. Karger,et al.  Scatter/Gather: a cluster-based approach to browsing large document collections , 1992, SIGIR '92.

[355]  Norbert Fuhr,et al.  Probabilistic Models in Information Retrieval , 1992, Comput. J..

[356]  Alistair Moffat,et al.  Parameterised compression for sparse bitmaps , 1992, SIGIR '92.

[357]  Elie Bienenstock,et al.  Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.

[358]  Cyril W. Cleverdon,et al.  The significance of the Cranfield tests on index languages , 1991, SIGIR '91.

[359]  W. Bruce Croft,et al.  Evaluation of an inference network-based retrieval model , 1991, TOIS.

[360]  Donna Harman,et al.  Retrieving Records from a Gigabyte of Text on a Minicomputer Using Statistical Ranking. , 1990 .

[361]  Fazli Can,et al.  Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases , 1990, TODS.

[362]  Lisa F. Rau,et al.  SCISOR: extracting information from on-line news , 1990, CACM.

[363]  Michael B. Eisenberg,et al.  A re-examination of relevance: toward a dynamic, situational definition , 1990, Inf. Process. Manag..

[364]  Chris D. Paice,et al.  Another stemmer , 1990, SIGF.

[365]  Kenneth Ward Church,et al.  A Spelling Correction Program Based on a Noisy Channel Model , 1990, COLING.

[366]  Ian H. Witten,et al.  Source Models for Natural Language Text , 1990, Int. J. Man Mach. Stud..

[367]  Philip J. Hayes,et al.  CONSTRUE/TIS: A System for Content-Based Indexing of a Database of News Stories , 1990, IAAI.

[368]  William Pugh,et al.  Skip lists: a probabilistic alternative to balanced trees , 1989, CACM.

[369]  Norbert Fuhr,et al.  Optimum polynomial retrieval functions based on the probability ranking principle , 1989, TOIS.

[370]  C. J. van Rijsbergen,et al.  Towards an information logic , 1989, SIGIR '89.

[371]  Gerard Salton,et al.  Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[372]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[373]  Paul B. Kantor,et al.  A Study of Information Seeking and Retrieving. III. Searchers, Searches, and Overlap , 1988 .

[374]  Paul B. Kantor,et al.  A study of information seeking and retrieving. II. Users, questions, and effectiveness , 1988, J. Am. Soc. Inf. Sci..

[375]  Edward A. Fox,et al.  Experimental Comparison of Schemes for Interpreting Boolean Queries , 1988 .

[376]  Carolyn J. Crouch,et al.  A cluster-based approach to thesaurus construction , 1988, SIGIR '88.

[377]  S. K. Michael Wong,et al.  Linear structure in information retrieval , 1988, SIGIR '88.

[378]  Gilbert Strang,et al.  Introduction to applied mathematics , 1988 .

[379]  J. Rice Mathematical Statistics and Data Analysis , 1988 .

[380]  Peter Willett,et al.  Hierarchic document classification using Ward's clustering method , 1986, SIGIR '86.

[381]  E. Voorhees The Effectiveness & Efficiency of Agglomerative Hierarchic Clustering in Document Retrieval , 1985 .

[382]  M. E. Maron,et al.  An evaluation of retrieval effectiveness for a full-text document-retrieval system , 1985, CACM.

[383]  H. Edelsbrunner,et al.  Efficient algorithms for agglomerative hierarchical clustering methods , 1984 .

[384]  Fionn Murtagh,et al.  A Survey of Recent Advances in Hierarchical Clustering Algorithms , 1983, Comput. J..

[385]  C. Mallows,et al.  A Method for Comparing Two Hierarchical Clusterings , 1983 .

[386]  M. A. Wong A Method for Comparing Two Hierarchical Clusterings: Comment , 1983 .

[387]  Barbara J. Grosz,et al.  Natural-Language Processing , 1982, Artif. Intell..

[388]  Derrick Grover,et al.  Cryptography: A Primer , 1982 .

[389]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[390]  James L. Peterson,et al.  Computer programs for detecting and correcting spelling errors , 1980, CACM.

[391]  W. Bruce Croft,et al.  Using Probabilistic Models of Document Retrieval without Relevance Information , 1979, J. Documentation.

[392]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[393]  W. Bruce Croft A file organization for cluster-based retrieval , 1978, SIGIR '78.

[394]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[395]  Peter Mark Roget,et al.  Roget's International Thesaurus , 1977 .

[396]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[397]  Eugene Garfield,et al.  The permuterm subject index: An autobiographical review , 1976, J. Am. Soc. Inf. Sci..

[398]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[399]  Peter Elias,et al.  Universal codeword sets and representations of the integers , 1975, IEEE Trans. Inf. Theory.

[400]  H. Akaike A new look at the statistical model identification , 1974 .

[401]  M. V. Wilkes,et al.  The Art of Computer Programming, Volume 3, Sorting and Searching , 1974 .

[402]  Michael R. Anderberg,et al.  Cluster Analysis for Applications , 1973 .

[403]  Peter H. A. Sneath,et al.  Numerical Taxonomy: The Principles and Practice of Numerical Classification , 1973 .

[404]  Tony Greenfield,et al.  Probability and Statistics for Engineers and Scientists. , 1978 .

[405]  C. J. van Rijsbergen,et al.  The use of hierarchic clustering in information retrieval , 1971, Inf. Storage Retr..

[406]  Benjamin King Step-Wise Clustering Procedures , 1967 .

[407]  G. N. Lance,et al.  A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems , 1967, Comput. J..

[408]  Geoffrey H. Ball,et al.  Data analysis in the social sciences: what about the details? , 1965, AFIPS '65 (Fall, part I).

[409]  Fred J. Damerau,et al.  A technique for computer detection and correction of spelling errors , 1964, CACM.

[410]  J. H. Ward Hierarchical Grouping to Optimize an Objective Function , 1963 .

[411]  Charles P. Bourne,et al.  A Study of Methods for Systematically Abbreviating English Words and Names , 1961, JACM.

[412]  M. E. Maron,et al.  On Relevance, Probabilistic Indexing and Information Retrieval , 1960, JACM.

[413]  Susan Brewer,et al.  Information storage and retrieval , 1959, ACM '59.

[414]  Hans Peter Luhn,et al.  The Automatic Creation of Literature Abstracts , 1958, IBM J. Res. Dev..

[415]  Hans Peter Luhn,et al.  A Statistical Approach to Mechanized Encoding and Searching of Literary Information , 1957, IBM J. Res. Dev..

[416]  Allen Kent,et al.  Machine literature searching VIII. Operational criteria for designing information retrieval systems , 1955 .

[417]  C. Eckart,et al.  The approximation of one matrix by another of lower rank , 1936 .

[418]  For Spelling , 1899 .

[419]  R. Buyya,et al.  Ensemble Learning , 2021, Machine Learning for Cloud Management.

[420]  Michael Collins,et al.  EM Algorithm , 2010, Encyclopedia of Machine Learning.

[421]  Seungjin Choi,et al.  Supervised Learning , 2015, Encyclopedia of Biometrics.

[422]  G. Kazai INitiative for the Evaluation of XML Retrieval , 2009, Encyclopedia of Database Systems.

[423]  Daniel Jurafsky,et al.  Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2009, Prentice Hall series in artificial intelligence.

[424]  A. Trotman Narrowed Extended XPath I , 2009, Encyclopedia of Database Systems.

[425]  W. Bruce Croft,et al.  Search Engines - Information Retrieval in Practice , 2009 .

[426]  Wei-Ho Chung,et al.  Probabilistic Model , 2009, Encyclopedia of Database Systems.

[427]  Gerhard Weikum,et al.  TopX: efficient and versatile top-k query processing for semistructured data , 2007, The VLDB Journal.

[428]  Anastasio Tombros,et al.  Comparative Evaluation of XML Information Retrieval Systems, 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006, Dagstuhl Castle, Germany, December 17-20, 2006, Revised and Selected Papers , 2007, INEX.

[429]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[430]  Davide Picca,et al.  Non-linear correspondence analysis in text retrieval: a kernel view , 2006 .

[431]  Sihem Amer-Yahia,et al.  XQuery Full-Text extensions explained , 2006, IBM Syst. J..

[432]  Junhui Wang,et al.  On Transductive Support Vector Machines , 2006 .

[433]  Pavel Berkhin,et al.  A Survey of Clustering Data Mining Techniques , 2006, Grouping Multidimensional Data.

[434]  Andrew Trotman,et al.  Passage Retrieval and other XML-Retrieval Tasks , 2006, SIGIR 2006.

[435]  Gonzalo Navarro,et al.  Lightweight natural language text compression , 2006, Information Retrieval.

[436]  Tom M. Mitchell,et al.  Semi-Supervised Text Classification Using EM , 2006, Semi-Supervised Learning.

[437]  David Johnson,et al.  More Effective Web Search Using Bigrams and Trigrams , 2006, Webology.

[438]  Paolo Boldi,et al.  The Choice of a Damping Function for Propagating Importance in Link-Based Ranking , 2005 .

[439]  Mounia Lalmas,et al.  Advances in XML Information Retrieval, Third International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2004, Dagstuhl Castle, Germany, December 6-8, 2004, Revised Selected Papers , 2005, INEX.

[440]  Bernhard Schölkopf,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[441]  Daniel Jurafsky,et al.  A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005 , 2005, IJCNLP.

[442]  Hsin-Hsi Chen,et al.  Overview of CLIR Task at the Sixth NTCIR Workshop , 2005, NTCIR.

[443]  Sergei Vassilvitskii,et al.  On the Worst Case Complexity of the k-means Method , 2005 .

[444]  I. Witten,et al.  Data mining - practical machine learning tools and techniques, Second Edition , 2005, The Morgan Kaufmann series in data management systems.

[445]  B. Nordstrom FINITE MARKOV CHAINS , 2005 .

[446]  M. de Rijke,et al.  Monolingual Document Retrieval for European Languages , 2004, Information Retrieval.

[447]  Inderjit S. Dhillon,et al.  Concept Decompositions for Large Sparse Text Data Using Clustering , 2004, Machine Learning.

[448]  Richard A. O'Keefe,et al.  The Simplest Query Language That Could Possibly Work , 2004 .

[449]  Alistair Moffat,et al.  Inverted Index Compression Using Word-Aligned Binary Codes , 2004, Information Retrieval.

[450]  Jinwoo Park,et al.  Improving text categorization using the importance of sentences , 2004, Inf. Process. Manag..

[451]  Karen Spärck Jones,et al.  Language modelling's generative model: is it rational? , 2004 .

[452]  Steven Garcia,et al.  Access-Ordered Indexes , 2004, ACSC.

[453]  Roberto Basili,et al.  Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms by Thorsten Joachims , 2003, Comput. Linguistics.

[454]  James Allan,et al.  HARD Track Overview in TREC 2003: High Accuracy Retrieval from Documents , 2003, TREC.

[455]  ChengXiang Zhai,et al.  Probabilistic Relevance Models Based on Document and Query Generation , 2003 .

[456]  Benno Stein,et al.  On Cluster Validity and the Information Need of Users , 2003 .

[457]  Robert E. Schapire,et al.  The Boosting Approach to Machine Learning An Overview , 2003 .

[458]  Wessel Kraaij,et al.  Language Models for Topic Tracking , 2003 .

[459]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2004 .

[460]  David Madigan,et al.  On the Naive Bayes Model for Text Categorization , 2003, AISTATS.

[461]  Ian Davidson,et al.  Speeding up k-means Clustering by Bootstrap Averaging , 2003 .

[462]  Hans-Jörg Schek,et al.  Generating Vector Spaces On-the-fly for Flexible XML Retrieval , 2002 .

[463]  David Carmel,et al.  JuruXML - an XML Retrieval System at INEX'02 , 2002, INEX Workshop.

[464]  Jon M. Kleinberg,et al.  An Impossibility Theorem for Clustering , 2002, NIPS.

[465]  Marc Najork,et al.  High-performance Web Crawling , 2001 .

[466]  Joydeep Ghosh,et al.  Relationship-based clustering and cluster ensembles for high-dimensional data mining , 2002 .

[467]  Jamie Callan,et al.  DISTRIBUTED INFORMATION RETRIEVAL , 2002 .

[468]  Cong Yu,et al.  Integration of IR into an XML Database , 2002, INEX Workshop.

[469]  M. de Rijke,et al.  The Importance of Morphological Normalization for XML Retrieval , 2002, INEX Workshop.

[470]  Sun-Ok Gwon Current State of Research at the University of Texas at Austin , 2002 .

[471]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[472]  David M. Pennock,et al.  Methods for Sampling Pages Uniformly from the World Wide Web , 2001 .

[473]  H. Garcia-Molina,et al.  Building a distributed full-text index for the web , 2001, TOIS.

[474]  Jeffrey D. Ullman,et al.  Introduction to automata theory, languages, and computation, 2nd edition , 2001, SIGA.

[475]  Ohm Sornil,et al.  Parallel Inverted Indices for Large-Scale, Dynamic Digital Libraries , 2001 .

[476]  Michael S. Lew,et al.  Principles of Visual Information Retrieval , 2001, Advances in Pattern Recognition.

[477]  Finn V. Jensen,et al.  Bayesian Networks and Decision Graphs , 2001, Statistics for Engineering and Information Science.

[478]  Eugene M. Kleinberg,et al.  On the Algorithmic Implementation of , 2000 .

[479]  R. Tibshirani,et al.  Estimating the number of clusters in a data set via the gap statistic , 2000 .

[480]  Lyle H. Ungar,et al.  Automatic Labeling of Document Clusters , 2000, KDD 2000.

[481]  George Karypis,et al.  A Comparison of Document Clustering Techniques , 2000 .

[482]  Jakub Zavrel,et al.  Information Extraction by Text Classification: Corpus Mining for Features , 2000 .

[483]  Andrew Turpin,et al.  Further Analysis of Whether Batch and User Evaluations Give the Same Results with a Question-Answering Task , 2000, TREC.

[484]  Naftali Tishby,et al.  Data Clustering by Markovian Relaxation and the Information Bottleneck Method , 2000, NIPS.

[485]  Thore Graepel,et al.  Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[486]  Hsin-Hsi Chen,et al.  A Multilingual News Summarizer , 2000, COLING.

[487]  James H. Martin,et al.  Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition , 2000 .

[488]  John C. Platt,et al.  Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[489]  J. Dean,et al.  A Comparison of Techniques to Find Mirrored Hosts on the WWW. , 1999 .

[490]  Jason Weston,et al.  Support vector machines for multi-class pattern recognition , 1999, ESANN.

[491]  Ken Lunde,et al.  CJKV Information Processing , 1999 .

[492]  W. R. Greiff,et al.  A theory of term weighting based on exploratory data analysis , 1998, SIGIR 1998.

[493]  R. Papka,et al.  On-line new event detection and tracking , 1998, SIGIR '98.

[495]  David R. Anderson,et al.  Bayesian Methods in Cosmology: Model selection and multi-model inference , 2009 .

[496]  Susan T. Dumais,et al.  Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing , 1998 .

[497]  Krishna Bharat,et al.  Improved algorithms for topic distillation in a hyperlinked environment , 1998, SIGIR '98.

[498]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[499]  Andrew McCallum,et al.  A comparison of event models for naive Bayes text classification , 1998, AAAI 1998.

[500]  Adrian E. Raftery,et al.  How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis , 1998, Comput. J..

[501]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[502]  J Allan,et al.  Readings in information retrieval. , 1998 .

[503]  Regina Barzilay,et al.  Using Lexical Chains for Text Summarization , 1997 .

[504]  Norbert Fuhr,et al.  A probabilistic relational algebra for the integration of information retrieval and database systems , 1997, TOIS.

[505]  David A. Hull Stemming Algorithms: A Case Study for Detailed Evaluation , 1996, J. Am. Soc. Inf. Sci..

[506]  Stephen P. Harter,et al.  Variations in Relevance Assessments and the Measurement of Retrieval Effectiveness , 1996, J. Am. Soc. Inf. Sci..

[507]  Yves Chiaramella,et al.  A Model for Multimedia Information Retrieval , 1996 .

[508]  W. Bruce Croft,et al.  Query expansion using local and global document analysis , 1996, SIGIR '96.

[509]  Karen Spärck Jones,et al.  Natural language processing for information retrieval , 1996, CACM.

[510]  David D. Lewis,et al.  Text categorization of low quality images , 1995 .

[511]  Yoshua Bengio,et al.  Pattern Recognition and Neural Networks , 1995 .

[512]  Jan O. Pedersen Information Retrieval Based on Word Senses , 1995 .

[513]  Chris Buckley,et al.  New Retrieval Approaches Using SMART: TREC 4 , 1995, TREC.

[514]  Ted E. Dunning,et al.  Statistical Identification of Language , 1994 .

[515]  Susan T. Dumais,et al.  Latent Semantic Indexing (LSI): TREC-3 Report , 1994, TREC.

[516]  David D. Lewis,et al.  A comparison of two learning algorithms for text categorization , 1994 .

[517]  Brian T. Bartell,et al.  Optimizing ranking functions: a connectionist approach to adaptive information retrieval , 1994 .

[518]  James Allan,et al.  Automatic Routing and Ad-hoc Retrieval Using SMART: TREC 2 , 1993, TREC.

[519]  Fredric C. Gey,et al.  Full Text Retrieval based on Probabilistic Equations with Coefficients fitted by Logistic Regression , 1993, TREC.

[520]  Susan T. Dumais,et al.  Latent Semantic Indexing (LSI) and TREC-2 , 1993, TREC.

[521]  S. Ullman,et al.  Clustering algorithms , 2020, Computational Learning Approaches to Data Analytics in Biomedical Applications.

[522]  Richard Sproat,et al.  Morphology and computation , 1992 .

[523]  L. R. Rasmussen,et al.  In information retrieval: data structures and algorithms , 1992 .

[524]  Charles T. Meadow,et al.  Text information retrieval systems , 1992 .

[525]  Edward A. Fox,et al.  FAST-INV: A Fast Algorithm for building large inverted files , 1991 .

[526]  W. Bruce Croft,et al.  Inference networks for document retrieval , 1989, SIGIR '90.

[527]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[528]  Kenneth R. Beesley,et al.  Language Identifier: A Computer Program for Automatic Natural-Language Identification of On-line Text , 1988 .

[529]  M. Lesk GRAB - Inverted Indexes with Low Storage Overhead , 1988, Comput. Syst..

[530]  Gerard Salton,et al.  Automatic text processing , 1988 .

[531]  Ellen M. Voorhees,et al.  The cluster hypothesis revisited , 1985, SIGIR '85.

[532]  Aviezri S. Fraenkel,et al.  Novel Compression of Sparse Bit-Strings — Preliminary Report , 1985 .

[533]  Klaus Krippendorff,et al.  Content Analysis: An Introduction to Its Methodology , 1980 .

[534]  H. S. Heaps,et al.  Information retrieval, computational and theoretical aspects , 1978 .

[535]  Gabriel Pinski,et al.  Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics , 1976, Inf. Process. Manag..

[536]  Gerard Salton,et al.  Dynamic information and library processing , 1975 .

[537]  Michael J. Fischer,et al.  The String-to-String Correction Problem , 1974, JACM.

[538]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[539]  Gerard Salton,et al.  The SMART Retrieval System—Experiments in Automatic Document Processing , 1971 .

[540]  Marvin Minsky,et al.  Perceptrons: An Introduction to Computational Geometry , 1969 .

[541]  Julie Beth Lovins,et al.  Development of a stemming algorithm , 1968, Mech. Transl. Comput. Linguistics.

[542]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[543]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[544]  M. Aizerman,et al.  Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning , 1964 .

[545]  W. J. Langford Statistical Methods , 1959, Nature.

[546]  Harold Wooster,et al.  Information storage and retrieval theory, systems, and devices , 1958 .

[547]  George Kingsley Zipf,et al.  Human behavior and the principle of least effort , 1949 .

[548]  Benno Stein,et al.  Topic Identification: Framework and Application , 2022 .