A taxonomy of web search

Classic IR (information retrieval) is inherently predicated on users searching for information, the so-called "information need". But the need behind a web search is often not informational: it might be navigational (give me the URL of the site I want to reach) or transactional (show me sites where I can perform a certain transaction, e.g. shop, download a file, or find a map). We explore this taxonomy of web searches and discuss how global search engines evolved to deal with web-specific needs.
