Short Query Linguistic Expansion Techniques: Palliating One-Word Queries by Providing Intermediate Structure to Text

The usual approach to finding information on the WWW via existing Web browsers is to use a one- or two-word query. Browsers return a number of documents containing these words; the user examines those documents, or their abstracts, sees how the query words are being used, and alters the initial query accordingly. This contrasts markedly with the Information Retrieval models explored by researchers over the past thirty-five years. These models were designed for longer queries and do not respond adequately to the needs of users who submit such short queries. On the other hand, recent advances in natural language processing permit the extraction of typed information that is centered on one or two words. We review a selection of this typed information and describe how it could be used to present the user with an intermediate structure that fits between their short queries and the documents found in a heterogeneous text collection such as the WWW.
