Using a Reference Corpus as a User Model for Focused Information Retrieval

We propose a method for ranking short information nuggets extracted from a text corpus, using another, reliable reference corpus as a user model. We argue that such additional corpora are commonly available and used in a number of IR tasks, and apply the method to answering a form of definition questions. The proposed ranking method substantially improves the performance of our system.
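The abstract does not specify how nuggets are scored against the reference corpus, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, purely for illustration, that each nugget is scored by its best word-overlap (Jaccard) match against sentences from the reference corpus; the names `tokens`, `jaccard`, and `rank_nuggets` are hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric word tokens in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_nuggets(nuggets, reference_sentences):
    """Order nuggets by their best overlap with any reference sentence.

    Assumption: overlap with the reference corpus stands in for the
    "user model"; the real system may use a different similarity measure.
    """
    ref_sets = [tokens(s) for s in reference_sentences]
    scored = []
    for nugget in nuggets:
        nugget_tokens = tokens(nugget)
        best = max((jaccard(nugget_tokens, r) for r in ref_sets), default=0.0)
        scored.append((best, nugget))
    # sorted() is stable, so nuggets with equal scores keep their input order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [nugget for _, nugget in ranked]
```

Under this assumption, a nugget that restates facts found in the reference corpus is ranked above one that shares no vocabulary with it, which is the intuition behind treating the reference corpus as a model of what the user wants to see.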

Gilad Mishne | Valentin Jijkoun | Maarten de Rijke