Logistic Regression and EVIs for XML Books and the Heterogeneous Track

For this year's INEX, UC Berkeley focused on the Book track and the Heterogeneous track. For these runs we used the TREC2 logistic regression probabilistic model with blind feedback, as well as Entry Vocabulary Indexes (EVIs) for the Books Collection MARC data. For the full-text records of the Book track we encountered a number of interesting problems in setting up the database, and ended up using page-level indexing of the full collection. As (once again) the only group to actually submit runs for the Het track, we are guaranteed both the highest and the lowest effectiveness scores for each task. However, because it was again deemed pointless to conduct the actual relevance assessments on the submissions of a single system, we do not know the exact values of these results.
