Document Performance Prediction for Automatic Text Classification

Query performance prediction (QPP) is a fundamental task in information retrieval: predicting the effectiveness of a ranking model for a given query in the absence of relevance information. Although QPP is an active research area, it has not yet been explored in the context of automatic text classification. In this paper, we study the task of predicting the effectiveness of a classifier for a given document, which we refer to as document performance prediction (DPP). Our experiments on several text classification datasets, covering both categorization and sentiment analysis, attest to the effectiveness and complementarity of several DPP methods inspired by related QPP approaches. Finally, we explore the usefulness of DPP for improving classification itself by using the predictors as additional features in a classification ensemble.
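To make the idea of a document-level predictor concrete, here is a minimal sketch. It assumes only that the classifier outputs a class-probability vector for each document. The four values it computes are illustrative analogues of score-based QPP predictors, such as the standard deviation of retrieval scores: maximum probability, top-two margin, standard deviation, and entropy. They are not necessarily the predictors studied in the paper. The helper names (`dpp_features`, `augment`) are hypothetical. `augment` shows one way such values could be appended to a document's base features before training an ensemble.

```python
import math
from typing import Dict, List, Sequence


def _check(probs: Sequence[float]) -> None:
    # Hypothetical input contract: a non-empty class-probability vector.
    if len(probs) == 0:
        raise ValueError("probs must be non-empty")


def score_std(probs: Sequence[float]) -> float:
    """Population standard deviation of the class scores (dispersion signal)."""
    _check(probs)
    mean = sum(probs) / len(probs)
    return math.sqrt(sum((p - mean) ** 2 for p in probs) / len(probs))


def top2_margin(probs: Sequence[float]) -> float:
    """Gap between the two highest class scores; a single class returns its score."""
    _check(probs)
    s = sorted(probs, reverse=True)
    return s[0] - s[1] if len(s) > 1 else s[0]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; higher means the classifier is less certain."""
    _check(probs)
    return -sum(p * math.log(p) for p in probs if p > 0)


def dpp_features(probs: Sequence[float]) -> Dict[str, float]:
    """Illustrative document-level predictors computed from one document's scores."""
    return {
        "max": max(probs),
        "margin": top2_margin(probs),
        "std": score_std(probs),
        "entropy": entropy(probs),
    }


def augment(probs: Sequence[float]) -> List[float]:
    """Append the predictors to the base scores, e.g. as input to a meta-classifier."""
    feats = dpp_features(probs)
    return list(probs) + [feats[k] for k in ("max", "margin", "std", "entropy")]


if __name__ == "__main__":
    confident = [0.9, 0.05, 0.05]
    uncertain = [0.4, 0.35, 0.25]
    print(dpp_features(confident))
    print(dpp_features(uncertain))
    print(augment(confident))
```

In this example the confident document gets a larger margin and standard deviation and a lower entropy than the uncertain one. That pattern is the kind of signal a DPP method would exploit to flag documents the classifier is likely to get wrong.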
