Finding Answer Passages with Rank Optimizing Decision Trees

This paper discusses the use of decision trees for probability-based ranking. The emphasis is on ranking problems in question answering, where correct candidates are very rare but a single correct answer at one of the top ranks is often sufficient. Because existing tree learners handle this task poorly, decision tree induction is reformulated so that it directly optimizes a given measure of ranking quality, such as mean reciprocal rank or mean average precision. This reformulation also makes it possible to incorporate a priori knowledge about the positive or negative effect of an attribute on ranking quality. Results are further improved by applying a stratified form of bagging. In a passage reranking task using factoid questions from the QA@CLEF evaluations, the new method outperforms existing tree induction techniques by a large margin.
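The two ranking measures named above can be illustrated with a minimal sketch. It uses the standard textbook definitions of reciprocal rank and average precision, which may differ in detail from the variants used in the paper. The function names and the toy data are illustrative assumptions, and the code only evaluates a ranking; it does not implement the tree induction method itself.

```python
from typing import Callable, List, Sequence


def rank_by_score(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Return the 0/1 correctness labels ordered by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def reciprocal_rank(ranked_labels: Sequence[int]) -> float:
    """1 / rank of the first correct candidate, or 0.0 if there is none."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of precision@k over the ranks k that hold a correct candidate."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_over_questions(
    metric: Callable[[Sequence[int]], float],
    rankings: Sequence[Sequence[int]],
) -> float:
    """Average a per-question metric over all questions (MRR or MAP)."""
    return sum(metric(r) for r in rankings) / len(rankings)


# Toy data (hypothetical): one ranked candidate list per question,
# where 1 marks a correct passage and 0 an incorrect one.
rankings = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
mrr = mean_over_questions(reciprocal_rank, rankings)
map_score = mean_over_questions(average_precision, rankings)
```

The third toy question has no correct candidate and contributes zero to both measures. With correct candidates as rare as the abstract describes, such questions are common, which is one reason a single correct answer near the top matters so much for the overall score.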
