Manifold Learning for Rank Aggregation

We address the task of fusing ranked lists of documents that are retrieved in response to a query. Past work on this task, known as rank aggregation, often assumes that the documents in the lists being fused are independent and that only documents ranked high in many lists are likely to be relevant to a given topic. We propose two manifold learning aggregation approaches, ManX and v-ManX, that build on the cluster hypothesis and exploit inter-document similarity information. ManX regularizes document fusion scores so that documents that appear to be similar within a manifold receive similar scores, whereas v-ManX first generates virtual adversarial documents and then regularizes the fusion scores of both the original and the virtual adversarial documents. Since aggregation methods built on the cluster hypothesis are computationally expensive, we adopt an optimization method that uses the top-k documents as anchors and considerably reduces the computational complexity of manifold-based methods, resulting in two efficient aggregation approaches, a-ManX and a-v-ManX. We assess the proposed approaches experimentally and show that they significantly outperform state-of-the-art aggregation approaches, while a-ManX and a-v-ManX run faster than ManX and v-ManX, respectively.
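The idea of regularizing fusion scores so that similar documents receive similar scores can be illustrated with a minimal sketch. It is not the ManX objective itself, which the abstract does not spell out. It assumes a generic manifold-ranking style update, f ← α·S·f + (1 − α)·y, over a symmetrically normalized similarity graph, and it uses a sum of reciprocal ranks as a stand-in for the initial fusion scores. The function names, the `alpha` and `iters` parameters, and the similarity callback are all hypothetical.

```python
import math


def initial_fusion_scores(ranked_lists):
    """Stand-in fusion score (an assumption): sum of reciprocal ranks over all lists."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / rank
    return scores


def smooth_scores(docs, init, sim, alpha=0.5, iters=50):
    """Smooth scores over a document similarity graph.

    Iterates f <- alpha * S f + (1 - alpha) * y, where y holds the initial
    scores and S = D^{-1/2} W D^{-1/2} is the normalized affinity matrix.
    This is generic manifold ranking, used here only as an illustration.
    """
    n = len(docs)
    # Pairwise affinities with a zero diagonal (no self-similarity).
    W = [[0.0 if i == j else sim(docs[i], docs[j]) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in W]
    # Symmetric normalization; isolated documents keep all-zero rows.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j])
          if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)]
         for i in range(n)]
    y = [init[d] for d in docs]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return dict(zip(docs, f))


# Toy usage: "b" is similar to the top document "a"; "c" is similar to nothing.
toy_lists = [["a", "b", "c"], ["a", "c", "b"]]
toy_sim = lambda u, v: 1.0 if {u, v} == {"a", "b"} else 0.0
toy_init = initial_fusion_scores(toy_lists)
toy_smoothed = smooth_scores(["a", "b", "c"], toy_init, toy_sim)
```

In the toy data, "b" and "c" start with the same fusion score, but after smoothing "b" ranks above "c" because it is similar to the highly ranked "a". This is the behavior the cluster hypothesis motivates. Building the full pairwise matrix costs time quadratic in the number of documents, which is the kind of expense the anchor-based variants are meant to reduce.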
