DeepTileBars: Visualizing Term Distribution for Neural Information Retrieval

Most neural Information Retrieval (Neu-IR) models derive query-to-document ranking scores from term-level matching. Inspired by TileBars, a classical term distribution visualization method, we propose a novel Neu-IR model that handles query-to-document matching at the subtopic and higher levels. Our system first splits each document into topical segments, "visualizes" the matches between the query and the segments, and then feeds the resulting interaction matrix into a Neu-IR model, DeepTileBars, to obtain the final ranking scores. DeepTileBars models the relevance signals that occur at different granularities in a document's topic hierarchy. It therefore better captures the document's discourse structure and, in turn, the matching patterns. Although its design and implementation are lightweight, DeepTileBars outperforms other state-of-the-art Neu-IR models on benchmark datasets including the Text REtrieval Conference (TREC) 2010-2012 Web Tracks and LETOR 4.0.
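The segment-then-match step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes fixed-size token windows in place of topical segmentation, raw term counts as the only matching signal, and hypothetical helper names (split_into_segments, interaction_matrix, render). The output is a query-term by segment matrix, which is the kind of input the abstract describes feeding to the ranking model.

```python
from collections import Counter


def split_into_segments(tokens, segment_size):
    """Stand-in segmenter (assumption): fixed-size windows, not topical segments."""
    return [tokens[i:i + segment_size] for i in range(0, len(tokens), segment_size)]


def interaction_matrix(query_terms, doc_tokens, segment_size=4):
    """Rows are query terms, columns are segments, cells are term counts."""
    segments = split_into_segments(doc_tokens, segment_size)
    counts = [Counter(segment) for segment in segments]
    return [[c[term] for c in counts] for term in query_terms]


def render(matrix, query_terms):
    """Text rendering in the spirit of TileBars: denser glyph = more hits."""
    shades = " .:#"
    lines = []
    for term, row in zip(query_terms, matrix):
        cells = "".join(shades[min(v, len(shades) - 1)] for v in row)
        lines.append(f"{term:>8} |{cells}|")
    return "\n".join(lines)


# Toy document and query, made up for illustration only.
doc = (
    "neural models rank documents "
    "neural ranking uses term "
    "matching tilebars show term "
    "distribution per segment"
).split()
query = ["neural", "term"]

matrix = interaction_matrix(query, doc, segment_size=4)
print(render(matrix, query))
```

Each row shows where one query term occurs across the document, so co-occurrence of several terms in the same column suggests a segment that matches the query at the subtopic level. A real system would swap in a topical segmenter and richer per-cell signals before passing the matrix to a neural ranker.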
