A Deep Investigation of Deep IR Models

The effectiveness of information retrieval (IR) systems has become more important than ever. Deep IR models have gained increasing attention for their ability to automatically learn features from raw text, and many deep IR models have been proposed recently. However, the learning process of these deep IR models resembles a black box. It is therefore necessary to identify the differences between the features learned automatically by deep IR models and the hand-crafted features used in traditional learning to rank approaches. It is also valuable to investigate the differences among the deep IR models themselves. This paper conducts a deep investigation of deep IR models. Specifically, we conduct an extensive empirical study on two datasets, Robust and LETOR4.0. We first compare the automatically learned features and hand-crafted features with respect to query term coverage, document length, embeddings and robustness. This comparison reveals a number of disadvantages of the automatically learned features relative to hand-crafted features, and based on these findings we establish guidelines for improving existing deep IR models. We then compare two categories of deep IR models, i.e. representation-focused models and interaction-focused models. We show that the two types of models focus on different categories of words: topic-related words and query-related words.