Beyond Bags of Words: Modeling Implicit User Preferences in Information Retrieval

This paper reports on recent work in the field of information retrieval that attempts to go beyond the overly simplified approach of representing documents and queries as bags of words. Such simple models make it difficult to accurately capture a user's information need. The model presented in the paper is based on Markov random fields and allows almost arbitrary features to be encoded. This provides a powerful mechanism for modeling many of the implicit constraints a user has in mind when formulating a query. Simple instantiations of the model that consider dependencies between the terms in a query have been shown to significantly outperform bag-of-words models. The model can be extended further to incorporate even more complex constraints based on other domain knowledge. Finally, we describe what place our model has within the broader realm of artificial intelligence and propose several open questions that may be of general interest to the field.