Query auto-completion (QAC) is one of the most recognizable and widely used services of modern search engines. Its goal is to assist the user in formulating a query. Current QAC systems are mainly reactive: they respond to the present request using past knowledge. Specifically, they mostly rely on query log analysis or on term co-occurrences in a corpus, and they rank suggestions according to their similarity to the partial user query, their past popularity, or their temporal dynamics (e.g. trends, bursts, and seasonality in query popularity). Consequently, before a QAC system can recommend a suggestion, that suggestion must already have attracted substantial user interest, and it therefore necessarily refers to old information. However, a growing number of people turn to search engines to find novel information, that is, information that is emergent or recently created and not redundant. Conventional QAC systems are thus unable to fulfill users' increasingly real-time needs.
In this work-in-progress report, we introduce a new approach to QAC: a system that filters out potentially novel information and proactively delivers it to users. It aims to provide users with novel insight and thus caters to their open-ended or persistent, and increasingly real-time, information needs. The preliminary method proposed in this paper to evaluate this approach forms time-specific suggestions by comparing two corpora that are continuously updated with new data from chosen sources. Suggestions are generated by an unsupervised, language-independent algorithm that relies on the relative novelty of term co-occurrences. The initial experimental results demonstrate the effectiveness of the approach in recommending queries that lead to novel information. They therefore indicate that such a system can enhance the exploratory power of a search engine and support proactive information search.
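To make the idea of comparing two corpora more concrete, the following is a minimal sketch and not the method of this report. It assumes that "relative novelty" can be approximated by a smoothed ratio of the relative frequencies of a term pair in a recent corpus and in a reference corpus. The function names (`cooccurrence_counts`, `novelty_scores`, `suggest`), the whitespace tokenization, the document-level co-occurrence window, and the add-one smoothing are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count unordered term pairs that appear together in the same document."""
    counts = Counter()
    for doc in docs:
        # Assumption: whitespace tokenization keeps the sketch language-independent.
        terms = sorted(set(doc.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


def novelty_scores(reference_docs, recent_docs, smoothing=1.0):
    """Score each pair seen in the recent corpus by a smoothed frequency ratio.

    Assumption: a ratio above 1 means the pair is relatively more frequent in
    the recent corpus than in the reference corpus, i.e. relatively novel.
    """
    ref = cooccurrence_counts(reference_docs)
    new = cooccurrence_counts(recent_docs)
    ref_total = sum(ref.values())
    new_total = sum(new.values())
    n_pairs = len(set(ref) | set(new))
    scores = {}
    for pair, n in new.items():
        p_new = (n + smoothing) / (new_total + smoothing * n_pairs)
        p_ref = (ref[pair] + smoothing) / (ref_total + smoothing * n_pairs)
        scores[pair] = p_new / p_ref
    return scores


def suggest(prefix, scores, k=3):
    """Return up to k two-term suggestions in which one term starts with prefix."""
    prefix = prefix.lower()
    candidates = []
    for (a, b), score in scores.items():
        if a.startswith(prefix):
            candidates.append((score, f"{a} {b}"))
        elif b.startswith(prefix):
            candidates.append((score, f"{b} {a}"))
    # Highest novelty first; ties are broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, text in candidates[:k]]


reference_docs = ["volcano eruption history", "volcano tourism guide"]
recent_docs = [
    "volcano eruption alert",
    "volcano ash alert",
    "volcano tourism guide",
]
scores = novelty_scores(reference_docs, recent_docs)
print(suggest("vol", scores, k=2))  # ['volcano alert', 'volcano ash']
```

In this toy example, pairs that appear only in the recent corpus outrank pairs that both corpora share, so the completions point toward content the reference corpus does not yet cover. A real system would additionally need a policy for refreshing both corpora over time, which this sketch leaves out.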