External Expansion Risk Management: Enhancing Microblogging Filtering Using Implicit Query

Microblogging filtering helps users discard irrelevant content and extract timely content from microblogs. However, because microblog posts are typically short texts, filtering suffers from an insufficient-sample problem that makes probabilistic models unreliable. Current research regards an explicit brief query as only an abstract of the user's information need, which makes it hard to infer the user's actual search intent. Instead, we treat relevant external documents as the user's implicit prior knowledge and build a corresponding filtering framework. To guard against the risk of external document expansion, we assume that an external document can be viewed as a complete statement of an explicit query, and we encode the filtering preferences with the degree of divergence between the external document and the original explicit query. The optimal filtering action is therefore the one that trades off the degree of divergence against generalization performance. Compared with established baselines, our algorithm yields compelling results for retrieving meaningful tweets. This work helps to further understand the inherent risk characteristics of external expansion for the design of microblogging filtering systems.
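To make the divergence idea concrete, the following is a minimal sketch and not the method of the paper. It assumes that the query and the external document are each represented by an additively smoothed unigram language model, that the degree of divergence is measured with the Kullback-Leibler divergence, and that a hypothetical weight 1 / (1 + KL) expresses how far the external document is trusted. The function names `unigram_model`, `kl_divergence`, and `expansion_weight` are illustrative only.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def expansion_weight(query_tokens, doc_tokens):
    """Assumed trust weight in (0, 1]: 1 when the two models coincide,
    smaller as the external document diverges from the query."""
    vocab = set(query_tokens) | set(doc_tokens)
    p_query = unigram_model(query_tokens, vocab)
    p_doc = unigram_model(doc_tokens, vocab)
    return 1.0 / (1.0 + kl_divergence(p_query, p_doc))


# Toy data, invented for illustration only.
query = ["bbc", "world", "service", "cuts"]
on_topic = ["bbc", "world", "service", "staff", "cuts", "announced"]
off_topic = ["football", "match", "score", "goal", "league", "final"]

w_on = expansion_weight(query, on_topic)
w_off = expansion_weight(query, off_topic)
print(f"on-topic weight:  {w_on:.3f}")
print(f"off-topic weight: {w_off:.3f}")
```

Under these assumptions, an external document that shares most of its vocabulary with the query receives a weight close to 1, while an unrelated document is down-weighted, which is one simple way to limit the risk that expansion drifts away from the original query.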
