A Real-Time Distributed Index Based on Topic for Microblogging System

With the development of Internet technology and the widespread use of mobile devices, microblogging systems such as Twitter and Sina Weibo in China have become the most important platforms for people to retrieve information and communicate with each other. Real-time search has become a major challenge for microblogging systems because of the volume of data and the number of users. Existing approaches place all microblogs in a single index, which increases the cost of both index updates and queries, so search results cannot satisfy users' requirements for timeliness and quality. In this paper, we propose a new real-time distributed index based on topic (RDIBT), which builds a separate index for each topic. These topical indices are distributed across many sites, which improves query concurrency. Extensive experiments on a real dataset demonstrate the effectiveness and efficiency of RDIBT.
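
The idea of one index per topic, spread over several sites, can be illustrated with a minimal sketch. This is not the RDIBT implementation. The class names `TopicIndex` and `Cluster`, the hash-based placement of topics on sites, and the newest-first ranking are all assumptions made for illustration. The topic label of each microblog is assumed to be supplied by an external topic model.

```python
from collections import defaultdict
import zlib


class TopicIndex:
    """Inverted index for one topic: term -> list of (timestamp, doc_id)."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, doc_id, timestamp, text):
        # Index each distinct lowercase term of the microblog once.
        for term in set(text.lower().split()):
            self.postings[term].append((timestamp, doc_id))

    def search(self, term, k=10):
        # Return up to k doc ids, newest first (assumed ranking).
        hits = sorted(self.postings.get(term.lower(), []), reverse=True)
        return [doc_id for _, doc_id in hits[:k]]


class Cluster:
    """Hypothetical set of sites, each holding the indices of some topics."""

    def __init__(self, num_sites):
        # One dict per site, mapping topic -> TopicIndex.
        self.sites = [dict() for _ in range(num_sites)]

    def site_for(self, topic):
        # Assumed placement rule: stable hash of the topic name.
        return zlib.crc32(topic.encode("utf-8")) % len(self.sites)

    def insert(self, topic, doc_id, timestamp, text):
        # An update touches only the small index of its own topic.
        site = self.sites[self.site_for(topic)]
        site.setdefault(topic, TopicIndex()).add(doc_id, timestamp, text)

    def query(self, topic, term, k=10):
        # A query is routed to the single site that owns the topic.
        index = self.sites[self.site_for(topic)].get(topic)
        return index.search(term, k) if index else []


cluster = Cluster(num_sites=4)
cluster.insert("sports", 1, 100, "Great match tonight")
cluster.insert("sports", 2, 200, "match highlights")
cluster.insert("music", 3, 150, "new album tonight")
print(cluster.query("sports", "match"))  # [2, 1]
print(cluster.query("music", "match"))   # []
```

In this sketch, each update and each query touches only one topic's index on one site, so queries on different topics can be served by different sites at the same time.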
