Real-time social media retrieval with spatial, temporal and social constraints

Search in social networks is continually being expanded to enhance the user experience. Besides basic textual retrieval, users can also search based on features such as spatial proximity, temporal freshness, and social closeness. To process each advanced query type efficiently, customized indexing mechanisms have been developed. However, such mechanisms perform well only for the query types they were designed for, and they are not readily adaptable to other query types. In this paper, we propose an interval-at-a-time (IAAT) framework as a first attempt at a one-size-fits-all solution to social media retrieval with spatial, temporal, and social constraints. The algorithm relies only on an inverted index, which makes it compatible with conventional search engines. The inverted lists are sorted by document id, and insertion is very fast because it involves only append operations. Experiments conducted on two large-scale Twitter datasets show that, although IAAT is a unified strategy, it performs better than most of the state-of-the-art customized solutions across a variety of query types.
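As an illustration of the index property mentioned above (inverted lists sorted by document id, with append-only insertion), the following is a minimal sketch and not the IAAT algorithm itself. It assumes document ids are assigned in arrival order, so appending a new id to a list keeps that list sorted. The names `AppendOnlyInvertedIndex`, `add`, `postings`, `search_all`, and `intersect_sorted` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def intersect_sorted(a, b):
    """Merge-intersect two ascending lists of document ids."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class AppendOnlyInvertedIndex:
    """Hypothetical sketch: one posting list per term, sorted by doc id."""

    def __init__(self):
        self._postings = defaultdict(list)
        self._next_doc_id = 0  # assumption: ids grow with arrival order

    def add(self, terms):
        """Index a new document; each affected list only gets an append."""
        doc_id = self._next_doc_id
        self._next_doc_id += 1
        for term in set(terms):  # de-duplicate terms within the document
            self._postings[term].append(doc_id)
        return doc_id

    def postings(self, term):
        """Return the posting list for a term (empty if the term is unseen)."""
        return self._postings.get(term, [])

    def search_all(self, terms):
        """Return ids of documents that contain every query term."""
        lists = sorted((self.postings(t) for t in terms), key=len)
        if not lists:
            return []
        result = lists[0]
        for other in lists[1:]:  # start from the shortest list
            result = intersect_sorted(result, other)
        return result
```

Because new ids are always larger than existing ones under this assumption, no list ever needs re-sorting on insert, and conjunctive queries can be answered with a linear merge over the sorted lists.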