The Mars demonstration exploits microblog location information to support a wide variety of important spatio-temporal queries on microblogs. Supported queries include range, nearest-neighbor, and aggregate queries. Mars works in a challenging environment where microblog streams arrive at high rates. Mars distinguishes itself with three novel contributions: (1) Efficient in-memory digestion/expiration techniques that can handle arrival rates of up to 64,000 microblogs/sec. This also includes highly accurate and efficient hopping-window-based aggregation of incoming microblog keywords. (2) Smart memory optimization and load shedding techniques that adjust in-memory contents based on the expected query load, trading a slight and bounded accuracy loss for significant storage savings. (3) Scalable real-time query processing that exploits the Zipf distribution of microblog data for efficient top-k aggregate query processing. In addition, Mars includes a scalable real-time nearest-neighbor and range query processing module that uses various pruning techniques to serve heavy query workloads in real time. Mars is demonstrated using a stream of real tweets obtained from the Twitter firehose, with a production query workload obtained from Bing web search. We show that Mars serves incoming queries with an average latency of less than 4 msec and with 99% answer accuracy, while saving up to 70% of storage overhead for different query loads.
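To make the hopping-window keyword aggregation in contribution (1) concrete, the following is a minimal Python sketch under stated assumptions; it is not the Mars implementation. The class name `HoppingWindowCounter`, its parameters `window` and `hop` (both in seconds), and the assumption that microblogs arrive in non-decreasing timestamp order are all hypothetical choices made for illustration. Keywords are counted per hop-sized pane, and whole panes are expired once they fall outside the window.

```python
from collections import Counter, deque


class HoppingWindowCounter:
    """Keyword counts over the last `window` seconds, advanced in `hop`-second steps.

    Illustrative only: assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, window: int, hop: int):
        if window % hop != 0:
            raise ValueError("window must be a multiple of hop")
        self.hop = hop
        self.max_panes = window // hop
        self.panes = deque()    # (pane_id, Counter) pairs, oldest first
        self.total = Counter()  # aggregate over all live panes

    def _expire(self, newest_pane: int) -> None:
        # Drop every pane that no longer overlaps the current window.
        expired = False
        while self.panes and self.panes[0][0] <= newest_pane - self.max_panes:
            _, old_counts = self.panes.popleft()
            self.total.subtract(old_counts)
            expired = True
        if expired:
            # Unary plus keeps only keywords with a positive count.
            self.total = +self.total

    def add(self, timestamp: float, keywords) -> None:
        pane_id = int(timestamp // self.hop)
        if not self.panes or self.panes[-1][0] < pane_id:
            # The window hops forward: open a new pane, expire old ones.
            self.panes.append((pane_id, Counter()))
            self._expire(pane_id)
        self.panes[-1][1].update(keywords)
        self.total.update(keywords)

    def top_k(self, k: int):
        # Most frequent keywords across the live panes.
        return self.total.most_common(k)
```

Expiring whole panes rather than individual microblogs keeps the per-arrival cost low, at the price of a window boundary that is only accurate to within one hop. A spatial variant would presumably keep one such counter per index cell, but that is an assumption rather than something the text above specifies.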