Data Streams Processing Techniques

Many modern applications in domains such as sensor networks, financial applications, web logs, and click-streams operate on continuous, unbounded, rapid, time-varying streams of data elements. These applications present new challenges that traditional data management techniques do not address. For query processing over data streams, we consider in particular continuous queries, which are evaluated repeatedly as new stream data arrives. The answer to a continuous query is produced over time, always reflecting the stream data seen so far. One of the most critical requirements of stream processing is fast processing, so parallel and distributed processing are promising solutions. This paper (i) analyzes the different continuous query processing techniques, (ii) presents a comparative study of data stream execution environments, and (iii) proposes an integrated system for processing data streams based on cloud computing, which applies continuous query optimization techniques in a cloud environment.
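As a minimal illustration of what "always reflecting the stream data seen so far" means, the sketch below evaluates a running average as a continuous query. The function name continuous_average and the sample readings are hypothetical and are not taken from the paper; a real stream would be unbounded rather than a short list.

```python
from typing import Iterable, Iterator


def continuous_average(stream: Iterable[float]) -> Iterator[float]:
    """Emit an updated answer after every arriving element.

    Hypothetical example of a continuous query: the state (count, total)
    is kept between arrivals, so each answer reflects all data seen so far
    without rescanning earlier elements.
    """
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


# Assumed sample input standing in for an unbounded sensor stream.
readings = [10.0, 20.0, 30.0]
answers = list(continuous_average(readings))
```

Because the function is a generator, it can consume a source that never ends and still produce one answer per arrival, which is the behavior that distinguishes a continuous query from a one-shot query over stored data.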
