Querying Streaming XML Big Data with Multiple Filters on Cloud

We are entering a new era of data explosion, which introduces new problems for big data processing. Current methods for querying streaming XML big data are mostly based on event-filtering techniques. It is well known that during filtering, some data items have to be buffered before the filter can decide how to handle them. Furthermore, in a single-filter system, the buffer size often grows exponentially in real applications. The Cloud, with its massive storage and powerful computation capability, is an ideal platform for processing big XML data. In this paper, we propose a new multi-filter strategy for querying streaming XML big data on the Cloud. We show that the proposed strategy can effectively share and reduce the filtering space and time consumption by fully exploiting the scalability of the Cloud. Furthermore, by deploying our multi-filter collaboration technique, the querying systems together can break the limitation of the theoretical concurrency lower bound. Our empirical study demonstrates that the multi-filter strategy significantly outperforms single-filter querying.
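To make the buffering problem concrete, the following is a minimal sketch of a single event-based filter. It is not the strategy proposed in this paper. The query `//book[price > 30]/title`, the class name `BufferingFilter`, and the helper `run_filter` are assumptions chosen for illustration. Because each `title` arrives in the stream before the `price` that decides whether it qualifies, the filter has to hold the title in a buffer until the predicate is resolved.

```python
import xml.sax


class BufferingFilter(xml.sax.ContentHandler):
    """One-pass evaluation of the hypothetical query //book[price > 30]/title."""

    def __init__(self, threshold=30.0):
        super().__init__()
        self.threshold = threshold
        self.results = []          # titles confirmed to satisfy the query
        self.buffer = []           # titles of the current <book>, still undecided
        self.predicate_ok = False  # has a qualifying <price> been seen in this book?
        self.current = None        # element whose text is currently being collected
        self.text = []
        self.peak_buffer = 0       # largest number of items buffered at once

    def startElement(self, name, attrs):
        if name == "book":
            self.buffer, self.predicate_ok = [], False
        if name in ("title", "price"):
            self.current, self.text = name, []

    def characters(self, content):
        if self.current is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == self.current:
            value = "".join(self.text).strip()
            if name == "title":
                # The predicate is not yet decided, so the title must be buffered.
                self.buffer.append(value)
                self.peak_buffer = max(self.peak_buffer, len(self.buffer))
            elif name == "price" and float(value) > self.threshold:
                self.predicate_ok = True
            self.current = None
        elif name == "book":
            # Only at the end of the book is the decision final:
            # emit the buffered titles or discard them.
            if self.predicate_ok:
                self.results.extend(self.buffer)
            self.buffer = []


def run_filter(xml_text, threshold=30.0):
    """Parse xml_text with the filter; return (matching titles, peak buffer size)."""
    handler = BufferingFilter(threshold)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.results, handler.peak_buffer
```

In this sketch the buffer is bounded by the number of titles inside one `book`; with deeper nesting or several interacting predicates, a single filter may have to hold far more undecided items, which is the cost the multi-filter strategy aims to share across filters.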
