Efficient Query Processing Platform for Uncertain Big Data

Query processing technology has recently received considerable attention in the business intelligence and information service communities. However, existing approaches cannot efficiently optimize query performance in uncertain big data environments. In this paper, we propose QPPUBG, a novel and efficient query processing platform for uncertain big data. QPPUBG comprises four main modules: (i) query equivalence reconstruction for uncertain big data; (ii) multiple query optimization over probability relation components; (iii) query execution plan construction over probability relation components; and (iv) a physical implementation solution for queries over uncertain big data. Specifically, QPPUBG supports possible world instance semantics and efficiently handles arbitrary decision spaces. Moreover, QPPUBG integrates these four modules seamlessly into modern parallel computation frameworks. Extensive experiments demonstrate that QPPUBG is both efficient and effective.
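To make the possible world semantics mentioned above concrete, the following is a minimal sketch, not QPPUBG's implementation. It assumes a tuple-independent probabilistic relation in which each tuple exists with its own probability; the relation `readings` and the helpers `possible_worlds` and `query_probability` are hypothetical names introduced only for illustration.

```python
from itertools import product

# Hypothetical tuple-independent probabilistic relation: (tuple, existence probability).
readings = [
    ({"sensor": "s1", "temp": 31}, 0.5),
    ({"sensor": "s2", "temp": 25}, 0.8),
    ({"sensor": "s3", "temp": 35}, 0.25),
]

def possible_worlds(relation):
    """Yield (world, probability) for every subset of the uncertain tuples."""
    for mask in product([False, True], repeat=len(relation)):
        world, prob = [], 1.0
        for present, (row, p) in zip(mask, relation):
            # Independence assumption: multiply p if the tuple is present, 1 - p otherwise.
            prob *= p if present else 1.0 - p
            if present:
                world.append(row)
        yield world, prob

def query_probability(relation, predicate):
    """Probability that at least one tuple satisfying the predicate exists."""
    return sum(
        prob
        for world, prob in possible_worlds(relation)
        if any(predicate(row) for row in world)
    )

# Probability that some sensor reports a temperature above 30.
hot = query_probability(readings, lambda r: r["temp"] > 30)
print(round(hot, 6))
```

Exhaustive enumeration visits 2^n worlds for n uncertain tuples, so it is feasible only for tiny relations; this cost is the reason query rewriting and optimization matter for uncertain data at scale.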
