Querying big data from a database perspective

Querying is the cornerstone of big data applications. From a database perspective, a query is a function defined by a domain, a range, and a specified semantics. When querying big data, the domain is the big data itself, which is complicated by characteristics such as large volume, heterogeneous types, strong timeliness, and weak authenticity. To describe and analyze big data querying theoretically, we propose definitions of big data and of a big data system, which includes querying big data. These characteristics cause big data to break through the normal-form requirements and the Closed World Assumption (CWA) of traditional databases. This paper therefore identifies several challenges arising from these characteristics of the domain and analyzes them in detail using first-order language. Because the complexity of the domain makes traditionally tractable queries infeasible, this paper also analyzes and summarizes a classification of queries on relational big data according to their structure and computational complexity.
