Simplified data posting in practice

The data posting framework introduced in [8] adapts well-known Data Exchange techniques to the new Big Data management and analysis challenges found in real-world scenarios. Although the framework is expressive, it requires the user to write count constraints, which may be difficult for a non-expert. Moreover, in the general case, where non-deterministic variables are used, the data posting problem is NP-complete under data complexity. Identifying conditions that guarantee polynomial-time execution in the presence of non-deterministic choices is therefore very important for practical purposes. In this paper, we present a simplified version of the data posting framework based on smart mapping rules, which combine a simple mapping description with some parameters and avoid complex specifications with count constraints. We show that the data posting problem in the new setting is NP-complete and identify the conditions under which it becomes polynomial even in the presence of non-deterministic choices.
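To give a rough feel for why non-deterministic choices combined with count-style constraints can be expensive, the following toy sketch may help. It is not the framework from the paper: the function `post_data`, its parameters, and the sample data are hypothetical names introduced only for illustration. Each item must be given one value from its candidate list (the non-deterministic choice), and no value may be used more than `max_per_value` times (a stand-in for a count constraint). The search is a naive enumeration, so its cost grows exponentially with the number of items.

```python
from itertools import product


def post_data(items, candidates, max_per_value):
    """Toy search (hypothetical, for illustration only).

    Choose one candidate value per item so that no value is chosen
    more than max_per_value times. Returns the first valid
    assignment found, or None if no assignment exists.
    """
    # Enumerate every combination of choices: exponential in len(items).
    for choice in product(*(candidates[i] for i in items)):
        counts = {}
        for value in choice:
            counts[value] = counts.get(value, 0) + 1
        # Count-style constraint: cap on how often each value is used.
        if all(c <= max_per_value for c in counts.values()):
            return dict(zip(items, choice))
    return None


# Assumed sample data: three items, two possible target values.
items = ["a", "b", "c"]
candidates = {"a": ["s1", "s2"], "b": ["s1"], "c": ["s1", "s2"]}
result = post_data(items, candidates, max_per_value=2)
```

With a cap of 2 the sketch finds an assignment; with a cap of 1 none exists, because three items cannot share two values without a repeat. Restricting how choices interact is the kind of condition under which such a search can plausibly be made polynomial.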

[1] Elio Masciari, et al. Discovering User Behavioral Features to Enhance Information Search on Big Data, 2017, ACM Trans. Interact. Intell. Syst.

[2] Adrian Onet, et al. The Chase Procedure and its Applications in Data Exchange, 2013, Data Exchange, Information, and Streams.

[3] Ronald Fagin, et al. Locally consistent transformations and query answering in data exchange, 2004, PODS '04.

[4] Sergio Greco, et al. Checking Chase Termination: Cyclicity Analysis and Rewriting Techniques, 2015, IEEE Transactions on Knowledge and Data Engineering.

[5] Helmut Krcmar, et al. Big Data, 2014, Wirtschaftsinf.

[6] Ester Zumpano, et al. Computing a Deterministic Semantics for P2P Deductive Databases, 2017, IDEAS.

[7] Sergio Greco, et al. Stratification criteria and rewriting techniques for checking chase termination, 2011, Proc. VLDB Endow.

[8] Sergio Greco, et al. ACID: A System for Computing Approximate Certain Query Answers over Incomplete Databases, 2018, SIGMOD Conference.

[9] Sergio Greco, et al. Approximation algorithms for querying incomplete databases, 2019, Inf. Syst.

[10] Georg Lausen, et al. On Chase Termination Beyond Stratification, 2009, Proc. VLDB Endow.

[11] Ester Zumpano, et al. Aggregates and priorities in P2P data management systems, 2011, IDEAS '11.

[12] Irina Trubitsyna, et al. ChaseT: A Tool For Checking Chase Termination, 2011, SEBD.

[13] Sergio Greco, et al. Computing Approximate Query Answers over Inconsistent Knowledge Bases, 2018, IJCAI.

[14] Sergio Greco, et al. Optimization of bound disjunctive queries with constraints, 2004, Theory and Practice of Logic Programming.

[15] Domenico Saccà, et al. Data Exchange in Datalog Is Mainly a Matter of Choice, 2012, Datalog.

[16] Leonid Libkin, et al. Incomplete data: what went wrong, and how to fix it, 2014, PODS.

[17] Elio Masciari. Trajectory Clustering via Effective Partitioning, 2009, FQAS.

[18] Elio Masciari, et al. A framework for adaptive mail classification, 2002, 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2002).

[19] Sergio Greco, et al. NP Datalog: A logic language for expressing search and optimization problems, 2009, Theory and Practice of Logic Programming.

[20] Elio Masciari, et al. Efficient and effective RFID data warehousing, 2009, IDEAS '09.

[21] Sergio Greco, et al. Incomplete Data and Data Dependencies in Relational Databases, 2012.

[22] Domenico Saccà, et al. Count Constraints and the Inverse OLAP Problem: Definition, Complexity and a Step toward Aggregate Data Exchange, 2012, FoIKS.

[23] Carlo Zaniolo, et al. Analysing microarray expression data through effective clustering, 2014, Inf. Sci.

[24] Irina Trubitsyna, et al. Simple User Assistance by Data Posting, 2019, 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE).

[25] Irina Trubitsyna, et al. A framework for prioritized reasoning based on the choice evaluation, 2007, SAC '07.

[26] Marco Calautti, et al. Rewriting-based Check of Chase Termination, 2015, AMW.

[27] Sergio Greco, et al. On the Semantics of Logic Programs with Preferences, 2007, J. Artif. Intell. Res.

[28] Marco Calautti, et al. Exploiting Equality Generating Dependencies in Checking Chase Termination, 2016, Proc. VLDB Endow.

[29] Alin Deutsch, et al. The chase revisited, 2008, PODS.

[30] Thomas Lukasiewicz, et al. Complexity of Approximate Query Answering under Inconsistency in Datalog+/-, 2018, SEBD.