On XML integrity constraints in the presence of DTDs

This paper investigates XML document specifications that combine DTDs with integrity constraints such as keys and foreign keys. We study the consistency problem: deciding whether a given specification is meaningful, that is, whether there exists an XML document that both conforms to the DTD and satisfies the constraints. We show that DTDs interact with constraints in a highly intricate way, and that as a result the consistency problem is undecidable in general. For unary keys and foreign keys, the consistency problem is shown to be NP-complete; the proof encodes DTDs and integrity constraints as linear constraints over the integers. We consider variations of the problem, obtained by both restricting and enlarging the class of constraints, and identify a number of tractable cases as well as a number of additional NP-complete ones. By incorporating negations of constraints, we also establish complexity bounds for the implication problem, which is shown to be coNP-complete for unary keys and foreign keys.
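To illustrate how a DTD and unary constraints can turn into linear constraints over the integers, here is a minimal sketch in Python. The specification is hypothetical and chosen only for illustration: a DTD that requires at least one teacher element and exactly two subject elements per teacher, a key on a subject attribute, and a foreign key from that attribute into a teacher key. The variables t and s (numbers of teacher and subject elements) and the helper has_small_solution are assumptions of this sketch. The bounded search is a toy check, not the decision procedure used in the paper.

```python
from itertools import product


def has_small_solution(constraints, num_vars, bound):
    """Search for non-negative integers (each at most `bound`) that satisfy
    every constraint. Returns the first witness found, or None.

    This is a bounded brute-force search for illustration only; finding
    nothing below the bound does not by itself prove inconsistency.
    """
    for values in product(range(bound + 1), repeat=num_vars):
        if all(check(*values) for check in constraints):
            return values
    return None


# Hypothetical specification: t = number of teacher elements,
# s = number of subject elements.
inconsistent = [
    lambda t, s: t >= 1,      # DTD: at least one teacher
    lambda t, s: s == 2 * t,  # DTD: each teacher has exactly two subjects
    lambda t, s: s <= t,      # key on the subject attribute plus foreign key
                              # into the teacher key: distinct subject values
                              # cannot outnumber teachers
]

# Variant where the DTD requires exactly one subject per teacher.
consistent = [
    lambda t, s: t >= 1,
    lambda t, s: s == t,
    lambda t, s: s <= t,
]

print(has_small_solution(inconsistent, 2, 50))  # None
print(has_small_solution(consistent, 2, 50))    # (1, 1)
```

For the first system, no bound would help: s = 2t together with s <= t forces t <= 0, which contradicts t >= 1, so no document can satisfy that specification.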
