Finding ID Attributes in XML Documents

We consider the problem of discovering candidate ID and IDREF attributes in a schemaless XML document. We characterize the complexity of the problem, propose a heuristic algorithm for it, and discuss experimental results.
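
To make the problem statement concrete, the following is a minimal sketch of the conditions a candidate must satisfy; it is not the heuristic algorithm proposed in the paper. It assumes that an attribute is identified by an (element tag, attribute name) pair, that a candidate ID attribute is one whose values never repeat, and that a candidate IDREF attribute is one whose values all occur among the values of some candidate ID attribute. The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_attribute_values(xml_text):
    """Map each (element tag, attribute name) pair to the list of its values."""
    values = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        for name, value in elem.attrib.items():
            values[(elem.tag, name)].append(value)
    return values


def candidate_ids(values):
    """Pairs whose values never repeat (a necessary condition for an ID)."""
    return {key for key, vals in values.items() if len(set(vals)) == len(vals)}


def candidate_idrefs(values, id_keys):
    """Map each pair to the other candidate ID pairs that contain all of its values."""
    result = {}
    for key, vals in values.items():
        targets = {
            k for k in id_keys
            if k != key and set(vals) <= set(values[k])
        }
        if targets:
            result[key] = targets
    return result


# Hypothetical sample document, used only for illustration.
SAMPLE = """
<db>
  <person id="p1"/>
  <person id="p2"/>
  <item id="i1" owner="p1"/>
  <item id="i2" owner="p1"/>
  <item id="i3" owner="p2"/>
</db>
"""

values = collect_attribute_values(SAMPLE)
ids = candidate_ids(values)
idrefs = candidate_idrefs(values, ids)
print(sorted(ids))
print(idrefs)
```

These checks only narrow the search space. In a valid XML document, ID values must also be unique across all ID attributes together, so choosing a mutually consistent set among the candidates is a separate step that this sketch does not attempt.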
