An Analysis of Approaches to XML Schema Inference

In this paper we focus on the problem of automatically inferring an XML schema from a given sample set of XML documents. We provide an overview and analysis of existing approaches and compare their key advantages. We conclude with a discussion of open problems that remain to be solved and their possible solutions.
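To make the inference problem concrete, here is a minimal Python sketch of the most naive strategy. It is not one of the surveyed approaches. For every element name it records the sequences of child elements seen in the samples. It then emits a DTD whose content model is the union of exactly those sequences. The function names `collect_structure` and `naive_dtd` are hypothetical. For brevity, the sketch ignores attributes and mixed content.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_structure(documents):
    """For each element name, record every observed sequence of child names
    and whether the element ever carried non-whitespace text (hypothetical helper)."""
    sequences = defaultdict(set)
    has_text = set()
    for text in documents:
        # iter() visits the root and all of its descendants.
        for elem in ET.fromstring(text).iter():
            sequences[elem.tag].add(tuple(child.tag for child in elem))
            if elem.text and elem.text.strip():
                has_text.add(elem.tag)
    return sequences, has_text


def naive_dtd(documents):
    """Emit one <!ELEMENT> declaration per element name (hypothetical helper).

    The content model is the choice (|) of the observed child sequences,
    i.e. a schema that accepts only the structures seen in the samples.
    Attributes and mixed content are ignored in this sketch.
    """
    sequences, has_text = collect_structure(documents)
    decls = []
    for name in sorted(sequences):
        observed = sorted(s for s in sequences[name] if s)
        if not observed:
            # Leaf element: text-only if text was ever seen, otherwise empty.
            model = "(#PCDATA)" if name in has_text else "EMPTY"
        else:
            groups = ["(" + ", ".join(seq) + ")" for seq in observed]
            model = groups[0] if len(groups) == 1 else "(" + " | ".join(groups) + ")"
            if () in sequences[name]:
                # The element was also seen with no children.
                model += "?"
        decls.append(f"<!ELEMENT {name} {model}>")
    return decls


if __name__ == "__main__":
    samples = [
        "<book><title>A</title><author>X</author></book>",
        "<book><title>B</title><author>Y</author><author>Z</author></book>",
    ]
    for decl in naive_dtd(samples):
        print(decl)
```

On the two samples, `book` gets the content model `((title, author) | (title, author, author))`. This model fits the data exactly but does not generalize. Schema inference methods aim to produce more concise and general models from the same evidence. For example, they might merge these alternatives into `(title, author+)`.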
