Search-Based Evolution of XML Schemas

Schemas make an XML-based application more reliable: by defining the specific format of the data that the application manipulates, they help prevent failures. In practice, when an application evolves, new requirements for the data may be established, which creates the need for schema evolution. When no schema exists, one must be generated. To reduce maintenance and reengineering costs, automatic evolution of schemas is highly desirable. However, no existing algorithm solves the problem satisfactorily. To help in this task, this paper introduces a search-based approach that exploits the correspondence between schemas and context-free grammars. The approach is supported by a tool named EXS, which implements grammatical inference algorithms based on LL(1) parsing. Given a grammar (corresponding to a schema) and a new word (an XML document), EXS modifies the original grammar to infer a new grammar that i) continues to generate the same words as before and ii) generates the new word. If no initial grammar is available, EXS can also generate a grammar from scratch from a set of samples.
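The two acceptance conditions on an evolved grammar can be illustrated with a minimal sketch. It is not the EXS implementation: the names `derives` and `is_valid_evolution`, the `Book` grammar, and the sample documents are hypothetical, and a simple fixpoint recognizer is used in place of LL(1) parsing to keep the example short. It also approximates condition i) by re-checking a finite set of previously accepted documents rather than the full language of the original grammar.

```python
# A grammar maps each nonterminal to a list of productions; each production
# is a tuple of symbols. Symbols that are not keys of the grammar are
# terminals (here: XML element names). All names are illustrative only.

def derives(grammar, start, word):
    """Return True if `start` derives `word` (a list of terminals)."""
    n = len(word)
    facts = set()  # (nonterminal, i, j): nonterminal derives word[i:j]

    def end_positions(rhs, i):
        # Positions reachable by matching the whole right-hand side from i,
        # using only the facts established so far.
        ends = {i}
        for sym in rhs:
            nxt = set()
            for e in ends:
                if sym in grammar:
                    nxt.update(j for (a, s, j) in facts if a == sym and s == e)
                elif e < n and word[e] == sym:
                    nxt.add(e + 1)
            ends = nxt
        return ends

    changed = True
    while changed:  # iterate until no new fact can be added
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                for i in range(n + 1):
                    for j in end_positions(rhs, i):
                        if (nt, i, j) not in facts:
                            facts.add((nt, i, j))
                            changed = True
    return (start, 0, n) in facts


def is_valid_evolution(candidate, start, old_words, new_word):
    """Candidate must accept every old word and the new word."""
    return all(derives(candidate, start, w) for w in old_words + [new_word])


# Original schema: a book has exactly one title followed by one author.
original = {"Book": [("title", "author")]}
old_docs = [["title", "author"]]
new_doc = ["title", "author", "author"]  # rejected by the original grammar

# One possible evolved grammar: one or more authors.
candidate = {
    "Book": [("title", "Authors")],
    "Authors": [("author",), ("author", "Authors")],
}

print(derives(original, "Book", new_doc))
print(is_valid_evolution(candidate, "Book", old_docs, new_doc))
```

In a search-based setting, a check of this kind could serve as the feasibility test applied to each candidate grammar the search produces; how EXS actually scores or generates candidates is not specified here.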
