Transformation of structured documents

SUMMARY Many documents have a definable structure. Some document formatting systems, such as the LaTeX formatter, use a structural notation, and in recent years the general mark-up language SGML has gained popularity. In this work we study the transformation of a document from one structure to another. For example, technical journals have their own structure definitions, and an article originally written for one journal must be restructured before it can be submitted to another. We assume that structure definitions are defined by grammars, and study what kinds of transformations can be automated or at least semi-automated. We took a collection of computer science journals, compared their structure definitions, and classified the differences as simple, local, or global. As transformation techniques we studied syntax-directed translation schemata and tree transducers. Our conclusion was that simple and local transformations can be automated when no additional information is needed and semi-automated when it is, while global transformations are difficult to automate. The transformations were tested in our prototype syntax-directed document processing system, which has one module for editing a document under one structure definition and another module for changing a document from one structure definition to another.
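To make the idea of a simple or local transformation concrete, the following is a minimal sketch of a syntax-directed translation over a document tree. It is not the prototype system described above. The element names (`article`, `paper`, `title`, `heading`, `abstract`, `summary`), the `Node` class, and the `RENAME` and `REORDER` tables are all hypothetical, chosen only for illustration. A rename stands in for a simple difference, and a reordering of the children of one production stands in for a local difference. The sketch assumes each child label occurs at most once under a reordered parent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of a structured document: a label, children, and text."""
    label: str
    children: list = field(default_factory=list)
    text: str = ""


# Hypothetical "simple" differences: an element is only renamed.
RENAME = {"article": "paper", "title": "heading", "abstract": "summary"}

# Hypothetical "local" difference: the children of one production are
# permuted. Keyed by the source label; lists source child labels in the
# order the target structure expects them.
REORDER = {"article": ["title", "authors", "abstract", "body"]}


def transform(node: Node) -> Node:
    """Translate a source tree top-down, one production at a time."""
    kids = node.children
    order = REORDER.get(node.label)
    if order is not None:
        # Assumes child labels are unique under a reordered parent.
        by_label = {child.label: child for child in kids}
        kids = [by_label[name] for name in order if name in by_label]
    new_label = RENAME.get(node.label, node.label)
    return Node(new_label, [transform(child) for child in kids], node.text)


source = Node("article", [
    Node("title", text="Transformation of structured documents"),
    Node("abstract", text="Many documents have a definable structure."),
    Node("authors", text="(author names)"),
    Node("body", text="(body text)"),
])
target = transform(source)
```

Both rule tables can be applied without asking the user anything, which is why differences of this kind are candidates for full automation. A rule that needed information missing from the source document, such as a field the target journal requires but the source lacks, would have to prompt for it, and a difference that moves material between distant parts of the tree could not be expressed as a per-production rule at all.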

[32]  P. David Stotts,et al.  Specifying structured document transformations , 1988 .

[33]  Frank Wm. Tompa,et al.  Text/Relational Database Management Systems: Overview and Proposed SQL Extensions , 1995 .

[34]  Max Dauchet,et al.  Bi-transductions de forêts , 1976, International Colloquium on Automata, Languages and Programming.

[35]  Kazuya Chiba,et al.  Document Transformation Based on Syntax-directed Tree Translation , 1995, Electron. Publ..

[36]  Peter Desain Tree doctor, a software package for graphical manipulation and animation of tree structures , 1986 .

[37]  Bruno Courcelle,et al.  Attribute Grammars and Recursive Program Schemes II , 1982, Theor. Comput. Sci..

[38]  Vincent Quint,et al.  Type modelling for document transformation in structured editing systems , 1994 .

[39]  Jean-Claude Raoult A quick look at tree transductions , 1993 .

[40]  james w.thatcher,et al.  tree automata: an informal survey , 1974 .

[41]  Toshiro Wakayama,et al.  SIMON: A Grammar-based Transformation System for Structured Documents , 1993, Electron. Publ..

[42]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[43]  Bruno Courcelle,et al.  Attribute Grammars and Recursive Program Schemes I , 1982, Theoretical Computer Science.

[44]  Derick Wood,et al.  Theory of computation , 1986 .

[45]  Thomas Reps,et al.  The Synthesizer Generator: A System for Constructing Language-Based Editors , 1988 .

[46]  Brenda S. Baker Generalized Syntax Directed Translation, Tree Transducers, and Linear Space , 1978, SIAM J. Comput..

[47]  Alfred V. Aho,et al.  Currents In The Theory Of Computing , 1973 .

[48]  Peter C. Chapin Formal languages I , 1973, CSC '73.

[49]  Michael Share,et al.  A software architecture for supporting the exchange of electronic manuscripts , 1987, CACM.

[50]  M. J. Plasmeijer,et al.  Term graph rewriting: theory and practice , 1993 .

[51]  Joost Engelfriet,et al.  Modular Tree Transducers , 1991, Theor. Comput. Sci..

[52]  Andreas Podelski,et al.  Tree Automata and Languages , 1992 .

[53]  David Garlan,et al.  TransformGen: automating the maintenance of structure-oriented environments , 1994, TOPL.

[54]  Allen L. Brown,et al.  A Logic Grammar Foundation for Document Representation and Document Layout , 1990 .

[55]  Charles F. Goldfarb,et al.  SGML handbook , 1990 .

[56]  Ricardo A. Baeza-Yates,et al.  Integrating contents and structure in text retrieval , 1996, SGMD.

[57]  Helena Ahonen,et al.  Generating grammars for structured documents using grammatical inference methods , 1994 .

[58]  Donald D. Cowan,et al.  Rita - an Editor and User Interface for Manipulating Structured Documents , 1991, Electron. Publ..

[59]  David Notkin,et al.  Gandalf: Software development environments , 1986, IEEE Transactions on Software Engineering.

[60]  Vincent Quint,et al.  Interactively Editing Structured Documents , 1989, Electron. Publ..

[61]  Heiko Vogler Basic Tree Transducers , 1987, J. Comput. Syst. Sci..

[62]  Thomas W. Reps,et al.  The Synthesizer Generator Reference Manual , 1989, Texts and Monographs in Computer Science.

[63]  Joost Engelfriet,et al.  Macro Tree Transducers , 1985, J. Comput. Syst. Sci..

[64]  Christer Hulten,et al.  Making type changes transparent , 1984 .

[65]  Kablan Barbar Attributed Tree Grammars , 1993, Theor. Comput. Sci..

[66]  Henk Alblas,et al.  Attribute Grammars, Applications and Systems , 1991, Lecture Notes in Computer Science.