Cut and paste

The paper develops EDITOR, a language for manipulating semi-structured documents, such as the ones typically available on the Web. EDITOR programs can search and restructure a document. They are based on two simple ideas taken from text editors: “search” instructions are used to select regions of interest in a document, and “cut & paste” to restructure them. We study the expressive power and the complexity of these programs. We show that they are computationally complete, in the sense that any computable document restructuring can be expressed in EDITOR. We also study the complexity of a safe subclass of programs, showing that it captures exactly the class of polynomial-time restructurings. The language has been implemented in Java, and is used in the ARANEUS project to build database views over Web sites.
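The two ideas named in the abstract, searching for a region and then cutting and pasting it, can be illustrated with a minimal Python sketch. This is not EDITOR syntax or the paper's implementation: the function names `search`, `cut`, and `paste`, the use of regular expressions for matching, and the sample document are all assumptions made only for illustration.

```python
import re

# Hypothetical helpers, not part of EDITOR: they only mimic the
# "search, then cut & paste" idea on a plain Python string.

def search(doc, pattern):
    """Return the (start, end) span of the first match of pattern, or None."""
    match = re.search(pattern, doc)
    return match.span() if match else None

def cut(doc, region):
    """Remove the region from doc; return (remaining document, clipboard)."""
    start, end = region
    return doc[:start] + doc[end:], doc[start:end]

def paste(doc, position, text):
    """Insert text into doc at the given character position."""
    return doc[:position] + text + doc[position:]

# Assumed sample document: move the heading after the paragraph.
page = "<h1>Title</h1><p>Body</p>"
region = search(page, r"<h1>.*?</h1>")
rest, clip = cut(page, region)
result = paste(rest, len(rest), clip)
print(result)  # <p>Body</p><h1>Title</h1>
```

The sketch restructures a document by selecting a region and relocating it, which is the kind of operation the abstract describes. It says nothing about the expressive power or complexity results stated above.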
