Regular combinators for string transformations

We focus on (partial) functions that map input strings to values in a monoid, such as the integers under addition or output strings under concatenation. The notion of regularity for such functions has been defined using two-way finite-state transducers, (one-way) cost register automata, and MSO-definable graph transformations. In this paper, we give an algebraic and machine-independent characterization of this class, analogous to the definition of regular languages by regular expressions. When the monoid is commutative, we prove that every regular function can be constructed from constant functions using the combinators of choice, split sum, and iterated sum, which are analogs of union, concatenation, and Kleene-*, respectively, but enforce unique (that is, unambiguous) parsing. Our main result concerns the general case of non-commutative monoids, which is of particular interest for capturing regular string-to-string transformations for document processing. We prove that the following additional combinators suffice for constructing all regular functions: (1) the left-additive versions of split sum and iterated sum, which allow transformations such as string reversal; (2) sum of functions, which allows transformations such as copying of strings; and (3) function composition, or alternatively, a new concept of chained sum, which allows output values from adjacent blocks to mix.
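To make the combinators concrete, the following is a minimal Python sketch for the monoid of strings under concatenation. It is an illustration under stated assumptions, not the paper's formal definitions: partial functions are modeled as Python callables that return None when undefined, constant functions take their domain from a regular expression, choice is modeled as "first function if defined, otherwise the second", and iterated sum considers only non-empty blocks. All names (const, choice, split, iterate, add) are hypothetical. Chained sum and function composition are omitted.

```python
import re
from typing import Callable, Optional

# A partial function from strings to strings; None means "undefined".
Fn = Callable[[str], Optional[str]]

def const(pattern: str, out: str) -> Fn:
    """Constant function: maps every string matching `pattern` to `out`."""
    rx = re.compile(pattern)
    return lambda w: out if rx.fullmatch(w) else None

def choice(f: Fn, g: Fn) -> Fn:
    """Choice (assumed semantics): use f where it is defined, else g."""
    def h(w: str) -> Optional[str]:
        r = f(w)
        return r if r is not None else g(w)
    return h

def split(f: Fn, g: Fn, left: bool = False) -> Fn:
    """Split sum: defined only if w = uv in exactly one way with f(u), g(v)
    defined. The left-additive version emits g(v) before f(u)."""
    def h(w: str) -> Optional[str]:
        results = []
        for i in range(len(w) + 1):
            a, b = f(w[:i]), g(w[i:])
            if a is not None and b is not None:
                results.append(b + a if left else a + b)
        return results[0] if len(results) == 1 else None
    return h

def iterate(f: Fn, left: bool = False) -> Fn:
    """Iterated sum: defined only if w splits uniquely into non-empty blocks
    on which f is defined. The left-additive version reverses block order."""
    def h(w: str) -> Optional[str]:
        n = len(w)
        ways = [0] * (n + 1)      # number of parses of w[:i], capped at 2
        val = [None] * (n + 1)    # output for w[:i] when the parse is unique
        ways[0], val[0] = 1, ""
        for j in range(1, n + 1):
            for i in range(j):
                if ways[i] == 0:
                    continue
                r = f(w[i:j])
                if r is None:
                    continue
                ways[j] = min(2, ways[j] + ways[i])
                if ways[i] == 1:
                    val[j] = r + val[i] if left else val[i] + r
        return val[n] if ways[n] == 1 else None
    return h

def add(f: Fn, g: Fn) -> Fn:
    """Sum of functions: apply both to the whole input and combine outputs."""
    def h(w: str) -> Optional[str]:
        a, b = f(w), g(w)
        return None if a is None or b is None else a + b
    return h

# Example functions over the alphabet {a, b}.
char = choice(const("a", "a"), const("b", "b"))   # identity on one letter
identity = iterate(char)                          # identity on strings
reverse = iterate(char, left=True)                # string reversal
copy = add(identity, identity)                    # w -> ww
ambiguous = split(const("a*", "x"), const("a*", "y"))
```

In this sketch, reverse("abb") evaluates to "bba" and copy("ab") to "abab", while ambiguous("aa") is undefined (None) because "aa" can be split in more than one way. That last case is the unique-parsing requirement in action.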
