Synchronized regular expressions

Abstract. Text manipulation is one of the most common tasks for anyone who uses a computer. The growing amount of textual information in electronic format that computer users collect every day increases the need for more powerful tools to interact with text. Indeed, much work has been done to provide simple and versatile tools for the most common text manipulation tasks. Regular Expressions (RE), introduced by Kleene, are well known in formal language theory. RE have been extended in various ways, depending on the application of interest. Almost all implementations of RE search algorithms (e.g. the egrep [15] UNIX command, or the pattern matching constructs of the Perl [20] language) provide backreferences, i.e. expressions that refer to the string matched by a previous subexpression. More generally, it seems that all kinds of synchronization between subexpressions of a RE can be very useful when interacting with text. In this paper we introduce Synchronized Regular Expressions (SRE) as an extension of Regular Expressions. We use SRE to present a formal study of the already known backreference extension and of a new extension that we propose, which we call synchronized exponents. Moreover, since these formalisms are meant to have practical utility and to be used in real applications, we face the problem of how to present SRE to end users. Therefore, we also propose a user-friendly syntax for SRE, to be used in implementations of SRE-powered search algorithms.
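As an illustration of the backreferences mentioned in the abstract, the following is a minimal sketch using Python's standard `re` module. The choice of Python and the helper names `find_doubled_words` and `is_square` are assumptions made here for illustration only; they are not taken from the paper and do not reflect the SRE syntax it proposes.

```python
import re

# (\w+) captures a word; \1 must then match exactly the same string again.
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")

# (.+)\1 matches a string of the form ww, i.e. some nonempty block repeated twice.
SQUARE = re.compile(r"(.+)\1")


def find_doubled_words(text):
    """Return each word that is immediately repeated in text (hypothetical helper)."""
    return [m.group(1) for m in DOUBLED_WORD.finditer(text)]


def is_square(s):
    """Return True if s consists of one nonempty block written twice (hypothetical helper)."""
    return SQUARE.fullmatch(s) is not None


if __name__ == "__main__":
    print(find_doubled_words("this is is a test of the the matcher"))
    print(is_square("abcabc"), is_square("abcabd"))
```

The second pattern describes the set of strings of the form ww, which is a standard example of a non-regular language. This is one way to see that backreferences extend the expressive power of classical RE.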

[1] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.

[2] Victor Mitrana. Patterns and Languages: An Overview, 1999, Grammars.

[3] Gheorghe Paun, et al. Grammars with Controlled Derivations, 1997, Handbook of Formal Languages.

[4] Markus G. Kuhn, et al. Information hiding-a survey, 1999, Proc. IEEE.

[5] Carl A. Gunter, et al. In handbook of theoretical computer science, 1990.

[6] Aldo de Luca, et al. Finiteness and Regularity in Semigroups and Formal Languages, 1999, Monographs in Theoretical Computer Science. An EATCS Series.

[7] Jeffrey E. F. Friedl. Mastering Regular Expressions, 1997.

[8] J. van Leeuwen, et al. Information Hiding, 1999, Lecture Notes in Computer Science.

[10] M. W. Shields. An Introduction to Automata Theory, 1988.

[11] Jeffrey D. Ullman, et al. Introduction to Automata Theory, Languages and Computation, 1979.

[12] M. Lothaire. Combinatorics on words: Bibliography, 1997.

[13] Dan. Collusion-Secure Fingerprinting for Digital Data, 2002.

[14] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[15] Jan Maluszynski, et al. A grammatical view of logic programming, 1988, PLILP.

[16] Dana Angluin, et al. Finding Patterns Common to a Set of Strings, 1980, J. Comput. Syst. Sci.

[17] Arto Salomaa, et al. Finite Degrees of Ambiguity in Pattern Languages, 1994, RAIRO Theor. Informatics Appl.

[18] Alfred V. Aho, et al. Algorithms for Finding Patterns in Strings, 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.

[19] 守屋 悦朗, et al. J. E. Hopcroft and J. D. Ullman (authors), "Introduction to Automata Theory, Languages, and Computation", Addison-Wesley, modified A5 format, x+418 pp., ¥6,670, 1979, 1980.