The Complexity of SORE-definability Problems

Single occurrence regular expressions (SOREs), in which every alphabet symbol occurs at most once, are a special kind of the deterministic regular expressions that are used extensively in the schema languages DTD and XSD for XML documents. Motivated by the simplification of XML schemas, we consider the SORE-definability problem: given a regular expression, decide whether it has an equivalent SORE. We investigate the complexity of the SORE-definability problem extensively: we consider both (standard) regular expressions and regular expressions with counting, and we distinguish between alphabets of size at least two and unary alphabets. In all cases, we obtain tight complexity bounds. In addition, we consider a variant of this problem, the bounded SORE-definability problem: given a regular expression E and a number M (encoded in unary or binary), decide whether there is a SORE that is equivalent to E on the set of words of length at most M. We show that in several cases, switching from the SORE-definability problem to its bounded variant decreases the complexity exponentially.
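To make the two notions above concrete, here is a small brute-force sketch in Python. It is an illustration under stated assumptions, not the paper's method: symbols are taken to be single lowercase letters, expressions use Python's `re` syntax (including `{m,n}` counters), and the function names `is_sore` and `agree_up_to` are hypothetical. The first function checks the syntactic SORE condition; the second checks whether two expressions accept the same words of length at most M by enumerating all of them, which takes time exponential in M.

```python
import itertools
import re
from collections import Counter


def is_sore(expr: str) -> bool:
    """Syntactic SORE test: every alphabet symbol occurs at most once.

    Assumption (illustration only): symbols are single lowercase ASCII
    letters; every other character (|, *, +, ?, parentheses, counter
    braces and their digits) is treated as syntax, not as a symbol.
    """
    counts = Counter(ch for ch in expr if "a" <= ch <= "z")
    return all(n <= 1 for n in counts.values())


def agree_up_to(e1: str, e2: str, alphabet: str, m: int) -> bool:
    """Brute force: do e1 and e2 accept the same words of length <= m?

    Enumerates all |alphabet|**k words for k = 0..m, so the running time
    is exponential in m; this is not an efficient decision procedure.
    """
    p1, p2 = re.compile(e1), re.compile(e2)
    for k in range(m + 1):
        for letters in itertools.product(alphabet, repeat=k):
            word = "".join(letters)
            if (p1.fullmatch(word) is None) != (p2.fullmatch(word) is None):
                return False
    return True


if __name__ == "__main__":
    # "aa*" repeats the symbol a, so it is not a SORE ...
    print(is_sore("aa*"), is_sore("a+"))  # False True
    # ... but it agrees with the SORE "a+" on all words up to length 6.
    print(agree_up_to("aa*", "a+", "ab", 6))  # True
    # "aa*" and "a*" already differ on the empty word.
    print(agree_up_to("aa*", "a*", "ab", 6))  # False
```

Here `aa*` is in fact equivalent to `a+` on all words, so its language is SORE-definable even though `aa*` itself is not a SORE. The length-bounded check only supplies the kind of evidence the bounded variant asks for, for one fixed candidate SORE.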
