Assume a finite alphabet of constant symbols and a disjoint infinite alphabet of variable symbols. A pattern p is a non-empty string of constant and variable symbols. The language L(p) is the set of all words over the alphabet of constant symbols that can be generated from p by substituting non-empty words for the variables in p. A sample S is a finite set of words over the same alphabet. A pattern p is descriptive of a sample S if and only if every element of S can be generated from p and there is no other pattern q that also generates S such that L(q) is a proper subset of L(p). The problem of finding a pattern that is descriptive of a given sample is studied. It is known that the problem of finding a pattern of maximal length is NP-hard. Until now, a polynomial-time algorithm was known only for the special case of patterns containing only one variable symbol. The main result is a polynomial-time algorithm that constructs descriptive patterns of maximal length for the general case of patterns whose variable symbols come from any finite set fixed a priori.
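The definitions above can be illustrated with a minimal sketch of the membership test "is a word in L(p)?". This is a plain backtracking check, not the polynomial-time algorithm of the main result. The encoding is an assumption made here for illustration only: a pattern is a Python string in which uppercase letters are variable symbols and every other character is a constant symbol. The names `match` and `generates_sample` are hypothetical.

```python
from typing import Dict, Iterable, Optional


def match(pattern: str, word: str,
          binding: Optional[Dict[str, str]] = None) -> bool:
    """Return True if `word` is in L(pattern).

    Assumed encoding: uppercase letters are variables, all other
    characters are constants. Each variable must be replaced by a
    non-empty word, and repeated variables get the same word.
    """
    binding = {} if binding is None else binding
    if not pattern:
        return not word  # both must be exhausted together
    head, rest = pattern[0], pattern[1:]
    if not head.isupper():
        # Constant symbol: must match the next character literally.
        return word.startswith(head) and match(rest, word[1:], binding)
    if head in binding:
        # Variable seen before: reuse its substitution.
        value = binding[head]
        return word.startswith(value) and match(rest, word[len(value):], binding)
    # New variable: try every non-empty prefix that still leaves at
    # least one character for each remaining pattern symbol.
    for k in range(1, len(word) - len(rest) + 1):
        binding[head] = word[:k]
        if match(rest, word[k:], binding):
            return True
        del binding[head]  # undo and try a longer prefix
    return False


def generates_sample(pattern: str, sample: Iterable[str]) -> bool:
    """Return True if every word of the sample is in L(pattern)."""
    return all(match(pattern, w) for w in sample)
```

For the sample {bab, cac}, both `generates_sample("X", ["bab", "cac"])` and `generates_sample("XaX", ["bab", "cac"])` hold. Since L(XaX) is a proper subset of L(X) (the word b is in L(X) but not in L(XaX)), the pattern X is not descriptive of this sample. Note that this backtracking test can take exponential time in the worst case, and it only checks that a pattern generates a sample; it does not decide whether the pattern is descriptive.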