Polynomial Time Inference of General Pattern Languages

Assume a finite alphabet of constant symbols and a disjoint infinite alphabet of variable symbols. A pattern p is a non-empty string of constant and variable symbols. The language L(p) is the set of all words over the alphabet of constant symbols that can be generated from p by substituting non-empty words for the variables in p, with every occurrence of the same variable replaced by the same word. A sample S is a finite set of words over the alphabet of constant symbols. A pattern p is descriptive of a sample S if and only if all elements of S can be generated from p and, moreover, there is no other pattern q that also generates all elements of S such that L(q) is a proper subset of L(p). We study the problem of finding a pattern that is descriptive of a given sample. It is known that the problem of finding a pattern of maximal length is NP-hard. Until now, a polynomial-time algorithm has been known only for the special case of patterns containing a single variable symbol. The main result is a polynomial-time algorithm that constructs descriptive patterns of maximal length for the general case of patterns whose variable symbols are drawn from an arbitrary finite set fixed a priori.
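To make the definition of L(p) concrete, the following is a minimal Python sketch of the membership test w ∈ L(p) and of the check that a pattern generates every word of a sample. It is not the algorithm of this paper, which constructs descriptive patterns rather than testing membership. The encoding (constants as one-character strings, variables as integer indices) and the names `matches` and `generates_sample` are hypothetical choices made for illustration only.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical encoding for this sketch: a constant symbol is a one-character
# string such as "a"; a variable symbol is an integer index such as 0.
Symbol = Union[str, int]


def matches(pattern: List[Symbol], word: str) -> bool:
    """Return True if word is in L(pattern), i.e. some substitution of
    non-empty words for the variables (the same word for every occurrence
    of a variable) turns the pattern into exactly this word.
    Plain backtracking search; worst-case exponential time."""

    def search(i: int, pos: int, binding: Dict[int, str]) -> bool:
        if i == len(pattern):
            # All symbols consumed: succeed only if the whole word is used.
            return pos == len(word)
        sym = pattern[i]
        if isinstance(sym, str):
            # A constant symbol must appear literally at this position.
            return word.startswith(sym, pos) and search(i + 1, pos + len(sym), binding)
        if sym in binding:
            # A repeated variable must be replaced by the same word again.
            value = binding[sym]
            return word.startswith(value, pos) and search(i + 1, pos + len(value), binding)
        # First occurrence of a variable: try every non-empty substitution.
        for end in range(pos + 1, len(word) + 1):
            binding[sym] = word[pos:end]
            if search(i + 1, end, binding):
                return True
            del binding[sym]  # undo the choice before trying a longer one
        return False

    return search(0, 0, {})


def generates_sample(pattern: List[Symbol], sample: Iterable[str]) -> bool:
    """Return True if every word of the sample belongs to L(pattern)."""
    return all(matches(pattern, w) for w in sample)
```

For example, with the pattern x0 a x0 encoded as `[0, "a", 0]`, `matches` accepts "bab" and "abaab" but rejects "aa", because x0 must be replaced by a non-empty word. Testing that a pattern generates a sample is only the first half of descriptiveness; the second half, minimality of L(p) among all patterns generating S, is what makes the problem studied here hard.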