Pattern induction and matching in polyphonic music and other multidimensional datasets

We present a new algorithm, SIA, which discovers maximal repeated patterns in any set of points in a Cartesian space of any dimensionality. The worst-case running time of SIA is O(kn^2 log_2 n) for a k-dimensional dataset of size n. SIATEC is an extension of SIA that generates a set of translational equivalence classes (TECs). If the input represents a musical score, then each TEC contains all the transposition-invariant occurrences of a single maximal repeated pattern in the score. In the worst case, SIATEC takes O(kn^3) time to compute the TEC of every maximal pattern computed by SIA. We have also experimented with a set of heuristics, MU, that takes as input a dataset representing a musical surface together with the set of TECs generated by SIATEC for this dataset. MU computes a value for each TEC that is intended to represent the "musical significance" of the TEC, and then presents the TECs ordered by this value. The combined system of MU and SIATEC (which we call MUSIATEC) has been used to analyse some large-scale polyphonic works with encouraging results. We have also generalised SIA to produce a new pattern-matching algorithm. This algorithm, called SIA(M)ESE, takes as input a query pattern and a dataset and outputs a set of matches for the query pattern in the dataset. SIA(M)ESE is capable of true polyphonic music pattern matching in O(kmn log_2 n) time when looking for a k-dimensional pattern of size m in a k-dimensional text of size n. This makes SIA(M)ESE more efficient than existing algorithms for this purpose. The work presented here is subject to patent protection.
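The abstract does not spell out the procedures, so the following is only a minimal Python sketch of the two ideas it names, under stated assumptions. It assumes a dataset is a collection of equal-length integer tuples, for example (onset, pitch) pairs, and that a maximal repeated pattern for a vector v is the set of all points that v translates onto another point in the dataset. The function names `maximal_translatable_patterns` and `translational_equivalence_class` are hypothetical. The sketch groups points with a hash map rather than by sorting, and finds occurrences by brute force, so it illustrates the outputs rather than the stated worst-case bounds.

```python
from collections import defaultdict


def maximal_translatable_patterns(points):
    """Map each difference vector v to the list of points p such that
    p + v is also in the dataset (illustrative, hash-based grouping)."""
    pts = sorted(set(map(tuple, points)))  # lexicographic order, no duplicates
    by_vector = defaultdict(list)
    for i, p in enumerate(pts):
        # Only pairs (p, q) with p before q, so each vector is "forward".
        for q in pts[i + 1:]:
            v = tuple(b - a for a, b in zip(p, q))
            by_vector[v].append(p)
    return dict(by_vector)


def translational_equivalence_class(pattern, points):
    """Return every translation of `pattern` that lies wholly inside
    `points` (brute force; an assumption, not the SIATEC procedure)."""
    pts = set(map(tuple, points))
    first = tuple(pattern[0])
    occurrences = []
    for q in sorted(pts):
        # Candidate translation: move the first pattern point onto q.
        v = tuple(b - a for a, b in zip(first, q))
        moved = [tuple(a + d for a, d in zip(p, v)) for p in pattern]
        if all(m in pts for m in moved):
            occurrences.append(moved)
    return occurrences


# Toy two-dimensional "score": (onset time, MIDI pitch).
score = [(0, 60), (1, 62), (2, 64), (4, 65), (5, 67), (6, 69)]
mtps = maximal_translatable_patterns(score)
motif = mtps[(4, 5)]  # [(0, 60), (1, 62), (2, 64)]
tec = translational_equivalence_class(motif, score)  # two occurrences
```

In this toy score the three-note rising figure recurs four time units later and five semitones higher, so the vector (4, 5) picks out the figure and its equivalence class contains the original and the transposed copy.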