A near pattern-matching scheme based upon principal component analysis

Abstract

In this paper, we present an efficient heuristic near pattern-matching scheme. Based on principal component analysis, an important multivariate analysis technique in statistics, we develop algorithms that generate a set of new identifying keys for a given set of patterns in order to reduce the number of comparisons during the near-matching process. After preprocessing, the near-matching operation takes O(n log m) time in the worst case, where n is the length of the text file being searched and m is the number of identifying segments extracted from the patterns.
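The abstract does not spell out how the keys are built, so the following is only a minimal sketch under assumptions of our own, not the authors' algorithm. Each identifying segment is assumed to be a fixed-length string mapped to a vector of character codes (the hypothetical feature map `to_vector`). Its key is the projection onto the first principal component of those vectors, and a "near match" is assumed to mean Euclidean distance at most `eps`. All function names (`first_principal_component`, `project`, `build_index`, `near_matches`) are hypothetical. The keys are sorted once. Each text window is then projected and located by binary search, and only segments whose keys fall within `eps` are compared in full.

```python
import bisect
import math


def to_vector(segment: str) -> list[float]:
    # Hypothetical feature map (our assumption): one coordinate per character,
    # holding its Unicode code point.
    return [float(ord(ch)) for ch in segment]


def first_principal_component(vectors, iterations=100):
    """Return (mean, unit direction), where the direction approximates the
    leading eigenvector of the sample covariance matrix (power iteration)."""
    m, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / m for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / m for j in range(dim)]
           for i in range(dim)]
    direction = [1.0] * dim
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * direction[j] for j in range(dim))
               for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # zero covariance: keep the current direction
            break
        direction = [x / norm for x in nxt]
    norm = math.sqrt(sum(x * x for x in direction))
    return mean, [x / norm for x in direction]


def project(vector, mean, direction):
    # Scalar key: coordinate of the centered vector along the component.
    return sum((x - mu) * p for x, mu, p in zip(vector, mean, direction))


def build_index(segments):
    # Preprocessing: one key per segment, sorted for binary search.
    vectors = [to_vector(s) for s in segments]
    mean, direction = first_principal_component(vectors)
    entries = sorted((project(v, mean, direction), i)
                     for i, v in enumerate(vectors))
    keys = [k for k, _ in entries]
    ids = [i for _, i in entries]
    return mean, direction, keys, ids


def near_matches(text, segments, eps):
    """Return (position, segment index) pairs where the text window lies
    within Euclidean distance eps of a segment's vector."""
    length = len(segments[0])
    assert all(len(s) == length for s in segments), "equal-length segments assumed"
    mean, direction, keys, ids = build_index(segments)
    hits = []
    for pos in range(len(text) - length + 1):
        window = to_vector(text[pos:pos + length])
        key = project(window, mean, direction)
        # Projection onto a unit vector never increases distance, so (up to
        # floating-point rounding) every segment within eps of the window
        # has a key within eps of `key`.
        lo = bisect.bisect_left(keys, key - eps)
        hi = bisect.bisect_right(keys, key + eps)
        for idx in ids[lo:hi]:  # verify only the surviving candidates
            if math.dist(window, to_vector(segments[idx])) <= eps:
                hits.append((pos, idx))
    return hits
```

For example, with segments `["abcd", "wxyz", "mnop"]` and text `"xxabcdyyabcewxyz"`, `eps=0.0` reports only the exact occurrences at positions 2 and 12. With `eps=1.0`, the sketch also reports the window `"abce"` at position 8, whose last character differs from `"abcd"` by one code point. Any unit direction would keep the range filter free of false negatives. The first principal component is used only to spread the keys apart, so that fewer candidates survive the range test. In this sketch, each text window costs an O(L) projection (L being the segment length) plus an O(log m) binary search before candidate verification.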

Chin-Chen Chang | Richard C. T. Lee | C. Y. Chen
