An Efficient Word Searching Algorithm through Splitting and Hashing the Offline Text

The word matching problem is to find all occurrences of a pattern P[0…m-1] in a text T[0…n-1], where P contains no white space and each occurrence is preceded and followed by a space. In this paper, we assume that the text is offline. Ibrahiem et al. (2008) proposed an algorithm (WSA) that solves the word matching problem by splitting the offline text into a number of tables in the preprocessing phase. The main drawback of this algorithm is that, after splitting the text into tables, it searches for each occurrence of the pattern in each table by brute force. In this paper, we improve the algorithm by using the efficient SDBM hash function proposed by R. J. Enbody et al. in 1988. In this technique, after splitting the text into a number of tables, we match the hash value of the pattern P against the hash values of the words of the same length in the text T. We call this algorithm the modified word searching algorithm (MWSA). Experimental results show that MWSA is much faster than the previously proposed WSA.
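
The idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `sdbm_hash`, `build_tables`, and `search` are hypothetical, the tables are assumed to be keyed by word length, words are assumed to be maximal runs of non-whitespace characters, and the final string comparison that guards against hash collisions is an addition the abstract does not mention.

```python
import re
from collections import defaultdict


def sdbm_hash(word: str) -> int:
    """SDBM string hash over the UTF-8 bytes, truncated to 32 bits."""
    h = 0
    for byte in word.encode("utf-8"):
        h = (byte + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h


def build_tables(text: str) -> dict:
    """Preprocessing: split the offline text into one table per word length.

    Each table entry is (hash of the word, start offset in the text, word).
    Keying the tables by word length is an assumption of this sketch.
    """
    tables = defaultdict(list)
    for match in re.finditer(r"\S+", text):
        word = match.group()
        tables[len(word)].append((sdbm_hash(word), match.start(), word))
    return tables


def search(tables: dict, pattern: str) -> list:
    """Return the start offsets of every whole-word occurrence of pattern.

    Only the table for words of the same length is scanned, and hash values
    are compared first. The word comparison afterwards is an added safeguard
    against collisions, not something stated in the abstract.
    """
    target = sdbm_hash(pattern)
    return [
        start
        for h, start, word in tables.get(len(pattern), [])
        if h == target and word == pattern
    ]


if __name__ == "__main__":
    tables = build_tables("the cat sat on the mat the")
    print(search(tables, "the"))
    print(search(tables, "at"))
```

Because the text is offline, `build_tables` runs once and its cost is shared by all later queries; each call to `search` then compares integers within a single table instead of comparing characters across the whole text.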

[1] Alfred V. Aho, et al. Efficient string matching, 1975, Commun. ACM.

[2] Donald E. Knuth, et al. Sorting and Searching, 1973.

[3] Robert S. Boyer, et al. A fast string searching algorithm, 1977, Commun. ACM.

[4] Suneeta Agarwal, et al. An Efficient String Matching Algorithm Using Super Alphabets, 2008, 2008 First International Conference on Emerging Trends in Engineering and Technology.

[5] Per-Åke Larson, et al. Dynamic hashing, 1978, BIT.

[6] Donald Ervin Knuth, et al. The Art of Computer Programming, 1968.

[7] Kimmo Fredriksson, et al. Shift-or string matching with super-alphabets, 2003, Inf. Process. Lett.

[8] Richard J. Enbody, et al. Dynamic hashing schemes, 1988, CSUR.

[9] Ibrahiem M. M. El Emary, et al. A New Approach for Solving String Matching Problem through Splitting the Unchangeable Text, 2008.