Automatic new word extraction method

New words are difficult to extract automatically in languages whose written text has no word boundaries, such as Chinese and Japanese. In this paper, we present a statistical method for extracting new words from a large corpus without word boundaries. Based on the Generalized Suffix Tree (GST) data structure, we define the New Word Pattern (NWP) and the Segmentation Boundary Pattern (SBP) to split input strings into small pieces, and we offer a practical and efficient algorithm for obtaining the proper words from the GST.
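As a rough illustration of the kind of statistic a GST makes available, the sketch below counts how often each short substring occurs across several unsegmented strings and keeps those that repeat. It is a naive stand-in, not the method of this paper: it uses a plain counter rather than a suffix tree, it does not implement NWP or SBP, and the function name `repeated_substrings` and the parameters `max_len` and `min_count` are hypothetical names chosen for the example.

```python
from collections import Counter


def repeated_substrings(texts, max_len=4, min_count=2):
    """Count every substring of length 2..max_len across all texts.

    Naive substitute for the frequency queries a generalized suffix tree
    answers; it enumerates substrings directly instead of building a tree.
    """
    counts = Counter()
    for text in texts:
        for start in range(len(text)):
            for length in range(2, max_len + 1):
                if start + length > len(text):
                    break  # longer substrings would run past the end
                counts[text[start:start + length]] += 1
    # Keep only substrings that occur at least min_count times in total.
    return {s: c for s, c in counts.items() if c >= min_count}


# Toy unsegmented input; real input would be a large corpus.
corpus = ["abcab", "xabcy"]
candidates = repeated_substrings(corpus)
```

A suffix tree would provide the same counts without enumerating every substring, which matters at corpus scale; the direct enumeration here is only meant to make the idea of frequency-based candidates concrete.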