Extraction of Authors' Characteristics from Japanese Modern Sentences via N-gram Distribution

Many studies of authorship attribution have dealt with text data in which the boundaries between words are obvious [1] [2]. When these studies are applied to languages in which sentences cannot be divided obviously into words, such as Japanese or Chinese, preliminary processing of the text data, such as morphological analysis, is required and may influence the final results. Methods that make use of characteristics of particular languages or particular compositions also have limited coverage [3]. Extracting authors’ characteristics from sentences is, in general, an unsolved problem. Therefore, we propose a method for authorship attribution based on the distribution of character n-grams in sentences. The proposed method can analyze sentences without any additional information, i.e. without preliminary analyses. We also report experiments in which 3-grams representing authors’ characteristics were extracted on the basis of their distributions.
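As an illustration of what a distribution of character n-grams looks like, the following minimal sketch counts overlapping character 3-grams in a raw string and normalizes the counts to relative frequencies. It is not the procedure used in the experiments: the function name `char_ngram_distribution`, the sample sentence, and the choice to leave punctuation in place are assumptions made only for this example.

```python
from collections import Counter


def char_ngram_distribution(text, n=3):
    """Return the relative frequency of each character n-gram in text.

    Hypothetical helper for illustration; the text is used as-is,
    with no word segmentation or morphological analysis.
    """
    # Slide a window of n characters over the raw string.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        # Text shorter than n characters has no n-grams.
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


# Sample Japanese text (assumed for illustration only).
sample = "吾輩は猫である。名前はまだ無い。"
dist = char_ngram_distribution(sample, n=3)
for gram, freq in sorted(dist.items(), key=lambda kv: -kv[1])[:5]:
    print(gram, round(freq, 3))
```

Because the window moves over characters rather than words, the same function applies unchanged to languages without explicit word boundaries; distributions computed this way for different authors could then be compared with any suitable dissimilarity measure.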