Experimental study of time series-based dataset selection for effective text classification

Conventional automatic document classification methods face growing challenges in learning time and computing power as the amount of data on the web continues to increase. In this paper, we propose an efficient classification method that uses time series-based dataset selection. In the proposed method, the dataset is split according to its time series data, and the best set of testing documents is selected. The results of classification performance tests conducted using a Naïve Bayes classifier indicate that learning from a small amount of data, divided according to the time series, is more efficient than learning from the entire dataset.
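
The general idea can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the splitting granularity (one subset per year), the selection criterion (accuracy on a small held-out set), the function names, and the toy documents are all assumptions made for illustration. The sketch trains one multinomial Naïve Bayes model per time period and keeps the period whose model scores best.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the most probable label, using add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted for deterministic tie-breaking
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    """Fraction of (text, label) pairs the model labels correctly."""
    correct = sum(predict_nb(model, text) == label for text, label in docs)
    return correct / len(docs)


def select_best_period(dated_docs, eval_docs):
    """Train one model per time period; return the best period and all scores."""
    by_period = defaultdict(list)
    for period, text, label in dated_docs:
        by_period[period].append((text, label))
    scores = {p: accuracy(train_nb(d), eval_docs) for p, d in by_period.items()}
    best = max(sorted(scores), key=scores.get)
    return best, scores


# Toy data invented for illustration: (year, text, label).
dated_docs = [
    (2010, "stocks fall on market fears", "finance"),
    (2010, "team wins the final match", "sports"),
    (2012, "bank raises interest rates", "finance"),
    (2012, "striker scores in cup match", "sports"),
]
eval_docs = [
    ("bank cuts interest rates", "finance"),
    ("striker scores twice", "sports"),
]

best_period, scores = select_best_period(dated_docs, eval_docs)
print(best_period, scores)
```

On this toy data the 2012 subset is selected, because its vocabulary overlaps with the held-out documents while the 2010 subset's does not. Each per-period model is fitted on only a fraction of the documents, which is where the savings in learning time would come from under these assumptions.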