List Based Matching Algorithm for Classifying News Articles in NewsPage.com

This research proposes an alternative to machine-learning-based approaches for categorizing news articles given as plain text. Such approaches require documents to be encoded as numerical vectors, which causes two problems: huge dimensionality and sparse distribution. The proposed approach avoids both problems by encoding a document, or a set of documents, into a table instead of a numerical vector. The goal of the research is therefore to improve text categorization performance by solving these two problems.
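
To make the idea of table encoding concrete, the following is a minimal sketch rather than the algorithm evaluated in this research. It assumes that a table maps each word to its frequency, that a category is represented by merging the tables of its sample documents, and that matching sums the category weights of shared words. The function names (`encode_table`, `build_category_tables`, `match_score`, `classify`) and the sample texts are hypothetical.

```python
import re
from collections import Counter


def encode_table(text):
    """Encode a text as a table mapping each word to its frequency.

    Only words that occur are stored, so there is no fixed-length,
    mostly-zero vector over the whole vocabulary.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def build_category_tables(labeled_docs):
    """Merge the tables of all sample documents of each category (assumed scheme)."""
    tables = {}
    for label, text in labeled_docs:
        tables.setdefault(label, Counter()).update(encode_table(text))
    return tables


def match_score(doc_table, category_table):
    """Sum the category weights of words shared by both tables (assumed rule)."""
    return sum(category_table[w] for w in doc_table if w in category_table)


def classify(text, category_tables):
    """Assign the category whose table best matches the document's table."""
    doc_table = encode_table(text)
    return max(category_tables,
               key=lambda c: match_score(doc_table, category_tables[c]))


# Hypothetical sample documents, for illustration only.
samples = [
    ("business", "stock market shares rise"),
    ("sports", "team wins the match final"),
]
tables = build_category_tables(samples)
print(classify("market shares fall", tables))  # prints "business"
```

Because each table holds only the words that actually appear, its size grows with the document rather than with the vocabulary, which is how this representation sidesteps the dimensionality and sparsity issues of vector encoding.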