Classification into Readability Levels : Implementation and Evaluation

The use for a readability classification model is mainly as an integrated part of an information retrieval system. By matching the user's demands of readability to the documents with the corresponding readability, the classification model can further improve the results of, for example, a search engine. This thesis presents a new solution for classification into readability levels for Swedish. The results from the thesis are a number of classification models. The models were induced by training a Support Vector Machines classifier on features that are established by previous research as good measurements of readability. The features were extracted from a corpus annotated with three readability levels. Natural Language Processing tools for tagging and parsing were used to analyze the corpus and enable the extraction of the features from the corpus. Empirical testings of different feature combinations were performed to optimize the classification model. The classification models render a good and stable classification. The best model obtained a precision score of 90.21\% and a recall score of 89.56\% on the test-set, which is equal to a F-score of 89.88.