Introduction

Literary scholars and common readers use labels like educational novel, crime novel or adventure novel to organize the large domain of fiction. In both discourses the use of these categories is well established, even though the categories evolve and tend to be applied inconsistently. Genre classification is one of the standard tasks in document classification and has been researched intensively (cf. Biber 1989, Santini 2004, Freund et al. 2006, Sharoff et al. 2010). Some results seem impressive, for example distinguishing clear-cut genres like poetry from fiction (Underwood 2014), but most texts on literary genre classification emphasize, as does the literature on genre classification in general, the variability of genre signals (Allison et al. 2011: 19, Underwood et al. 2013, Underwood 2014). The scores for genre classification over all categories are therefore often not very high; Jockers, for example, reports an accuracy of 67% (Jockers 2013: 81). Genre classification in general works best with most frequent words, all words or character tetragrams (Freund et al. 2006, Sharoff et al. 2010), and most of the reported experiments for literary genre classification also use all words or only the n most frequent words (sometimes including punctuation) as features.

In a series of experiments we examine whether it is possible to improve on these results for the classification of subgenres of novels. Our research is motivated by an understanding of novel genres as concepts which are differentiated by style, settings, character constellations and plots. We use most frequent words as an indicator for style and network characteristics as an indicator for character constellations. Setting is partially covered by topic models, which also represent information on typical ways of telling a story (narrative topoi). We have to omit plot, as we do not yet have a reliable way to represent it by any indicators.
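To make the two surface feature types mentioned above concrete, the following is a minimal sketch of how relative frequencies of the n most frequent words and character tetragrams could be extracted. It is not the pipeline used in the experiments: the function names (`tokenize`, `char_ngrams`, `mfw_features`), the simple regular-expression tokenizer and the two sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (an assumed, deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def char_ngrams(text, n=4):
    """Return all overlapping character n-grams; n=4 yields tetragrams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def mfw_features(docs, n_mfw=100):
    """Per-document relative frequencies of the corpus-wide n most frequent words."""
    tokenized = [tokenize(d) for d in docs]
    corpus_counts = Counter(tok for toks in tokenized for tok in toks)
    vocab = [word for word, _ in corpus_counts.most_common(n_mfw)]
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        rows.append([counts[word] / total for word in vocab])
    return vocab, rows


# Invented sample texts, used only to exercise the functions.
docs = [
    "The detective examined the body and the room.",
    "The boy left the village and the sea called him.",
]
vocab, rows = mfw_features(docs, n_mfw=3)
tetragrams = char_ngrams("novel")
```

Each row of `rows` is a feature vector that could be passed to any standard classifier. Normalizing by document length keeps long and short novels comparable, which is one common design choice rather than the only option.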
[1] D. Biber. A typology of English texts, 1989.
[2] Bei Yu. An Evaluation of Text Classification Methods for Literary Study, 2018.
[3] F. Puppe, et al. Automatische Erkennung von Figuren in deutschsprachigen Romanen, 2015, DHd.
[4] Isabella Reger, et al. Genre Classification on German Novels, 2015, 2015 26th International Workshop on Database and Expert Systems Applications (DEXA).
[5] Jácint Szabó, et al. Latent dirichlet allocation in web spam filtering, 2008, AIRWeb '08.
[6] Charles L. A. Clarke, et al. Towards genre classification for IR in the workplace, 2006, IIiX.
[7] Matthew L. Jockers, et al. Quantitative formalism: an experiment, 2011.
[8] Matthew L. Jockers. Macroanalysis: Digital Methods and Literary History, 2013.
[9] Bonnie L. Webber, et al. Squibs: Stable Classification of Text Genres, 2011, CL.
[10] Katja Markert, et al. The Web Library of Babel: evaluating genre collections, 2010, LREC.
[11] Aidan Finn, et al. Learning to classify documents according to genre, 2006, J. Assoc. Inf. Sci. Technol.