Factors Affecting Automatic Genre Classification: An Investigation Incorporating Non-Western Musical Forms

The number of studies on automated genre classification is growing as the amount of available digital audio data increases. Automated genre classification generally involves two steps: feature extraction and classification. In this study, MARSYAS was used to extract audio features, and the suite of tools available in WEKA was used for classification. The study investigates the factors affecting automated genre classification. Whereas most studies in this area work with Western genres, this study incorporates traditional Malay music. Eight genres were introduced: Dikir Barat, Etnik Sabah, Inang, Joget, Keroncong, Tumbuk Kalang, Wayang Kulit, and Zapin. A total of 417 tracks collected from various audio compact discs were used as the dataset. Results show that factors such as the musical features extracted, the classifiers employed, the size of the dataset, excerpt length, excerpt location, and test set parameters affect classification results.