A comparison of machine learning techniques for taxonomic classification of teeth from the Family Bovidae

Abstract This study explores the performance of machine learning algorithms on the classification of fossil teeth in the Family Bovidae. Isolated bovid teeth are typically the most common fossils found in southern Africa, and they often constitute the basis for paleoenvironmental reconstructions. Taxonomic identification of fossil bovid teeth, however, is often imprecise and subjective. Using modern teeth of known taxa, machine learning algorithms can be trained to classify fossils. Previous work by Brophy et al. [Quantitative morphological analysis of bovid teeth and implications for paleoenvironmental reconstruction of Plovers Lake, Gauteng Province, South Africa, J. Archaeol. Sci. 41 (2014), pp. 376–388] used elliptical Fourier analysis of the form (size and shape) of the outline of the occlusal surface of each tooth to produce features for linear discriminant analysis (LDA). This manuscript expands on that work by comparing how different machine learning approaches classify the teeth and testing which technique classifies them best. In addition to LDA, four other techniques were considered: neural networks, nuclear penalized multinomial regression, random forests, and support vector machines. Support vector machines and random forests performed best in terms of log loss and classification rate.
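The abstract names log loss and classification rate as the comparison criteria but does not define them. The sketch below assumes the standard multiclass definitions: log loss is the mean negative log of the probability a classifier assigns to the true class, and classification rate is the fraction of teeth whose most probable predicted class matches the true class. The function names, the placeholder class labels "A", "B", "C", and the probabilities are hypothetical illustrations, not data or code from the study.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log probability assigned to the true class.

    predicted_probs is a list of dicts mapping class label -> probability.
    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs.get(label, 0.0), eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(true_labels)


def classification_rate(true_labels, predicted_probs):
    """Fraction of cases where the most probable class is the true class."""
    hits = sum(
        1
        for label, probs in zip(true_labels, predicted_probs)
        if max(probs, key=probs.get) == label
    )
    return hits / len(true_labels)


# Hypothetical example: four teeth, three placeholder classes.
true_labels = ["A", "B", "C", "A"]
predicted_probs = [
    {"A": 0.7, "B": 0.2, "C": 0.1},
    {"A": 0.1, "B": 0.8, "C": 0.1},
    {"A": 0.3, "B": 0.4, "C": 0.3},   # misclassified: true class is "C"
    {"A": 0.5, "B": 0.25, "C": 0.25},
]

example_log_loss = log_loss(true_labels, predicted_probs)
example_rate = classification_rate(true_labels, predicted_probs)
print(f"log loss: {example_log_loss:.4f}")
print(f"classification rate: {example_rate:.2f}")
```

Under these definitions, lower log loss and higher classification rate are better. Log loss also penalizes confident wrong predictions, so two classifiers with the same classification rate can still differ on it.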
