Mining the ESROM: A study of breeding value prediction in Manchego sheep by means of classification techniques plus attribute selection and construction

The Manchego sheep is the native breed of Castilla-La Mancha (a region of Spain). Its two main products, Manchego cheese and Manchego lamb, represent more than 50% of the final animal production in the region. Because of these economic implications, and with the aim of improving Manchego sheep production, a selection scheme (called ESROM) based on animal genetic merit was started fifteen years ago. One of the major points in the selection scheme is the estimation of the breeding value and its use in flock replacement. In the ESROM scheme, the breeding value is estimated with the BLUP animal model, a complex method that relates different traits through linear equations and solves the resulting system by taking all the available information into account simultaneously. In this paper we study the use of data mining techniques for breeding value classification. The goal of the paper is far from replacing BLUP in breeding value estimation; on the contrary, our goal is to learn in a supervised way from the results produced by BLUP, and to use the learned models to provide preliminary information about the breeding value of an animal. The advantage of using those models is that little information is required, and the estimation can be made as soon as the data (a few variables) are available for a given animal, allowing early decisions to be taken or delayed until a deeper study is carried out. We start the data mining process by identifying a proper data set from the whole of the available data. We then use standard classification techniques combined with feature subset selection to identify good attribute subsets to be used as predictors. Attribute selection is carried out with filter and wrapper algorithms, and we also propose a filter+wrapper algorithm that provides results close to the wrapper's at a remarkably smaller computational cost.
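The filter+wrapper combination described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the agreement-based filter score, the 1-NN/leave-one-out wrapper evaluator, and the toy data are all illustrative assumptions. The key idea it shows is that the cheap filter first narrows the candidate pool, so the expensive wrapper search only evaluates subsets drawn from that pool.

```python
def filter_rank(X, y):
    """Filter stage: rank attributes by how far their agreement with
    the (binary) class deviates from the 50% expected by chance."""
    n, m = len(y), len(X[0])
    score = lambda j: abs(sum(X[i][j] == y[i] for i in range(n)) / n - 0.5)
    return sorted(range(m), key=score, reverse=True)

def loo_accuracy(X, y, feats):
    """Wrapper evaluator: leave-one-out accuracy of a 1-NN classifier
    (Hamming distance) restricted to the attribute subset `feats`."""
    hits = 0
    for i in range(len(y)):
        dist = lambda k: sum(X[i][j] != X[k][j] for j in feats)
        nearest = min((k for k in range(len(y)) if k != i), key=dist)
        hits += y[nearest] == y[i]
    return hits / len(y)

def filter_plus_wrapper(X, y, top_k=3):
    """Filter+wrapper: the filter narrows the candidate pool to the
    top_k attributes; a greedy forward wrapper then searches only
    that pool, evaluating each candidate subset with the classifier."""
    pool = filter_rank(X, y)[:top_k]
    selected, best_acc = [], 0.0
    improved = True
    while improved:
        improved, best_j = False, None
        for j in pool:
            if j in selected:
                continue
            acc = loo_accuracy(X, y, selected + [j])
            if acc > best_acc:
                best_acc, best_j, improved = acc, j, True
        if improved:
            selected.append(best_j)
    return selected, best_acc

# toy data: attribute 0 determines the class; attributes 1-3 are noise
X = [[1, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 1], [0, 0, 0, 0],
     [1, 1, 1, 1], [0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 1]]
y = [row[0] for row in X]
sel, acc = filter_plus_wrapper(X, y)  # the informative attribute is kept
```

The computational saving comes from the pool restriction: a pure wrapper would evaluate subsets over all attributes, while here the classifier is only run on subsets of the `top_k` filter-ranked candidates.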
We also show that classifier accuracy can be considerably improved (around 4% on average) by using attribute construction. Finally, we discuss some of the tasks performed in the ESROM scheme in relation to the obtained classification models.

Keywords: Manchego sheep, selection scheme, breeding value, classification algorithms, data mining, attribute selection, attribute construction.
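The gain from attribute construction can be illustrated with a toy sketch (the agreement-based score and the XOR data are illustrative assumptions, not from the paper): when the class depends on an interaction between attributes, neither original attribute looks useful on its own, but a constructed attribute combining them can be perfectly predictive.

```python
def agreement_score(col, y):
    # filter-style score: how far the attribute's agreement with the
    # binary class deviates from the 50% expected by chance
    n = len(y)
    return abs(sum(1 for a, b in zip(col, y) if a == b) / n - 0.5)

# class is the XOR of two binary attributes: neither helps alone
X = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
y = [a ^ b for a, b in X]

s0 = agreement_score([a for a, _ in X], y)   # original attribute 0
s1 = agreement_score([b for _, b in X], y)   # original attribute 1
eq = [int(a == b) for a, b in X]             # constructed attribute
s_eq = agreement_score(eq, y)                # maximally informative
```

Both original attributes score 0 (chance level), while the constructed equality attribute reaches the maximum score of 0.5, since it determines the class exactly.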
