Decision tree technique applied to pig farming datasets

Abstract The decision tree technique is an effective instrument for making large datasets accessible and for making data from different sow herds comparable. The technique can be used to identify differences and weak points in farm management and to improve on them. The decision trees are calculated with the C4.5 algorithm, which constructs trees with a top-down recursive strategy and ranks the attributes within the tree by the gain ratio criterion. In datasets from two Northern German sow herds, the decision tree technique was used to generate such trees and, as an example, to classify the binary farmer decision of replacing or not replacing a sow with a gilt. The datasets were accumulated between 1994 and 2001 for farm A and between 1984 and 1999 for farm B. They consisted of 14,897 and 21,818 observations from an average of 386 and 484 sows per year, respectively. The C4.5 algorithm generated decision trees of different sizes, ranging from 15 to 55 nodes. Depending on the sow herd performance, the threshold values of the attributes at the branches varied between the trees. Sensitivity, the kappa value and the error rate were the evaluation parameters used to estimate the algorithm's performance in classifying the present unbalanced datasets. For both datasets, the sensitivity ranged from 39.2% to 47.3% and the kappa value from 44.9% to 53.9%. The error rate varied between 14.2% and 19.2%. Additionally, the datasets were modified in order to identify the reasons for the partially false classification. The datasets were reduced by excluding replacement decisions made for reasons, such as aggressive behaviour of the sow, that are not reflected in the sow reproduction performance. After this reduction, the sensitivity reached values between 69.4% and 87.5%, and the error rate decreased to 10.1–15.0%.
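
The two quantitative ideas in the abstract, the gain ratio used to rank attributes and the three evaluation parameters, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy attribute values and the confusion-matrix counts are invented for illustration. The gain ratio here covers nominal attributes only, whereas C4.5 also searches thresholds for numeric attributes. Kappa is taken to be Cohen's kappa, which is an assumption, since the abstract does not name the variant.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels or attribute values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute: information gain / split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


def evaluate(tp, fn, fp, tn):
    """Sensitivity, Cohen's kappa and error rate from a 2x2 confusion matrix.

    The positive class is assumed to be the minority decision "replace".
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    error_rate = (fp + fn) / n
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, kappa, error_rate


# Hypothetical toy data: one attribute that separates the decisions perfectly
# and one that carries no information about them.
decision = ["replace", "replace", "keep", "keep"]
informative = ["low", "low", "high", "high"]
uninformative = ["a", "b", "a", "b"]
print(gain_ratio(informative, decision), gain_ratio(uninformative, decision))

# Hypothetical counts for an unbalanced dataset (100 "replace", 400 "keep").
print(evaluate(tp=40, fn=60, fp=20, tn=380))
```

With unbalanced classes, as in these invented counts, the error rate alone can look favourable while sensitivity for the minority decision stays low, which is why the abstract reports all three parameters together.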
