Subset selection from multi-experiment data sets with application to milk fatty acid profiles

Developing routine analyses that can handle large numbers of samples while avoiding costly and time-consuming analytical techniques is highly valuable. Such routine analyses usually require calibration against the detailed analyses, which serve as reference values. For this purpose, a representative subset covering the complete range of the variables of interest is required. In this paper, this subset selection problem is tackled for multi-experiment data sets. Conventional techniques such as the Kennard and Stone algorithm and OptiSim are compared to a new approach based on Genetic Algorithms. The challenge here is to find an adequate objective function and to modify the standard crossover and mutation operators so that the number of selected samples remains fixed. The techniques are applied to a data set containing the concentrations of 45 fatty acids, determined by a simplified reference method, in 1033 milk samples originating from six different experiments. The objective is to select a subset of 100 samples in which each of the six experiments is sufficiently represented. While there is no obvious way to generalize the conventional methods to multi-experiment data sets, this can be accomplished quite easily for Genetic Algorithms by modifying the objective function. Our results indicate that Genetic Algorithms are well suited to handling the subset selection problem for multi-experiment data sets.
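The abstract names the ingredients of the approach but not their exact form, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes Euclidean distance between fatty acid profiles, uses the classic Kennard and Stone procedure (seed with the two most distant samples, then repeatedly add the sample farthest from its nearest selected neighbour) as the conventional baseline, and uses a hypothetical multi-experiment objective: for each experiment, the largest distance from any of its samples to the nearest selected sample of the same experiment, summed over experiments, with an infinite cost whenever an experiment has no selected sample. The size-preserving operators (`crossover_fixed`, `mutate_fixed`) are one illustrative way to keep the subset at exactly `k` samples; all function names are hypothetical.

```python
import math
import random


def kennard_stone(X, k):
    """Classic Kennard-Stone: seed with the two most distant samples, then
    repeatedly add the sample whose nearest selected sample is farthest away."""
    n = len(X)
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: math.dist(X[p[0]], X[p[1]]))
    selected = [i0, j0]
    # Distance from every sample to its nearest selected sample.
    nearest = [min(math.dist(x, X[i0]), math.dist(x, X[j0])) for x in X]
    while len(selected) < k:
        chosen = set(selected)
        new = max((i for i in range(n) if i not in chosen), key=lambda i: nearest[i])
        selected.append(new)
        nearest = [min(d, math.dist(x, X[new])) for d, x in zip(nearest, X)]
    return selected


def coverage_cost(subset, X, groups):
    """Hypothetical objective (lower is better): per experiment, the largest
    distance from one of its samples to the nearest selected sample of the
    same experiment, summed over experiments; inf if an experiment is missing."""
    total = 0.0
    for g in set(groups):
        members = [i for i, gi in enumerate(groups) if gi == g]
        picked = [i for i in subset if groups[i] == g]
        if not picked:
            return math.inf
        total += max(min(math.dist(X[i], X[j]) for j in picked) for i in members)
    return total


def crossover_fixed(a, b, k, rng):
    """Keep the samples both parents share, then fill the remaining
    k - |shared| slots at random from samples held by only one parent."""
    shared = set(a) & set(b)
    rest = sorted((set(a) | set(b)) - shared)
    return sorted(shared) + rng.sample(rest, k - len(shared))


def mutate_fixed(subset, n, rng):
    """Swap one selected sample for one unselected sample (assumes k < n)."""
    child = list(subset)
    chosen = set(child)
    outside = [i for i in range(n) if i not in chosen]
    child[rng.randrange(len(child))] = rng.choice(outside)
    return child


def ga_select(X, groups, k, pop_size=30, generations=100, p_mut=0.3, seed=0):
    """Generational GA over fixed-size index subsets with elitism and
    binary tournament selection; returns the lowest-cost subset found."""
    rng = random.Random(seed)
    n = len(X)

    def cost(s):
        return coverage_cost(s, X, groups)

    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        nxt = pop[:2]  # elitism: the two best subsets survive unchanged
        while len(nxt) < pop_size:
            pa = min(rng.sample(pop, 2), key=cost)
            pb = min(rng.sample(pop, 2), key=cost)
            child = crossover_fixed(pa, pb, k, rng)
            if rng.random() < p_mut:
                child = mutate_fixed(child, n, rng)
            nxt.append(child)
        pop = nxt
    return min(pop, key=cost)
```

Because each operator outputs exactly `k` distinct indices, no repair step is needed, and the experiment structure enters only through the objective, which mirrors the abstract's point that the Genetic Algorithm is generalized by changing the objective function alone. With `groups` holding the experiment label of each sample and `k=100`, a call such as `ga_select(X, groups, 100)` would return 100 indices; at the scale of 1033 samples, the repeated cost evaluations in this sketch would need caching or vectorization.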
