Computer-assisted prediction of pesticide substructure using mass spectra.

Mass spectral classifiers for 16 substructures that occur in the basic structures of pesticides have been investigated to assist pesticide residue analysis as well as the screening of pesticide lead compounds. Mass spectral data are first transformed into 396 features. Genetic Algorithm-Partial Least Squares (GA-PLS) as the feature selection method and a Support Vector Machine (SVM) as the validation method are then applied together to obtain an optimized feature set for each substructure. Finally, the AdaBoost algorithm combined with Classification and Regression Trees (AdaBoost-CART) is trained to predict the presence or absence of each of the 16 substructures from its optimized mass spectral feature set. The results demonstrate that the optimized feature sets, used in place of the original 396 features, predict the presence or absence of the 16 pesticide substructures with recognition success rates of mostly 85-100%.
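The final classification step can be illustrated with a minimal sketch of AdaBoost over tree-based weak learners. The sketch rests on several assumptions that are not taken from the study: the weak learner is a decision stump (a depth-1 tree, the simplest CART-style split), the data are synthetic stand-ins for selected spectral features, labels are coded +1 (substructure present) and -1 (absent), and the GA-PLS/SVM feature selection stage is omitted. All function names are hypothetical.

```python
import math
import random

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) for the best single split."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] > t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] > t else -sign

def adaboost_fit(X, y, n_rounds=5):
    """Discrete AdaBoost: reweight samples so later stumps focus on past mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        # Clip the error so the stump weight stays finite for a perfect split.
        err = min(max(stump[0], 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Synthetic data (an assumption): 40 "spectra" with 4 features each, where
# feature 2 is high when the hypothetical substructure is present.
rng = random.Random(0)
X, y = [], []
for i in range(40):
    present = (i % 2 == 0)
    row = [rng.random() for _ in range(4)]
    row[2] = 0.7 + 0.3 * rng.random() if present else 0.3 * rng.random()
    X.append(row)
    y.append(1 if present else -1)

ensemble = adaboost_fit(X, y, n_rounds=5)
predictions = [adaboost_predict(ensemble, row) for row in X]
train_accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(train_accuracy)
```

In a setting like the one described, one such classifier would be trained per substructure on that substructure's selected feature subset, and success rates would be measured on held-out spectra rather than on the training set used here.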
