Partial correlation metric based classifier for food product characterization.

Data classification algorithms applied to multidimensional, multiclass food characterization problems mostly assume feature independence when quantifying intra-class similarity or inter-class dissimilarity. As an alternative, class-specific inter-relations among the features can be exploited to assign samples to classes. Based on this idea, a new partial correlation coefficient metric (PCCM) based classification method is proposed. The existence of such inter-variable correlations as signatures of individual classes is established with illustrative problems, and class-specific variable dependency structures are hypothesized as the basis for class discrimination. Two food quality analysis datasets of chemometric importance are used as benchmark problems to compare the performance of the new method with classification algorithms such as LDA (linear discriminant analysis), CART (classification and regression trees), TreeNet and SVM (support vector machines). The PCCM method is observed to perform well across different tests over large sets of classification experiments. The discriminating PCCM classifier also provides a quick visualization tool for diagnosing complex classification problems.
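The idea of using class-specific dependency structure can be illustrated with a minimal sketch. The abstract does not spell out the PCCM algorithm, so the code below is only an assumption-laden illustration, not the authors' method: it computes a partial correlation matrix from the inverse covariance matrix, then assigns a new sample to the class whose partial correlation matrix changes least when the sample is appended. The function names `partial_correlation` and `classify_by_pcc_shift`, and the Frobenius-norm "shift" criterion, are hypothetical choices made for this example.

```python
import numpy as np


def partial_correlation(X):
    """Partial correlation matrix of the columns of X (samples x features).

    Uses the standard identity rho_ij = -P_ij / sqrt(P_ii * P_jj),
    where P is the inverse of the covariance matrix.
    """
    cov = np.cov(X, rowvar=False)
    prec = np.linalg.pinv(cov)  # pseudo-inverse tolerates near-singular covariance
    d = np.sqrt(np.diag(prec))
    pcc = -prec / np.outer(d, d)
    np.fill_diagonal(pcc, 1.0)
    return pcc


def classify_by_pcc_shift(sample, class_data):
    """Hypothetical decision rule (an assumption, not taken from the paper).

    For each class, append the sample to the class's training data and
    measure how much the partial correlation matrix moves (Frobenius norm).
    The class whose dependency structure is disturbed least wins.
    """
    best_label, best_shift = None, np.inf
    for label, X in class_data.items():
        base = partial_correlation(X)
        augmented = partial_correlation(np.vstack([X, sample]))
        shift = np.linalg.norm(augmented - base)
        if shift < best_shift:
            best_label, best_shift = label, shift
    return best_label
```

Here `class_data` is assumed to be a dictionary mapping each class label to a samples-by-features array. A sample that follows a class's inter-variable relations barely perturbs that class's matrix, while a sample that violates them shifts it noticeably, which is the intuition behind treating dependency structures as class signatures.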
