PLS in Data Mining and Data Integration

Data mining by means of projection methods such as PLS (projection to latent structures) and their extensions is discussed. The most common data-analytical questions in data mining are covered and illustrated with examples:
(a) Clustering, i.e., finding and interpreting “natural” groups in the data.
(b) Classification and identification, e.g., biologically active vs. inactive compounds.
(c) Quantitative relationships between different sets of variables, e.g., finding variables related to the quality of a product, or related to time, seasonal, and/or geographical change.
