Evolution of PLS for Modeling SAR and omics Data

In quantitative structure‐activity relationship (QSAR) studies, multivariate statistical methods are commonly used for data analysis. Partial least squares (PLS) is of particular interest because it can analyze data with strongly collinear, noisy, and numerous X variables, and can simultaneously model several activity variables Y. PLS also provides several prediction regions and diagnostic plots as statistical measures. PLS has evolved to cope with the severe demands imposed by complex data structures. In this review article, we outline the algorithms of five advanced PLS techniques and provide representative examples of each. The selected techniques are Nonlinear PLS, Multiway PLS, Hierarchical PLS, Orthogonal PLS, and Bi‐modal PLS. Studies of particular aspects of living cells (such as the set of genes or proteins in the cell and their interactions) are collectively known as the omics fields. These fields integrate heterogeneous scientific disciplines and include chemogenomics, proteomics, and metabolomics. The datasets produced within the omics fields are numerous, megavariate, and extremely complex. Their data structures are frequently incomplete, noisy, nonlinear, and collinear, and therefore demand modern and powerful multivariate data analysis methods. In particular, the omics technologies have steered biology towards the adoption of orthogonal PLS. We also describe future prospects for the use of PLS algorithms in the omics fields.
