Comparison of PLS Discriminant Analysis and supervised SOMs for Blood Brain Barrier activity

Developing drug compounds suitable for human use requires many experiments to ensure that the drugs are safe to consume, and it generally takes 10 to 12 years for a drug to move from the laboratory to the market. Pattern recognition in quantitative structure-activity relationship (QSAR) modelling is therefore important for analysing the data and building the necessary models, so that only novel drug candidates are synthesized. This work addresses three aspects of classifying blood-brain barrier (BBB) activity: (1) variable reduction by principal component analysis (PCA); (2) variable selection and class separation, comparing three methods: the t-statistic, the Partial Least Squares Regression Coefficient (PLSRC) and the newly invented Self Organising Maps Discriminatory Index (SOMDI); and (3) classification, comparing a linear method (PLSDA) with a non-linear one (supervised SOMs, SuSOMs). Leave-one-out (LOO) cross-validation gave seven as the number of PCA components. Judged by the PCA scores, the variables selected by the t-statistic and SOMDI are more selective and separate the BBB activity classes better than those selected by PLSRC. The performance and validation of the models built with PLSDA and SuSOMs show that the seven descriptors selected by consensus among SOMDI, the t-statistic and PLSRC were able to classify BBB-penetrating and non-penetrating compounds.
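
As a rough illustration of the t-statistic variable selection step, the sketch below ranks descriptors by the absolute value of a two-sample t-statistic computed between the two activity classes. It is a minimal sketch under stated assumptions, not the procedure used in this work: the abstract does not say which form of the t-statistic was applied, so the Welch (unequal-variance) form is assumed here, the function names `t_statistic` and `rank_descriptors` are hypothetical, and the data are invented toy values rather than BBB measurements.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Welch two-sample t-statistic for one descriptor (assumed form).

    Raises ZeroDivisionError if both groups have zero variance.
    """
    var_a, var_b = variance(group_a), variance(group_b)
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def rank_descriptors(X, y):
    """Return descriptor (column) indices sorted by decreasing |t|."""
    n_vars = len(X[0])
    scores = []
    for j in range(n_vars):
        # Split column j by class label: 1 = penetrating, 0 = non-penetrating.
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(t_statistic(pos, neg)))
    return sorted(range(n_vars), key=lambda j: scores[j], reverse=True)


# Invented toy data: 6 compounds (rows) x 3 descriptors (columns).
X = [
    [5.1, 0.9, 2.0],
    [4.9, 1.1, 2.1],
    [5.3, 1.0, 1.9],
    [1.0, 1.2, 1.7],
    [1.2, 0.8, 1.9],
    [0.8, 1.0, 1.5],
]
y = [1, 1, 1, 0, 0, 0]

ranking = rank_descriptors(X, y)
print(ranking)
```

In this toy set the first descriptor differs most between the classes and the second hardly at all, so a large |t| marks a descriptor as a candidate for keeping. A consensus selection such as the one described above would compare a ranking like this with the rankings from the other methods.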
