Clustering-Based Weighted Extreme Learning Machine for Classification in Drug Discovery Process

Extreme Learning Machine (ELM) is a universal approximator that is extremely fast and easy to implement, but its hidden-layer weights are normally selected at random, which can lead to poor prediction performance. In this work, we applied the Weighted Similarity Extreme Learning Machine with the Jaccard/Tanimoto coefficient (WELM-JT) in combination with cluster analysis (namely, k-means clustering and Support Vector Clustering) based on similarity and distance measures (i.e., Jaccard/Tanimoto and Euclidean) to predict which structurally similar compounds are active against a given symptom or disease. The proposed method was evaluated on the Maximum Unbiased Validation (MUV) dataset, one of the most challenging benchmarks, with four types of fingerprints (i.e., ECFP_4, ECFP_6, FCFP_4 and FCFP_6). The experimental results show that WELM-JT in combination with k-means-ED gave the best performance: it retrieved the highest number of active molecules while using the lowest number of nodes. Meanwhile, WELM-JT with k-means-JT and ECFP_6 encoding proved to be a robust contender for most of the activity classes.
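The general idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the hidden nodes are the centroids returned by Euclidean k-means, that each hidden activation is the Tanimoto similarity between a fingerprint and a centroid (using the continuous form, which equals the Jaccard coefficient on binary vectors), and that the output weights come from a class-weighted ridge solution. The function names (`tanimoto`, `kmeans_euclidean`, `fit_welm`, `score_welm`), the regularization constant `C`, and the inverse-class-frequency weighting are all hypothetical choices made for illustration.

```python
import numpy as np


def tanimoto(A, B):
    """Pairwise Tanimoto similarity between the rows of A and the rows of B.

    Uses <a,b> / (|a|^2 + |b|^2 - <a,b>), which reduces to the Jaccard
    coefficient when both inputs are binary fingerprints.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inner = A @ B.T
    sq_a = (A * A).sum(axis=1)[:, None]
    sq_b = (B * B).sum(axis=1)[None, :]
    denom = sq_a + sq_b - inner
    # Assumption: two all-zero vectors are treated as identical (similarity 1).
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, inner / safe, 1.0)


def kmeans_euclidean(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with squared Euclidean distance; returns centroids."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def fit_welm(X, y, centers, C=100.0):
    """Solve for output weights beta; y holds +1 (active) / -1 (inactive)."""
    H = tanimoto(X, centers)  # hidden-layer output, shape (n_samples, k)
    y = np.asarray(y, dtype=float)
    n_pos = max(int((y > 0).sum()), 1)
    n_neg = max(int((y <= 0).sum()), 1)
    # Assumption: each sample is weighted by the inverse size of its class.
    w = np.where(y > 0, 1.0 / n_pos, 1.0 / n_neg)
    HtW = H.T * w  # equivalent to H^T @ diag(w)
    A = HtW @ H + np.eye(H.shape[1]) / C
    return np.linalg.solve(A, HtW @ y)


def score_welm(X, centers, beta):
    """Higher scores suggest a compound is more likely to be active."""
    return tanimoto(X, centers) @ beta


if __name__ == "__main__":
    # Synthetic binary "fingerprints"; real use would load ECFP/FCFP bit vectors.
    rng = np.random.default_rng(1)
    X = (rng.random((200, 64)) < 0.2).astype(float)
    y = np.where(X[:, :4].sum(axis=1) >= 2, 1, -1)
    centers = kmeans_euclidean(X, k=10)
    beta = fit_welm(X, y, centers)
    ranked = np.argsort(-score_welm(X, centers, beta))
    print("top-ranked compound indices:", ranked[:5])
```

In this sketch the number of clusters `k` directly sets the number of hidden nodes, which is why choosing centres by clustering can keep the network small compared with using every training compound as a node.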
