A novel multi-view classifier based on Nyström approximation

Multi-view learning (MVL) learns from patterns described by multiple information sources and has been shown to generalize better than conventional single-view learning (SVL). In most real-world cases, however, researchers only have single-source patterns available, to which existing MVL methods cannot be applied directly. The purpose of this paper is to solve this problem and develop a novel kernel-based MVL technique for single-source patterns. In practice, we first generate different Nyström approximation matrices K̃_p for the Gram matrix G of the given single-source patterns. Then, we regard the learning on each generated Nyström approximation matrix K̃_p as one view. Finally, the different views on the K̃_p are synthesized into a novel multi-view classifier. In doing so, the proposed algorithm works directly on single-source patterns as an MVL machine and simultaneously achieves: (1) low-cost learning; (2) effectiveness; (3) the same Rademacher complexity as the single-view KMHKS; and (4) ease of extension to other kernel-based learning algorithms.
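The following is a minimal sketch of the view-construction idea only, not the authors' KMHKS-based formulation: several Nyström approximations of an RBF Gram matrix are built from different landmark subsets, each approximation is treated as one view, and the per-view decision values are combined. The RBF kernel, the landmark counts, the ridge-regularized per-view learner (a stand-in for KMHKS), and the plain averaging of view outputs are all illustrative assumptions.

```python
# Sketch: multiple Nyström approximations of a Gram matrix as "views".
# All hyperparameters and the combination rule are illustrative assumptions.
import numpy as np

def rbf_gram(X, Y, gamma=1.0):
    """RBF Gram matrix K(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d2)

def nystrom_view(X, n_landmarks, gamma=1.0, rng=None):
    """One Nyström approximation K~ = C W^+ C^T of the full Gram matrix."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    C = rbf_gram(X, X[idx], gamma)            # n x m cross-kernel block
    W = C[idx]                                # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T        # rank-m approximation of G

def fit_view(K_tilde, y, lam=1e-2):
    """Ridge-regularized least-squares learner on one view (stand-in for KMHKS)."""
    n = len(y)
    return np.linalg.solve(K_tilde + lam * np.eye(n), y)   # alpha coefficients

def multi_view_predict(views, alphas):
    """Synthesize the views by averaging their decision values (illustrative)."""
    scores = np.mean([K @ a for K, a in zip(views, alphas)], axis=0)
    return np.sign(scores)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))       # toy labels in {-1, +1}
    views = [nystrom_view(X, m, gamma=0.5, rng=s)            # different landmark sizes
             for s, m in enumerate((20, 40, 60))]            # -> different views K~_p
    alphas = [fit_view(K, y) for K in views]
    print("training accuracy:", np.mean(multi_view_predict(views, alphas) == y))
```

Because each view uses only m landmarks, building K̃_p costs O(nm^2 + n^2 m) rather than the full O(n^2 d) kernel evaluation plus O(n^3) training, which is the source of the low-cost property claimed above; the combination step here is a simple average and stands in for the paper's multi-view synthesis.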
