A novel multi-view classifier based on Nyström approximation

Multi-view learning (MVL) learns from patterns described by multiple information sources and has been shown to generalize better than conventional single-view learning (SVL). In most real-world cases, however, researchers only have single-source patterns available, to which existing MVL methods cannot be applied directly. The purpose of this paper is to solve this problem and develop a novel kernel-based MVL technique for single-source patterns. In practice, we first generate different Nyström approximation matrices K̃_p for the Gram matrix G of the given single-source patterns. Then, we regard the learning on each generated Nyström approximation matrix K̃_p as one view. Finally, the different views on the K̃_p are synthesized into a novel multi-view classifier. In doing so, the proposed algorithm works directly on single-source patterns as an MVL machine and simultaneously achieves: (1) low-cost learning; (2) effectiveness; (3) the same Rademacher complexity as the single-view KMHKS; and (4) ease of extension to other kernel-based learning algorithms.
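The following is a minimal sketch of the view-construction idea only, not the authors' KMHKS-based formulation: several Nyström approximations of an RBF Gram matrix are built from different landmark subsets, each approximation is treated as one view, and the per-view decision values are combined. The RBF kernel, the landmark counts, the ridge-regularized per-view learner (a stand-in for KMHKS), and the plain averaging of view outputs are all illustrative assumptions.

```python
# Sketch: multiple Nyström approximations of a Gram matrix as "views".
# All hyperparameters and the combination rule are illustrative assumptions.
import numpy as np

def rbf_gram(X, Y, gamma=1.0):
    """RBF Gram matrix K(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d2)

def nystrom_view(X, n_landmarks, gamma=1.0, rng=None):
    """One Nyström approximation K~ = C W^+ C^T of the full Gram matrix."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    C = rbf_gram(X, X[idx], gamma)            # n x m cross-kernel block
    W = C[idx]                                # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T        # rank-m approximation of G

def fit_view(K_tilde, y, lam=1e-2):
    """Ridge-regularized least-squares learner on one view (stand-in for KMHKS)."""
    n = len(y)
    return np.linalg.solve(K_tilde + lam * np.eye(n), y)   # alpha coefficients

def multi_view_predict(views, alphas):
    """Synthesize the views by averaging their decision values (illustrative)."""
    scores = np.mean([K @ a for K, a in zip(views, alphas)], axis=0)
    return np.sign(scores)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))       # toy labels in {-1, +1}
    views = [nystrom_view(X, m, gamma=0.5, rng=s)            # different landmark sizes
             for s, m in enumerate((20, 40, 60))]            # -> different views K~_p
    alphas = [fit_view(K, y) for K in views]
    print("training accuracy:", np.mean(multi_view_predict(views, alphas) == y))
```

Because each view uses only m landmarks, building K̃_p costs O(nm^2 + n^2 m) rather than the full O(n^2 d) kernel evaluation plus O(n^3) training, which is the source of the low-cost property claimed above; the combination step here is a simple average and stands in for the paper's multi-view synthesis.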
