Impact of graph construction on semi-supervised classification

A variety of graph-based semi-supervised learning algorithms have been proposed by the research community in recent years. Despite their apparent empirical success, the field of semi-supervised learning lacks a detailed empirical study that evaluates the influence of graph construction on semi-supervised learning. In this work we provide such a study. To that end, we combine a variety of graph construction methods with a variety of graph-based semi-supervised learning algorithms and compare them empirically on six benchmark data sets widely used in the semi-supervised learning literature. The algorithms are evaluated on digit, character, text, and image classification tasks, as well as on the classification of Gaussian distributions.

The experimental evaluation is subdivided into four parts: (1) best-case analysis; (2) evaluation of classifier stability; (3) evaluation of the influence of graph construction on semi-supervised learning; and (4) evaluation of the influence of regularization parameters on the classification performance of the semi-supervised learning algorithms. In the best-case analysis, we report the lowest error rate of each semi-supervised learning algorithm combined with each graph construction method over a range of sparsification parameter values; this parameter controls the number of neighbors of each training example. In the evaluation of classifier stability, we assess how stable each combination of semi-supervised learning algorithm and graph construction method is over the same range of sparsification parameter values, fixing any regularization parameters at the values that achieved the best result in the best-case analysis. In the evaluation of the influence of graph construction, we compare the graph construction methods combined with the semi-supervised learning algorithms over a range of sparsification parameter values, again fixing any regularization parameters at their best-case values. In the evaluation of the influence of regularization parameters, we examine the error surfaces generated by the semi-supervised classifiers on each graph and data set, fixing the graphs that achieved the best results in the best-case analysis and varying the regularization parameter values.

The intention of our experiments is to evaluate the trade-off between classification performance and stability of graph-based semi-supervised learning algorithms across graph construction methods and parameter values (sparsification and, where applicable, regularization). From the results obtained, we conclude that the mutual k-nearest neighbors (mutKNN) graph may be the best choice for adjacency graph construction, while the RBF kernel may be the best choice for weighted matrix generation. In addition, mutKNN tends to generate smoother error surfaces than the other adjacency graph construction methods. However, mutKNN is unstable for relatively small values of k. Our results indicate that the classification performance of graph-based semi-supervised learning algorithms is heavily influenced by parameter setting, and we found only a few evident patterns that could help parameter selection. The consequences of this instability for research and practice are discussed in this work.
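To make the terminology above concrete, here is a minimal sketch of the kind of pipeline the abstract refers to: a mutual k-nearest-neighbor (mutKNN) adjacency graph weighted with an RBF kernel, followed by label propagation in the style of learning with local and global consistency. This is an illustration under stated assumptions, not the implementation evaluated in this work; the function names `mutual_knn_rbf` and `lgc_propagate`, and the parameters `k` (sparsification), `sigma` (RBF bandwidth), and `alpha` (regularization), are hypothetical choices.

```python
# A minimal sketch of the pipeline discussed above, assuming Euclidean
# features. The names mutual_knn_rbf, lgc_propagate, k, sigma, and alpha
# are illustrative, not the notation used in this work.
import numpy as np

def mutual_knn_rbf(X, k=5, sigma=1.0):
    """mutKNN adjacency graph over the rows of X, weighted with an RBF kernel."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq_dists, np.inf)  # exclude self-neighbors
    # Indices of the k nearest neighbors of each point.
    knn = np.argsort(sq_dists, axis=1)[:, :k]
    # Directed kNN adjacency: A[i, j] is True iff j is a k-nearest neighbor of i.
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), knn.ravel()] = True
    # mutKNN keeps edge (i, j) only when the neighbor relation is mutual;
    # a symmetric kNN graph (symKNN) would use A | A.T here instead of A & A.T.
    A_mut = A & A.T
    # RBF (Gaussian) weights on the surviving edges.
    return np.where(A_mut, np.exp(-sq_dists / (2.0 * sigma ** 2)), 0.0)

def lgc_propagate(W, y, alpha=0.9):
    """Label propagation via local and global consistency (Zhou et al., 2003).

    y holds class indices for labeled examples and -1 for unlabeled ones;
    alpha plays the role of a regularization parameter.
    """
    n = W.shape[0]
    classes = np.unique(y[y >= 0])
    Y = np.zeros((n, classes.size))
    for c, cls in enumerate(classes):
        Y[y == cls, c] = 1.0
    d = W.sum(axis=1)
    d[d == 0] = 1.0  # guard: mutKNN can leave vertices isolated for small k
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt  # symmetrically normalized weight matrix
    # Closed-form solution F = (1 - alpha) (I - alpha S)^{-1} Y.
    F = (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
    return classes[np.argmax(F, axis=1)]

# Toy usage: two Gaussian blobs with two labeled examples per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.full(100, -1)
y[:2], y[50:52] = 0, 1
predictions = lgc_propagate(mutual_knn_rbf(X, k=7, sigma=1.0), y, alpha=0.9)
```

The guard for isolated vertices hints at one intuition for the instability reported above: because mutKNN keeps an edge only when two points appear in each other's neighbor lists, small values of k prune edges aggressively and may disconnect parts of the graph.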
Contents

Resumo
Abstract
Contents
List of Figures
List of Tables
1 Introduction
   1.1 Contributions
   1.2 Discussion of the obtained results
   1.3 Consequences of the obtained results
   1.4 Organization of this work
2 Graph-based semi-supervised classification
   2.1 Preliminaries
   2.2 Adjacency graph construction
   2.3 Weighted matrix generation
   2.4 Semi-supervised classifiers
   2.5 Heuristic induction
3 Experimental protocol
   3.1 Data sets and preprocessing
   3.2 Experimental setup
   3.3 Parameter setting
   3.4 Final remarks
4 Best-case analysis
   4.1 Evaluation of the obtained results
   4.2 Final remarks
5 Evaluation of the stability of the semi-supervised classifiers
   5.1 Description of the proposed experimental model
   5.2 Evaluation of the obtained results
   5.3 Final remarks
6 Evaluation of the influence of graph construction on semi-supervised classification
   6.1 Description of the proposed experimental model
   6.2 Evaluation of the obtained results
   6.3 Final remarks
7 Evaluation of the influence of regularization parameters on semi-supervised classification
   7.1 Description of the proposed experimental model
   7.2 Evaluation of the obtained results
   7.3 Final remarks
8 Conclusion
References

List of Figures

5.1 Plot template for the evaluation of the stability of the semi-supervised classifiers.
5.2 Stability of the semi-supervised classifiers for the partitions with 10 labeled examples using the symKNN-RBF graph.
5.3 Stability of the semi-supervised classifiers for the partitions with 100 labeled examples using the symKNN-RBF graph.
5.4 Stability of the semi-supervised classifiers for the partitions with 10 labeled examples using the mutKNN-RBF graph.
5.5 Stability of the semi-supervised classifiers for the partitions with 100 labeled examples using the mutKNN-RBF graph.
5.6 Stability of the semi-supervised classifiers for the partitions with 10 labeled examples using the symFKNN-RBF graph.
5.7 Stability of the semi-supervised classifiers for the partitions with 100 labeled examples using the symFKNN-RBF graph.
5.8 Stability of the semi-supervised classifiers for the partitions with 10 labeled examples using the symKNN-HM graph.
5.9 Stability of the semi-supervised classifiers for the partitions with 100 labeled examples using the symKNN-HM graph.
5.10 Stability of the semi-supervised classifiers for the partitions with 10 labeled examples using the mutKNN-HM graph.
5.11 Stability of the semi-supervised classifiers for the partitions with 100 labeled examples using the mutKNN-HM graph.
5.12 Stability of the semi-supervised classifiers for the partitions with 10 labeled examples using the symFKNN-HM graph.
5.13 Stability of the semi-supervised classifiers for the partitions with 100 labeled examples using the symFKNN-HM graph.
5.14 Stability of the semi-supervised classifiers for the partitions with 10 labeled examples using the symKNN-LLE graph.
5.15 Stability of the semi-supervised classifiers for the partitions with 100 labeled examples using the symKNN-LLE graph.
5.16 Stability of the semi-supervised classifiers for the partitions with 10 labeled examples using the mutKNN-LLE graph.
5.17 Stability of the semi-supervised classifiers for the partitions with 100 labeled examples using the mutKNN-LLE graph.
5.18 Stability of the semi-supervised classifiers for the partitions with 10 labeled examples using the symFKNN-LLE graph.
5.19 Stability of the semi-supervised classifiers for the partitions with 100 labeled examples using the symFKNN-LLE graph.
6.1 Plot template for the evaluation of the influence of graph construction on semi-supervised classification.
6.2 Evaluation of the graph construction methods on the USPS data set using the partitions with 10 labeled examples.
6.3 Evaluation of the graph construction methods on the USPS data set using the partitions with 100 labeled examples.
6.4 Evaluation of the graph construction methods on the COIL2 data set using the partitions with 10 labeled examples.
6.5 Evaluation of the graph construction methods on the COIL2 data set using the partitions with 100 labeled examples.
6.6 Evaluation of the graph construction methods on the DIGIT-1 data set using the partitions with 10 labeled examples.
6.7 Evaluation of the graph construction methods on the DIGIT-1 data set using the partitions with 100 labeled examples.
6.8 Evaluation of the graph construction methods on the …
