Paper Information - Semi-supervised Learning

Semi-supervised Learning

[5] Adam R. Klivans, et al. Learning intersections and thresholds of halfspaces, 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings.

[6] Ann B. Lee, et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps, 2005, Proceedings of the National Academy of Sciences of the United States of America.

[7] Raymond J. Mooney, et al. A probabilistic framework for semi-supervised clustering, 2004, KDD.

[8] Mehryar Mohri, et al. Rational Kernels, 2002, NIPS.

[9] Thomas L. Madden, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, 1997, Nucleic acids research.

[10] Susan T. Dumais, et al. Inductive learning algorithms and representations for text categorization, 1998, CIKM '98.

[11] G. Wahba. Smoothing noisy data with spline functions, 1975.

[12] Nikhil Bansal, et al. Correlation Clustering, 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings.

[13] E. Kushilevitz, et al. Learning by distances, 1990, COLT '90.

[14] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[15] E. Nadaraya. On Estimating Regression, 1964.

[16] Nello Cristianini, et al. Kernel-Based Data Fusion and Its Application to Protein Function Prediction in Yeast, 2003, Pacific Symposium on Biocomputing.

[17] Dale Schuurmans, et al. Metric-Based Methods for Adaptive Model Selection and Regularization, 2002, Machine Learning.

[18] Nello Cristianini, et al. Learning the Kernel Matrix with Semidefinite Programming, 2002, J. Mach. Learn. Res.

[19] Thorsten Joachims, et al. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization, 1997, ICML.

[20] Maria-Florina Balcan, et al. An Augmented PAC Model for Semi-Supervised Learning, 2006, Semi-Supervised Learning.

[21] Andrew W. Moore, et al. Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs, 2003, AISTATS.

[22] Claire Cardie, et al. Constrained K-means Clustering with Background Knowledge, 2001, Proceedings of the Eighteenth International Conference on Machine Learning, pp. 577–584.

[23] S. Ganesalingam. Classification and Mixture Approaches to Clustering Via Maximum Likelihood, 1989.

[24] Ran El-Yaniv, et al. Error Bounds for Transductive Learning via Compression and Clustering, 2003, NIPS.

[25] Maria-Florina Balcan, et al. Co-Training and Expansion: Towards Bridging Theory and Practice, 2004, NIPS.

[26] Li Liao, et al. Combining pairwise sequence similarity and support vector machines for remote protein homology detection, 2002, RECOMB '02.

[27] Avrim Blum, et al. Learning from Labeled and Unlabeled Data using Graph Mincuts, 2001, ICML.

[28] Adrian Corduneanu, et al. Distributed Information Regularization on Graphs, 2004, NIPS.

[29] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.

[30] Tom M. Mitchell, et al. Learning to construct knowledge bases from the World Wide Web, 2000, Artif. Intell.

[31] Inderjit S. Dhillon, et al. Clustering on the Unit Hypersphere using von Mises-Fisher Distributions, 2005, J. Mach. Learn. Res.

[32] Zoubin Ghahramani, et al. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions, 2003, ICML 2003.

[33] Matthias W. Seeger, et al. Covariance Kernels from Bayesian Generative Models, 2001, NIPS.

[34] Michael I. Jordan, et al. Multiple kernel learning, conic duality, and the SMO algorithm, 2004, ICML.

[35] G. S. Watson, et al. Smooth regression analysis, 1964.

[36] L. Goldstein, et al. Optimal Plug-in Estimators for Nonparametric Functional Estimation, 1992.

[37] Jerome H. Friedman, et al. On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality, 2004, Data Mining and Knowledge Discovery.

[38] Inderjit S. Dhillon, et al. Concept Decompositions for Large Sparse Text Data Using Clustering, 2004, Machine Learning.

[39] Jon M. Kleinberg, et al. Detecting a network failure, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[40] D. M. Titterington, et al. Updating a Diagnostic System using Unconfirmed Cases, 1976.

[41] M. Kearns. Efficient noise-tolerant learning from statistical queries, 1998, JACM.

[42] Richard Bellman, et al. Adaptive Control Processes: A Guided Tour, 1961, The Mathematical Gazette.

[43] H. J. Scudder, et al. Probability of error of some adaptive pattern-recognition machines, 1965, IEEE Trans. Inf. Theory.

[44] John C. Platt. Fast Embedding of Sparse Similarity Graphs, 2003, NIPS.

[45] Raymond J. Mooney, et al. Integrating constraints and metric learning in semi-supervised clustering, 2004, ICML.

[46] Yoram Singer, et al. Unsupervised Models for Named Entity Classification, 1999, EMNLP.

[47] Brian D. Ripley, et al. Pattern Recognition and Neural Networks, 1996.

[48] Léon Bottou, et al. Local Learning Algorithms, 1992, Neural Computation.

[49] Yoshua Bengio, et al. Model Selection for Small Sample Regression, 2002, Machine Learning.

[50] M. Seeger. Input-dependent Regularization of Conditional Density Models, 2000.

[51] Ayhan Demiriz, et al. Exploiting unlabeled data in ensemble methods, 2002, KDD.

[52] John Shawe-Taylor, et al. Structural Risk Minimization Over Data-Dependent Hierarchies, 1998, IEEE Trans. Inf. Theory.

[53] Andreas Stolcke, et al. Best-first Model Merging for Hidden Markov Model Induction, 1994, ArXiv.

[54] G. McLachlan. Discriminant Analysis and Statistical Pattern Recognition, 1992.

[55] H. Zha, et al. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment, 2004, SIAM J. Sci. Comput.

[56] Vittorio Castelli, et al. The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter, 1996, IEEE Trans. Inf. Theory.

[57] Nello Cristianini, et al. Convex Methods for Transduction, 2003, NIPS.

[58] Fabio Gagliardi Cozman, et al. Unlabeled Data Can Degrade Classification Performance of Generative Classifiers, 2002, FLAIRS.

[59] Thomas G. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms, 1998, Neural Computation.

[60] R. Shibata. An optimal selection of regression variables, 1981.

[61] Avrim Blum, et al. The Bottleneck, 2021, Monopsony Capitalism.

[62] Christopher K. I. Williams. Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond, 1999, Learning in Graphical Models.

[63] Nello Cristianini, et al. A statistical framework for genomic data fusion, 2004, Bioinform.

[64] M. Yamasaki. Ideal boundary limit of discrete Dirichlet functions, 1986.

[65] Alexander Gammerman, et al. Machine-Learning Applications of Algorithmic Randomness, 1999, ICML.

[66] Harris Drucker, et al. Support vector machines for spam categorization, 1999, IEEE Trans. Neural Networks.

[67] Jason Weston, et al. Mismatch String Kernels for SVM Protein Classification, 2002, NIPS.

[68] Olivier Bousquet, et al. On the Complexity of Learning the Kernel Matrix, 2002, NIPS.

[69] Geoffrey E. Hinton, et al. Self-organizing neural network that discovers surfaces in random-dot stereograms, 1992, Nature.

[70] Nello Cristianini, et al. An introduction to Support Vector Machines, 2000.

[71] Geoffrey C. Fox, et al. Vector quantization by deterministic annealing, 1992, IEEE Trans. Inf. Theory.

[72] Tatsuya Akutsu, et al. Protein homology detection using string alignment kernels, 2004, Bioinform.

[73] David Maxwell Chickering, et al. Dependency Networks for Inference, Collaborative Filtering, and Data Visualization, 2000, J. Mach. Learn. Res.

[74] Santosh S. Vempala, et al. A random sampling based algorithm for learning the intersection of half-spaces, 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[75] Nicu Sebe, et al. Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[76] Vladimir Koltchinskii, et al. Rademacher penalties and structural risk minimization, 2001, IEEE Trans. Inf. Theory.

[77] S. Rosenberg. The Laplacian on a Riemannian Manifold, 1997.

[78] Ulrike von Luxburg, et al. Limits of Spectral Clustering, 2004, NIPS.

[79] David A. Landgrebe, et al. The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon, 1994, IEEE Trans. Geosci. Remote. Sens.

[80] Ting Chen, et al. An integrated probabilistic model for functional prediction of proteins, 2003, RECOMB '03.

[81] I. Jolliffe. Principal Component Analysis, 2002.

[82] James R. Knight, et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae, 2000, Nature.

[83] J. Tenenbaum, et al. A global geometric framework for nonlinear dimensionality reduction, 2000, Science.

[84] Byron Dom, et al. An Information-Theoretic External Cluster-Validity Measure, 2002, UAI.

[85] Andrew B. Kahng, et al. New spectral methods for ratio cut partitioning and clustering, 1991, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.

[86] N. Cristianini, et al. On Kernel-Target Alignment, 2001, NIPS.

[87] T. Takagi, et al. Assessment of prediction accuracy of protein function from protein–protein interaction data, 2001, Yeast.

[88] G. Schwarz. Estimating the Dimension of a Model, 1978.

[89] Mikhail Bilenko and Sugato Basu. A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields, 2004.

[90] H. White. Maximum Likelihood Estimation of Misspecified Models, 1982.

[91] Jason Weston, et al. Vicinal Risk Minimization, 2000, NIPS.

[92] R. Berk, et al. Limiting Behavior of Posterior Distributions when the Model is Incorrect, 1966.

[93] Andrew McCallum, et al. Employing EM and Pool-Based Active Learning for Text Classification, 1998, ICML.

[94] Bernhard Schölkopf, et al. Support vector channel selection in BCI, 2004, IEEE Transactions on Biomedical Engineering.

[95] Bernhard Schölkopf, et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.

[96] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[97] Christopher J. C. Burges, et al. Geometric Methods for Feature Extraction and Dimensional Reduction, 2005.

[98] Shang-Hua Teng, et al. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems, 2003, STOC '04.

[99] J. J. Rocchio, et al. Relevance feedback in information retrieval, 1971.

[100] M S Waterman, et al. Identification of common molecular subsequences, 1981, Journal of molecular biology.

[101] Nir Friedman, et al. The Bayesian Structural EM Algorithm, 1998, UAI.

[102] Dale Schuurmans, et al. Characterizing the generalization performance of model selection strategies, 1997, ICML.

[103] Yiming Yang, et al. A re-examination of text categorization methods, 1999, SIGIR '99.

[104] Tomer Hertz, et al. Computing Gaussian Mixture Models with EM Using Equivalence Constraints, 2003, NIPS.

[105] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.

[106] David D. Lewis, et al. Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval, 1998, ECML.

[107] Donald Geman, et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[108] D.C. St. Clair, et al. Semi-supervised adaptive resonance theory (SMART2), 1992, Proceedings 1992 IJCNN International Joint Conference on Neural Networks.

[109] E. Myers, et al. Basic local alignment search tool, 1990, Journal of molecular biology.

[110] A. Agresti, et al. Categorical Data Analysis, 1991, International Encyclopedia of Statistical Science.

[111] J. Besag. On the Statistical Analysis of Dirty Pictures, 1986.

[112] Alon Orlitsky, et al. Estimating and computing density based distance metrics, 2005, ICML.

[113] Thorsten Joachims, et al. Transductive Learning via Spectral Graph Partitioning, 2003, ICML.

[114] T Poggio, et al. Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks, 1990, Science.

[115] Terence J. O'Neill. Normal Discrimination with Unclassified Observations, 1978.

[116] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[117] Raymond A. Board, et al. Semi-Supervised Learning, 1989, Machine Learning.

[118] Thomas Hofmann, et al. Statistical Models for Co-occurrence Data, 1998.

[119] Shaoning Pang, et al. Transductive support vector machines and applications in bioinformatics for promoter recognition, 2003, Proceedings of the 2003 International Conference on Neural Networks and Signal Processing.

[120] Nicolas Chapados, et al. Extensions to Metric-Based Model Selection, 2003, J. Mach. Learn. Res.

[121] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.

[122] Karsten A. Verbeurgt. Learning DNF under the uniform distribution in quasi-polynomial time, 1990, COLT '90.

[123] H. Akaike. A new look at the statistical model identification, 1974.

[124] D. Titterington, et al. Estimation Problems with Data from a Mixture, 1978.

[125] Dan Klein, et al. From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering, 2002, ICML.

[126] G. McLachlan, et al. The efficiency of a linear discriminant function based on unclassified initial samples, 1978.

[127] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[128] John Langford, et al. Cover trees for nearest neighbor, 2006, ICML.

[129] Santosh S. Venkatesh, et al. Learning from a mixture of labeled and unlabeled examples with parametric side information, 1995, COLT '95.

[130] R. Redner, et al. Mixture densities, maximum likelihood, and the EM algorithm, 1984.

[131] Tobias Scheffer, et al. Using Transduction and Multi-view Learning to Answer Emails, 2003, PKDD.

[132] Alan L. Yuille, et al. Statistical Physics, Mixtures of Distributions, and the EM Algorithm, 1994, Neural Computation.

[133] D. Donoho, et al. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[134] G. McLachlan. Iterative Reclassification Procedure for Constructing An Asymptotically Optimal Rule of Allocation in Discriminant-Analysis, 1975.

[135] Adrian Corduneanu, et al. Continuation Methods for Mixing Heterogenous Sources, 2002, UAI.

[136] D. Haussler, et al. Hidden Markov models in computational biology. Applications to protein modeling, 1993, Journal of molecular biology.

[137] Yaniv Ziv, et al. Revealing modular organization in the yeast transcriptional network, 2002, Nature Genetics.

[138] Naftali Tishby, et al. Distributional Clustering of English Words, 1993, ACL.

[139] A G Murzin, et al. SCOP: a structural classification of proteins database for the investigation of sequences and structures, 1995, Journal of molecular biology.

[140] Nicolas Le Roux, et al. Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering, 2003, NIPS.

[141] Paul A. Viola, et al. Unsupervised improvement of visual detectors using cotraining, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[142] Noam Nisan, et al. Constant depth circuits, Fourier transform, and learnability, 1989, 30th Annual Symposium on Foundations of Computer Science.

[143] Alexander J. Smola, et al. Kernels and Regularization on Graphs, 2003, COLT.

[144] Douglas L. Brutlag, et al. Remote homology detection: a motif based approach, 2003, ISMB.

[145] Sebastian Thrun, et al. Learning to Classify Text from Labeled and Unlabeled Documents, 1998, AAAI/IAAI.

[146] Joachim M. Buhmann, et al. Learning with constrained and unlabelled data, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[147] G. McLachlan, et al. Updating a discriminant function in basis of unclassified data, 1982.

[148] Ke Wang, et al. Profile-based string kernels for remote homology detection and motif extraction, 2005, Journal of bioinformatics and computational biology.

[149] P. J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions, 1967.

[150] Nicola J. Rinaldi, et al. Transcriptional Regulatory Networks in Saccharomyces cerevisiae, 2002, Science.

[151] Aleksandrs Slivkins, et al. Network failure detection and graph connectivity, 2004, SODA '04.

[152] Bernhard Schölkopf, et al. Cluster Kernels for Semi-Supervised Learning, 2002, NIPS.

[153] S. Sathiya Keerthi, et al. A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs, 2005, J. Mach. Learn. Res.

[154] Franck Davoine, et al. Expressive face recognition and synthesis, 2003, 2003 Conference on Computer Vision and Pattern Recognition Workshop.

[155] Peter G. Doyle, et al. Random Walks and Electric Networks, 1987.

[156] B. Efron. The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis, 1975.

[157] Lawrence K. Saul, et al. Analysis and extension of spectral methods for nonlinear dimensionality reduction, 2005, ICML.

[158] J. Friedman. Special Invited Paper-Additive logistic regression: A statistical view of boosting, 2000.

[159] Pedro M. Domingos, et al. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, 1997, Machine Learning.

[160] Alexander J. Smola, et al. Fast Kernels for String and Tree Matching, 2002, NIPS.

[161] Michael Gribskov, et al. Use of Receiver Operating Characteristic (ROC) Analysis to Evaluate Sequence Matching, 1996, Comput. Chem.

[162] Thorsten Joachims, et al. Transductive Inference for Text Classification using Support Vector Machines, 1999, ICML.

[163] Athanasios Papoulis, et al. Probability, Random Variables and Stochastic Processes, 1965.

[164] Christina S. Leslie, et al. Fast Kernels for Inexact String Matching, 2003, COLT.

[165] Vladimir Vapnik, et al. Estimation of Dependences Based on Empirical Data, 1982, Springer Series in Statistics.

[166] Nello Cristianini, et al. Kernel methods for exploratory data analysis: a demonstration on text data, 2004.

[167] Lawrence Carin, et al. Semi-Supervised Classification, 2004, Encyclopedia of Database Systems.

[168] Shailesh V. Date, et al. A Probabilistic Functional Network of Yeast Genes, 2004, Science.

[169] Nicolas Le Roux, et al. Efficient Non-Parametric Function Induction in Semi-Supervised Learning, 2004, AISTATS.

[170] M. F. Porter, et al. An algorithm for suffix stripping, 1997.

[171] Charles R. Johnson, et al. Matrix analysis, 1985.

[172] Matthias Hein, et al. Intrinsic dimensionality estimation of submanifolds in R^d, 2005, ICML.

[173] G. McLachlan, et al. The EM algorithm and extensions, 1996.

[174] B. Efron. Computers and the Theory of Statistics: Thinking the Unthinkable, 1979.

[175] Adrian Corduneanu, et al. On Information Regularization, 2002, UAI.

[176] Matthias Hein, et al. Measure Based Regularization, 2003, NIPS.

[177] Prasad Tadepalli, et al. Active Learning with Committees for Text Categorization, 1997, AAAI/IAAI.

[178] Ayhan Demiriz, et al. Semi-Supervised Support Vector Machines, 1998, NIPS.

[179] M. Degroot. Optimal Statistical Decisions, 1970.

[180] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[181] Alan M. Frieze, et al. A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions, 1996, Algorithmica.

[182] Kilian Q. Weinberger, et al. Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization, 2005, AISTATS.

[183] Rayid Ghani, et al. Analyzing the effectiveness and applicability of co-training, 2000, CIKM '00.

[184] B. Schwikowski, et al. A network of protein–protein interactions in yeast, 2000, Nature Biotechnology.

[185] Leslie Lamport, et al. How to Write a Proof, 1995.

[186] Tijl De Bie, et al. Eigenproblems in Pattern Recognition, 2005.

[187] Mikhail Belkin, et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, 2003, Neural Computation.

[188] Rong Jin, et al. Learning with Multiple Labels, 2002, NIPS.

[189] Y. Abu-Mostafa. Machines that Learn from Hints, 1995.

[190] Jianhua Lin, et al. Divergence measures based on the Shannon entropy, 1991, IEEE Trans. Inf. Theory.

[191] Yoshua Bengio, et al. Greedy Spectral Embedding, 2005, AISTATS.

[192] G. Celeux, et al. A Classification EM algorithm for clustering and two stochastic versions, 1992.

[193] D. Pe’er, et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data, 2003, Nature Genetics.

[194] B. Snel, et al. Comparative assessment of large-scale data sets of protein–protein interactions, 2002, Nature.

[195] Ron Kohavi, et al. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, 1995, IJCAI.

[196] Avrim Blum, et al. Learning an intersection of k halfspaces over a uniform distribution, 1993, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.

[197] Byoung-Tak Zhang, et al. Large Scale Unstructured Document Classification Using Unlabeled Data and Syntactic Information, 2003, PAKDD.

[198] Lawrence K. Saul, et al. Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifold, 2003, J. Mach. Learn. Res.

[199] David Haussler, et al. A Discriminative Framework for Detecting Remote Protein Homologies, 2000, J. Comput. Biol.

[200] Tommi S. Jaakkola, et al. Partially labeled classification with Markov random walks, 2001, NIPS.

[201] Lei Wang, et al. Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[202] Kilian Q. Weinberger, et al. Unsupervised Learning of Image Manifolds by Semidefinite Programming, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[203] R. Tibshirani, et al. Generalized additive models for medical research, 1986, Statistical methods in medical research.

[204] Richard E. Blahut, et al. Computation of channel capacity and rate-distortion functions, 1972, IEEE Trans. Inf. Theory.

[205] John Langford, et al. PAC-MDL Bounds, 2003, COLT.

[206] Anders Krogh, et al. Neural Network Ensembles, Cross Validation, and Active Learning, 1994, NIPS.

[207] Dan Roth, et al. On the Hardness of Approximate Reasoning, 1993, IJCAI.

[208] Andrew W. Moore, et al. 'N-Body' Problems in Statistical Learning, 2000, NIPS.

[209] Balázs Kégl, et al. Boosting on Manifolds: Adaptive Regularization of Base Classifiers, 2004, NIPS.

[210] David Yarowsky, et al. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods, 1995, ACL.

[211] Bernhard Schölkopf, et al. A kernel view of the dimensionality reduction of manifolds, 2004, ICML.

[212] Michael Ruogu Zhang, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, 1998, Molecular biology of the cell.

[213] Arindam Banerjee, et al. Active Semi-Supervision for Pairwise Constrained Clustering, 2004, SDM.

[214] Alexander Gammerman, et al. Transduction with Confidence and Credibility, 1999, IJCAI.

[215] D. Hosmer. A Comparison of Iterative Maximum Likelihood Estimates of the Parameters of a Mixture of Two Normal Distributions Under Three Different Types of Sample, 1973.

[216] Umesh V. Vazirani, et al. An Introduction to Computational Learning Theory, 1994.

[217] Cullen Schaffer, et al. A Conservation Law for Generalization Performance, 1994, ICML.

[218] Dan Roth, et al. Understanding Probabilistic Classifiers, 2001, ECML.

[219] D. W. Scott, et al. Nonparametric Estimation of Probability Densities and Regression Curves, 1988.

[220] Ben Taskar, et al. Discriminative Probabilistic Models for Relational Data, 2002, UAI.

[221] Van Rijsbergen, et al. A theoretical basis for the use of co-occurrence data in information retrieval, 1977.

[222] Shahar Mendelson, et al. Random Subclass Bounds, 2003, COLT.

[223] William Stafford Noble, et al. Learning kernels from biological networks by maximizing entropy, 2004, ISMB/ECCB.

[224] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[225] Nello Cristianini, et al. Kernel Methods for Pattern Analysis, 2003, ICTAI.

[226] B. Alberts, et al. An Introduction to the Molecular Biology of the Cell, 1998.

[227] J. Heinonen, et al. Nonlinear Potential Theory of Degenerate Elliptic Equations, 1993.

[228] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.

[229] Vladimir Vapnik, et al. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[230] Boaz Leskes, et al. The Value of Agreement, a New Boosting Algorithm, 2005, COLT.

[231] Sanjoy Dasgupta, et al. PAC Generalization Bounds for Co-training, 2001, NIPS.

[232] Geoffrey C. Fox, et al. A deterministic annealing approach to clustering, 1990, Pattern Recognit. Lett.

[233] Bernhard Schölkopf, et al. Feature selection and transduction for prediction of molecular bioactivity for drug design, 2003, Bioinform.

[234] Daphne Koller, et al. Support Vector Machine Active Learning with Applications to Text Classification, 2000, J. Mach. Learn. Res.

[235] G J Barton, et al. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction, 1999, Proteins.

[236] Risi Kondor, et al. Diffusion kernels on graphs and other discrete structures, 2002, ICML 2002.

[237] Karen Sparck Jones. A statistical interpretation of term specificity and its application in retrieval, 1972.

[238] Rayid Ghani, et al. Combining Labeled and Unlabeled Data for MultiClass Text Categorization, 2002, ICML.

[239] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[240] Santosh S. Vempala, et al. Optimal outlier removal in high-dimensional spaces, 2004, J. Comput. Syst. Sci.

[241] Naonori Ueda, et al. Deterministic Annealing Variant of the EM Algorithm, 1994, NIPS.

[242] Yoshua Bengio, et al. Semi-supervised Learning by Entropy Minimization, 2004, CAP.

[243] Yoshihiro Yamanishi, et al. Supervised Graph Inference, 2004, NIPS.

[244] G. McLachlan, et al. Small sample results for a linear discriminant function estimated from a mixture of normal populations, 1979.

[245] S T Roweis, et al. Nonlinear dimensionality reduction by locally linear embedding, 2000, Science.

[246] Pascal Vincent, et al. Non-Local Manifold Parzen Windows, 2005, NIPS.

[247] Miguel F. Anjos, et al. New Convex Relaxations for the Maximum Cut and VLSI Layout Problems, 2001.

[248] Alessandro Vespignani, et al. Global protein function prediction from protein-protein interaction networks, 2003, Nature Biotechnology.

[249] Dan Klein, et al. Spectral Learning, 2003, IJCAI.

[250] J. Anderson. Multivariate logistic compounds, 1979.

[251] Nicolas Le Roux, et al. The Curse of Highly Variable Functions for Local Kernel Machines, 2005, NIPS.

[252] Vikas Sindhwani, et al. On Manifold Regularization, 2005, AISTATS.

[253] Leslie G. Valiant, et al. A general lower bound on the number of examples needed for learning, 1988, COLT '88.

[254] Rayid Ghani, et al. Combining labeled and unlabeled data for text classification with a large number of categories, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[255] Joachim M. Buhmann, et al. Clustering with the Connectivity Kernel, 2003, NIPS.

[256] Eric B. Baum, et al. Polynomial time algorithms for learning neural nets, 1990, Annual Conference Computational Learning Theory.

[257] J. Berger. Statistical Decision Theory and Bayesian Analysis, 1988.

[258] Gene H. Golub, et al. Matrix computations, 1983.

[259] H. O. Hartley, et al. Classification and Estimation in Analysis of Variance Problems, 1968.

[260] Ashok K. Agrawala, et al. Learning with a probabilistic teacher, 1970, IEEE Trans. Inf. Theory.

[261] A. N. Tikhonov, et al. Solutions of ill-posed problems, 1977.

[262] J. Hanley, et al. The meaning and use of the area under a receiver operating characteristic (ROC) curve, 1982, Radiology.

[263] Weiru Liu, et al. Learning belief networks from data: an information theory based approach, 1997, CIKM '97.

[264] Tom Minka, et al. A family of algorithms for approximate Bayesian inference, 2001.

[265] Michael I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[266] Philip M. Long, et al. Performance guarantees for hierarchical clustering, 2002, J. Comput. Syst. Sci.

[267] G. McLachlan. Estimating the Linear Discriminant Function from Initial Samples Containing a Small Number of Unclassified Observations, 1977.

[268] Yishay Mansour, et al. An Information-Theoretic Analysis of Hard and Soft Assignment Methods for Clustering, 1997, UAI.

[269] L. Csató. Gaussian processes: iterative sparse approximations, 2002.

[270] Mikhail Belkin, et al. Regularization and Semi-supervised Learning on Large Graphs, 2004, COLT.

[271] David J. Miller, et al. A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data, 1996, NIPS.

[272] Nir Friedman, et al. Bayesian Network Classifiers, 1997, Machine Learning.

[273] Jason Weston, et al. Multi-class protein fold recognition using adaptive codes, 2005, ICML.

[274] Cullen Schaffer. Overfitting avoidance as bias, 2004, Machine Learning.

[275] Raymond J. Mooney, et al. Adaptive duplicate detection using learnable string similarity measures, 2003, KDD '03.

[276] Tommi S. Jaakkola, et al. Information Regularization with Partially Labeled Data, 2002, NIPS.

[277] Jon Louis Bentley, et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time, 1977, TOMS.

[278] Olivier Chapelle, et al. Model Selection for Support Vector Machines, 1999, NIPS.

[279] Andrew McCallum, et al. Semi-Supervised Clustering with User Feedback, 2003.

[280] Peter Sollich. Probabilistic interpretations and Bayesian methods for support vector machines, 1999.

[281] Nicolas Le Roux, et al. Learning Eigenfunctions Links Spectral Embedding and Kernel PCA, 2004, Neural Computation.

[282] Éva Tardos, et al. Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields, 1999, 40th Annual Symposium on Foundations of Computer Science.

[283] Nicu Sebe, et al. Learning Bayesian network classifiers for facial expression recognition using both labeled and unlabeled data, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[284] Michael I. Jordan, et al. Distance Metric Learning with Application to Clustering with Side-Information, 2002, NIPS.

[285] S. Boucheron, et al. A sharp concentration inequality with applications, 1999, Random Struct. Algorithms.

[286] Jos F. Sturm, et al. A Matlab toolbox for optimization over symmetric cones, 1999.

[287] Alexander Zien, et al. Semi-Supervised Classification by Low Density Separation, 2005, AISTATS.

[288] S. Boucheron, et al. Theory of classification: a survey of some recent advances, 2005.

[289] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.

[290] Jitendra Malik,et al. Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[291] Jean-Philippe Vert,et al. Graph-Driven Feature Extraction From Microarray Data Using Diffusion Kernels and Kernel CCA , 2002, NIPS.

[292] Russell Greiner,et al. Model Selection Criteria for Learning Belief Nets: An Empirical Comparison , 2000, ICML.

[293] Michael I. Jordan,et al. Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[294] Fabio Gagliardi Cozman,et al. Semi-Supervised Learning of Mixture Models , 2003, ICML.

[295] T. Cover,et al. The relative value of labeled and unlabeled samples in pattern recognition , 1993, Proceedings. IEEE International Symposium on Information Theory.

[296] Yoshua Bengio,et al. Non-Local Manifold Tangent Learning , 2004, NIPS.

[297] Thomas Hofmann,et al. Semi-supervised Learning on Directed Graphs , 2004, NIPS.

[298] Claire Cardie,et al. Limitations of Co-Training for Natural Language Learning from Large Datasets , 2001, EMNLP.

[299] Massih-Reza Amini,et al. Semi-Supervised Logistic Regression , 2002, ECAI.

[300] Joshua B. Tenenbaum,et al. Global Versus Local Methods in Nonlinear Dimensionality Reduction , 2002, NIPS.

[301] Bernhard Schölkopf,et al. Learning from labeled and unlabeled data on a directed graph , 2005, ICML.

[302] James A. Sethian,et al. Level Set Methods and Fast Marching Methods , 1999 .

[303] Bernhard Schölkopf,et al. Dynamic Alignment Kernels , 2000 .

[304] David Haussler,et al. Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[305] Johan A. K. Suykens,et al. Learning from General Label Constraints , 2004, SSPR/SPR.

[306] R. Mooney,et al. Impact of Similarity Measures on Web-page Clustering , 2000 .

[307] Sebastian Thrun,et al. Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[308] Stephen P. Boyd,et al. Semidefinite Programming , 1996, SIAM Rev..

[309] Takeo Kanade,et al. Comprehensive database for facial expression analysis , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[310] David Haussler,et al. Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[311] Ulrike von Luxburg,et al. Distance-Based Classification with Lipschitz Functions , 2004, J. Mach. Learn. Res..

[312] T. Landauer,et al. Indexing by Latent Semantic Analysis , 1990 .

[313] Klaus Obermayer,et al. Bayesian Transduction , 1999, NIPS.

[314] Jason Weston,et al. Transductive Inference for Estimating Values of Functions , 1999, NIPS.

[315] Yair Weiss,et al. Segmentation using eigenvectors: a unifying view , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[316] K. Bennett,et al. Optimization Approaches to Semi-Supervised Learning , 2001 .

[317] C. J. Stone,et al. Optimal Rates of Convergence for Nonparametric Estimators , 1980 .

[318] David W. Opitz,et al. Generating Accurate and Diverse Members of a Neural-Network Ensemble , 1995, NIPS.

[319] Inderjit S. Dhillon,et al. Information theoretic clustering of sparse cooccurrence data , 2003, Third IEEE International Conference on Data Mining.

[320] J. M. Hammersley,et al. Markov fields on finite graphs and lattices , 1971 .

[321] Mikhail Belkin,et al. Beyond the point cloud: from transductive to semi-supervised learning , 2005, ICML.

[322] Mikhail Belkin,et al. Semi-Supervised Learning on Riemannian Manifolds , 2004, Machine Learning.

[323] Tom M. Mitchell,et al. Improving Text Classification by Shrinkage in a Hierarchy of Classes , 1998, ICML.

[324] David B. Shmoys,et al. A Best Possible Heuristic for the k-Center Problem , 1985, Math. Oper. Res..

[325] Inderjit S. Dhillon,et al. Clustering with Bregman Divergences , 2005, J. Mach. Learn. Res..

[326] Matthew Brand,et al. Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction , 1999, Neural Computation.

[327] Nello Cristianini,et al. Classification using String Kernels , 2000 .