Semi-supervised Learning

[1]  Tomer Hertz,et al.  Learning Distance Functions using Equivalence Relations , 2003, ICML.

[2]  Haidong Wang,et al.  Discovering molecular pathways from protein interaction and gene expression data , 2003, ISMB.

[3]  Alexander Gammerman,et al.  Learning by Transduction , 1998, UAI.

[4]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[5]  Adam R. Klivans,et al.  Learning intersections and thresholds of halfspaces , 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings..

[6]  Ann B. Lee,et al.  Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[7]  Raymond J. Mooney,et al.  A probabilistic framework for semi-supervised clustering , 2004, KDD.

[8]  Mehryar Mohri,et al.  Rational Kernels , 2002, NIPS.

[9]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[10]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[11]  G. Wahba Smoothing noisy data with spline functions , 1975 .

[12]  Nikhil Bansal,et al.  Correlation Clustering , 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings..

[13]  E. Kushilevitz,et al.  Learning by distances , 1990, COLT '90.

[14]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[15]  E. Nadaraya On Estimating Regression , 1964 .

[16]  Nello Cristianini,et al.  Kernel-Based Data Fusion and Its Application to Protein Function Prediction in Yeast , 2003, Pacific Symposium on Biocomputing.

[17]  Dale Schuurmans,et al.  Metric-Based Methods for Adaptive Model Selection and Regularization , 2002, Machine Learning.

[18]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[19]  Thorsten Joachims,et al.  A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization , 1997, ICML.

[20]  Maria-Florina Balcan,et al.  An Augmented PAC Model for Semi-Supervised Learning , 2006, Semi-Supervised Learning.

[21]  Andrew W. Moore,et al.  Fast Robust Logistic Regression for Large Sparse Datasets with Binary Outputs , 2003, AISTATS.

[22]  Claire Cardie,et al.  Constrained K-means Clustering with Background Knowledge , 2001, Proceedings of the Eighteenth International Conference on Machine Learning, p. 577–584.

[23]  S. Ganesalingam Classification and Mixture Approaches to Clustering Via Maximum Likelihood , 1989 .

[24]  Ran El-Yaniv,et al.  Error Bounds for Transductive Learning via Compression and Clustering , 2003, NIPS.

[25]  Maria-Florina Balcan,et al.  Co-Training and Expansion: Towards Bridging Theory and Practice , 2004, NIPS.

[26]  Li Liao,et al.  Combining pairwise sequence similarity and support vector machines for remote protein homology detection , 2002, RECOMB '02.

[27]  Avrim Blum,et al.  Learning from Labeled and Unlabeled Data using Graph Mincuts , 2001, ICML.

[28]  Adrian Corduneanu,et al.  Distributed Information Regularization on Graphs , 2004, NIPS.

[29]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, STOC '84.

[30]  Tom M. Mitchell,et al.  Learning to construct knowledge bases from the World Wide Web , 2000, Artif. Intell..

[31]  Inderjit S. Dhillon,et al.  Clustering on the Unit Hypersphere using von Mises-Fisher Distributions , 2005, J. Mach. Learn. Res..

[32]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[33]  Matthias W. Seeger,et al.  Covariance Kernels from Bayesian Generative Models , 2001, NIPS.

[34]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[35]  G. S. Watson,et al.  Smooth regression analysis , 1964 .

[36]  L. Goldstein,et al.  Optimal Plug-in Estimators for Nonparametric Functional Estimation , 1992 .

[37]  Jerome H. Friedman,et al.  On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality , 2004, Data Mining and Knowledge Discovery.

[38]  Inderjit S. Dhillon,et al.  Concept Decompositions for Large Sparse Text Data Using Clustering , 2004, Machine Learning.

[39]  Jon M. Kleinberg,et al.  Detecting a network failure , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[40]  D. M. Titterington,et al.  Updating a Diagnostic System using Unconfirmed Cases , 1976 .

[41]  M. Kearns Efficient noise-tolerant learning from statistical queries , 1998, JACM.

[42]  Richard Bellman,et al.  Adaptive Control Processes: A Guided Tour , 1961, The Mathematical Gazette.

[43]  H. J. Scudder,et al.  Probability of error of some adaptive pattern-recognition machines , 1965, IEEE Trans. Inf. Theory.

[44]  John C. Platt Fast Embedding of Sparse Similarity Graphs , 2003, NIPS.

[45]  Raymond J. Mooney,et al.  Integrating constraints and metric learning in semi-supervised clustering , 2004, ICML.

[46]  Yoram Singer,et al.  Unsupervised Models for Named Entity Classification , 1999, EMNLP.

[47]  Brian D. Ripley,et al.  Pattern Recognition and Neural Networks , 1996 .

[48]  Léon Bottou,et al.  Local Learning Algorithms , 1992, Neural Computation.

[49]  Yoshua Bengio,et al.  Model Selection for Small Sample Regression , 2002, Machine Learning.

[50]  M. Seeger Input-dependent Regularization of Conditional Density Models , 2000 .

[51]  Ayhan Demiriz,et al.  Exploiting unlabeled data in ensemble methods , 2002, KDD.

[52]  John Shawe-Taylor,et al.  Structural Risk Minimization Over Data-Dependent Hierarchies , 1998, IEEE Trans. Inf. Theory.

[53]  Andreas Stolcke,et al.  Best-first Model Merging for Hidden Markov Model Induction , 1994, ArXiv.

[54]  G. McLachlan Discriminant Analysis and Statistical Pattern Recognition , 1992 .

[55]  H. Zha,et al.  Principal manifolds and nonlinear dimensionality reduction via tangent space alignment , 2004, SIAM J. Sci. Comput..

[56]  Vittorio Castelli,et al.  The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter , 1996, IEEE Trans. Inf. Theory.

[57]  Nello Cristianini,et al.  Convex Methods for Transduction , 2003, NIPS.

[58]  Fabio Gagliardi Cozman,et al.  Unlabeled Data Can Degrade Classification Performance of Generative Classifiers , 2002, FLAIRS.

[59]  Thomas G. Dietterich Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.

[60]  R. Shibata An optimal selection of regression variables , 1981 .

[61]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[62]  Christopher K. I. Williams Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond , 1999, Learning in Graphical Models.

[63]  Nello Cristianini,et al.  A statistical framework for genomic data fusion , 2004, Bioinform..

[64]  M. Yamasaki Ideal boundary limit of discrete Dirichlet functions , 1986 .

[65]  Alexander Gammerman,et al.  Machine-Learning Applications of Algorithmic Randomness , 1999, ICML.

[66]  Harris Drucker,et al.  Support vector machines for spam categorization , 1999, IEEE Trans. Neural Networks.

[67]  Jason Weston,et al.  Mismatch String Kernels for SVM Protein Classification , 2002, NIPS.

[68]  Olivier Bousquet,et al.  On the Complexity of Learning the Kernel Matrix , 2002, NIPS.

[69]  Geoffrey E. Hinton,et al.  Self-organizing neural network that discovers surfaces in random-dot stereograms , 1992, Nature.

[70]  Nello Cristianini,et al.  An introduction to Support Vector Machines , 2000 .

[71]  Geoffrey C. Fox,et al.  Vector quantization by deterministic annealing , 1992, IEEE Trans. Inf. Theory.

[72]  Tatsuya Akutsu,et al.  Protein homology detection using string alignment kernels , 2004, Bioinform..

[73]  David Maxwell Chickering,et al.  Dependency Networks for Inference, Collaborative Filtering, and Data Visualization , 2000, J. Mach. Learn. Res..

[74]  Santosh S. Vempala,et al.  A random sampling based algorithm for learning the intersection of half-spaces , 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[75]  Nicu Sebe,et al.  Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[76]  Vladimir Koltchinskii,et al.  Rademacher penalties and structural risk minimization , 2001, IEEE Trans. Inf. Theory.

[77]  S. Rosenberg The Laplacian on a Riemannian Manifold , 1997 .

[78]  Ulrike von Luxburg,et al.  Limits of Spectral Clustering , 2004, NIPS.

[79]  David A. Landgrebe,et al.  The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon , 1994, IEEE Trans. Geosci. Remote. Sens..

[80]  Ting Chen,et al.  An integrated probabilistic model for functional prediction of proteins , 2003, RECOMB '03.

[81]  I. Jolliffe Principal Component Analysis , 2002 .

[82]  James R. Knight,et al.  A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae , 2000, Nature.

[83]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[84]  Byron Dom,et al.  An Information-Theoretic External Cluster-Validity Measure , 2002, UAI.

[85]  Andrew B. Kahng,et al.  New spectral methods for ratio cut partitioning and clustering , 1991, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[86]  N. Cristianini,et al.  On Kernel-Target Alignment , 2001, NIPS.

[87]  T. Takagi,et al.  Assessment of prediction accuracy of protein function from protein–protein interaction data , 2001, Yeast.

[88]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[89]  Mikhail Bilenko, Sugato Basu.  A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields , 2004 .

[90]  H. White Maximum Likelihood Estimation of Misspecified Models , 1982 .

[91]  Jason Weston,et al.  Vicinal Risk Minimization , 2000, NIPS.

[92]  R. Berk,et al.  Limiting Behavior of Posterior Distributions when the Model is Incorrect , 1966 .

[93]  Andrew McCallum,et al.  Employing EM and Pool-Based Active Learning for Text Classification , 1998, ICML.

[94]  Bernhard Schölkopf,et al.  Support vector channel selection in BCI , 2004, IEEE Transactions on Biomedical Engineering.

[95]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[96]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[97]  Christopher J. C. Burges,et al.  Geometric Methods for Feature Extraction and Dimensional Reduction , 2005 .

[98]  Shang-Hua Teng,et al.  Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems , 2003, STOC '04.

[99]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[100]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[101]  Nir Friedman,et al.  The Bayesian Structural EM Algorithm , 1998, UAI.

[102]  Dale Schuurmans,et al.  Characterizing the generalization performance of model selection strategies , 1997, ICML.

[103]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.

[104]  Tomer Hertz,et al.  Computing Gaussian Mixture Models with EM Using Equivalence Constraints , 2003, NIPS.

[105]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[106]  David D. Lewis,et al.  Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval , 1998, ECML.

[107]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[108]  D.C. St. Clair,et al.  SeMi-supervised adaptive resonance theory (SMART2) , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[109]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[110]  A. Agresti,et al.  Categorical Data Analysis , 1991, International Encyclopedia of Statistical Science.

[111]  J. Besag On the Statistical Analysis of Dirty Pictures , 1986 .

[112]  Alon Orlitsky,et al.  Estimating and computing density based distance metrics , 2005, ICML.

[113]  Thorsten Joachims,et al.  Transductive Learning via Spectral Graph Partitioning , 2003, ICML.

[114]  T Poggio,et al.  Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks , 1990, Science.

[115]  Terence J. O'Neill Normal Discrimination with Unclassified Observations , 1978 .

[116]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[117]  Raymond A. Board,et al.  Semi-Supervised Learning , 1989, Machine Learning.

[118]  Thomas Hofmann,et al.  Statistical Models for Co-occurrence Data , 1998 .

[119]  Shaoning Pang,et al.  Transductive support vector machines and applications in bioinformatics for promoter recognition , 2003, International Conference on Neural Networks and Signal Processing, 2003. Proceedings of the 2003.

[120]  Nicolas Chapados,et al.  Extensions to Metric-Based Model Selection , 2003, J. Mach. Learn. Res..

[121]  Stephen P. Boyd,et al.  Convex Optimization , 2004 .

[122]  Karsten A. Verbeurgt Learning DNF under the uniform distribution in quasi-polynomial time , 1990, COLT '90.

[123]  H. Akaike A new look at the statistical model identification , 1974 .

[124]  D. Titterington,et al.  Estimation Problems with Data from a Mixture , 1978 .

[125]  Dan Klein,et al.  From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering , 2002, ICML.

[126]  G. McLachlan,et al.  The efficiency of a linear discriminant function based on unclassified initial samples , 1978 .

[127]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[128]  John Langford,et al.  Cover trees for nearest neighbor , 2006, ICML.

[129]  Santosh S. Venkatesh,et al.  Learning from a mixture of labeled and unlabeled examples with parametric side information , 1995, COLT '95.

[130]  R. Redner,et al.  Mixture densities, maximum likelihood, and the EM algorithm , 1984 .

[131]  Tobias Scheffer,et al.  Using Transduction and Multi-view Learning to Answer Emails , 2003, PKDD.

[132]  Alan L. Yuille,et al.  Statistical Physics, Mixtures of Distributions, and the EM Algorithm , 1994, Neural Computation.

[133]  D. Donoho,et al.  Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[134]  G. McLachlan Iterative Reclassification Procedure for Constructing An Asymptotically Optimal Rule of Allocation in Discriminant-Analysis , 1975 .

[135]  Adrian Corduneanu,et al.  Continuation Methods for Mixing Heterogenous Sources , 2002, UAI.

[136]  D. Haussler,et al.  Hidden Markov models in computational biology. Applications to protein modeling. , 1993, Journal of molecular biology.

[137]  Yaniv Ziv,et al.  Revealing modular organization in the yeast transcriptional network , 2002, Nature Genetics.

[138]  Naftali Tishby,et al.  Distributional Clustering of English Words , 1993, ACL.

[139]  A G Murzin,et al.  SCOP: a structural classification of proteins database for the investigation of sequences and structures. , 1995, Journal of molecular biology.

[140]  Nicolas Le Roux,et al.  Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering , 2003, NIPS.

[141]  Paul A. Viola,et al.  Unsupervised improvement of visual detectors using cotraining , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[142]  Noam Nisan,et al.  Constant depth circuits, Fourier transform, and learnability , 1989, 30th Annual Symposium on Foundations of Computer Science.

[143]  Alexander J. Smola,et al.  Kernels and Regularization on Graphs , 2003, COLT.

[144]  Douglas L. Brutlag,et al.  Remote homology detection: a motif based approach , 2003, ISMB.

[145]  Sebastian Thrun,et al.  Learning to Classify Text from Labeled and Unlabeled Documents , 1998, AAAI/IAAI.

[146]  Joachim M. Buhmann,et al.  Learning with constrained and unlabelled data , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[147]  G. McLachlan,et al.  Updating a discriminant function on the basis of unclassified data , 1982 .

[148]  Ke Wang,et al.  Profile-based string kernels for remote homology detection and motif extraction. , 2005, Journal of bioinformatics and computational biology.

[149]  P. J. Huber The behavior of maximum likelihood estimates under nonstandard conditions , 1967 .

[150]  Nicola J. Rinaldi,et al.  Transcriptional Regulatory Networks in Saccharomyces cerevisiae , 2002, Science.

[151]  Aleksandrs Slivkins,et al.  Network failure detection and graph connectivity , 2004, SODA '04.

[152]  Bernhard Schölkopf,et al.  Cluster Kernels for Semi-Supervised Learning , 2002, NIPS.

[153]  S. Sathiya Keerthi,et al.  A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs , 2005, J. Mach. Learn. Res..

[154]  Franck Davoine,et al.  Expressive face recognition and synthesis , 2003, 2003 Conference on Computer Vision and Pattern Recognition Workshop.

[155]  Peter G. Doyle,et al.  Random Walks and Electric Networks , 1987 .

[156]  B. Efron The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis , 1975 .

[157]  Lawrence K. Saul,et al.  Analysis and extension of spectral methods for nonlinear dimensionality reduction , 2005, ICML.

[158]  J. Friedman Additive logistic regression: A statistical view of boosting , 2000 .

[159]  Pedro M. Domingos,et al.  On the Optimality of the Simple Bayesian Classifier under Zero-One Loss , 1997, Machine Learning.

[160]  Alexander J. Smola,et al.  Fast Kernels for String and Tree Matching , 2002, NIPS.

[161]  Michael Gribskov,et al.  Use of Receiver Operating Characteristic (ROC) Analysis to Evaluate Sequence Matching , 1996, Comput. Chem..

[162]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[163]  Athanasios Papoulis,et al.  Probability, Random Variables and Stochastic Processes , 1965 .

[164]  Christina S. Leslie,et al.  Fast Kernels for Inexact String Matching , 2003, COLT.

[165]  Vladimir Vapnik,et al.  Estimation of Dependences Based on Empirical Data , 1982, Springer Series in Statistics.

[166]  Nello Cristianini,et al.  Kernel methods for exploratory data analysis: a demonstration on text data , 2004 .

[167]  Lawrence Carin,et al.  Semi-Supervised Classification , 2004, Encyclopedia of Database Systems.

[168]  Shailesh V. Date,et al.  A Probabilistic Functional Network of Yeast Genes , 2004, Science.

[169]  Nicolas Le Roux,et al.  Efficient Non-Parametric Function Induction in Semi-Supervised Learning , 2004, AISTATS.

[170]  M. F. Porter,et al.  An algorithm for suffix stripping , 1997 .

[171]  Charles R. Johnson,et al.  Matrix analysis , 1985 .

[172]  Matthias Hein,et al.  Intrinsic dimensionality estimation of submanifolds in Rd , 2005, ICML.

[173]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[174]  B. Efron Computers and the Theory of Statistics: Thinking the Unthinkable , 1979 .

[175]  Adrian Corduneanu,et al.  On Information Regularization , 2002, UAI.

[176]  Matthias Hein,et al.  Measure Based Regularization , 2003, NIPS.

[177]  Prasad Tadepalli,et al.  Active Learning with Committees for Text Categorization , 1997, AAAI/IAAI.

[178]  Ayhan Demiriz,et al.  Semi-Supervised Support Vector Machines , 1998, NIPS.

[179]  M. Degroot Optimal Statistical Decisions , 1970 .

[180]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[181]  Alan M. Frieze,et al.  A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions , 1996, Algorithmica.

[182]  Kilian Q. Weinberger,et al.  Nonlinear Dimensionality Reduction by Semidefinite Programming and Kernel Matrix Factorization , 2005, AISTATS.

[183]  Rayid Ghani,et al.  Analyzing the effectiveness and applicability of co-training , 2000, CIKM '00.

[184]  B. Schwikowski,et al.  A network of protein–protein interactions in yeast , 2000, Nature Biotechnology.

[185]  Leslie Lamport,et al.  How to Write a Proof , 1995 .

[186]  Tijl De Bie,et al.  Eigenproblems in Pattern Recognition , 2005 .

[187]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[188]  Rong Jin,et al.  Learning with Multiple Labels , 2002, NIPS.

[189]  Y. Abu-Mostafa Machines that Learn from Hints , 1995 .

[190]  Jianhua Lin,et al.  Divergence measures based on the Shannon entropy , 1991, IEEE Trans. Inf. Theory.

[191]  Yoshua Bengio,et al.  Greedy Spectral Embedding , 2005, AISTATS.

[192]  G. Celeux,et al.  A Classification EM algorithm for clustering and two stochastic versions , 1992 .

[193]  D. Pe’er,et al.  Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data , 2003, Nature Genetics.

[194]  B. Snel,et al.  Comparative assessment of large-scale data sets of protein–protein interactions , 2002, Nature.

[195]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[196]  Avrim Blum,et al.  Learning an intersection of k halfspaces over a uniform distribution , 1993, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.

[197]  Byoung-Tak Zhang,et al.  Large Scale Unstructured Document Classification Using Unlabeled Data and Syntactic Information , 2003, PAKDD.

[198]  Lawrence K. Saul,et al.  Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds , 2003, J. Mach. Learn. Res..

[199]  David Haussler,et al.  A Discriminative Framework for Detecting Remote Protein Homologies , 2000, J. Comput. Biol..

[200]  Tommi S. Jaakkola,et al.  Partially labeled classification with Markov random walks , 2001, NIPS.

[201]  Lei Wang,et al.  Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[202]  Kilian Q. Weinberger,et al.  Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[203]  R. Tibshirani,et al.  Generalized additive models for medical research , 1986, Statistical methods in medical research.

[204]  Richard E. Blahut,et al.  Computation of channel capacity and rate-distortion functions , 1972, IEEE Trans. Inf. Theory.

[205]  John Langford,et al.  PAC-MDL Bounds , 2003, COLT.

[206]  Anders Krogh,et al.  Neural Network Ensembles, Cross Validation, and Active Learning , 1994, NIPS.

[207]  Dan Roth,et al.  On the Hardness of Approximate Reasoning , 1993, IJCAI.

[208]  Andrew W. Moore,et al.  'N-Body' Problems in Statistical Learning , 2000, NIPS.

[209]  Balázs Kégl,et al.  Boosting on Manifolds: Adaptive Regularization of Base Classifiers , 2004, NIPS.

[210]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[211]  Bernhard Schölkopf,et al.  A kernel view of the dimensionality reduction of manifolds , 2004, ICML.

[212]  Michael Ruogu Zhang,et al.  Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. , 1998, Molecular biology of the cell.

[213]  Arindam Banerjee,et al.  Active Semi-Supervision for Pairwise Constrained Clustering , 2004, SDM.

[214]  Alexander Gammerman,et al.  Transduction with Confidence and Credibility , 1999, IJCAI.

[215]  D. Hosmer A Comparison of Iterative Maximum Likelihood Estimates of the Parameters of a Mixture of Two Normal Distributions Under Three Different Types of Sample , 1973 .

[216]  Umesh V. Vazirani,et al.  An Introduction to Computational Learning Theory , 1994 .

[217]  Cullen Schaffer,et al.  A Conservation Law for Generalization Performance , 1994, ICML.

[218]  Dan Roth,et al.  Understanding Probabilistic Classifiers , 2001, ECML.

[219]  D. W. Scott,et al.  Nonparametric Estimation of Probability Densities and Regression Curves , 1988 .

[220]  Ben Taskar,et al.  Discriminative Probabilistic Models for Relational Data , 2002, UAI.

[221]  Van Rijsbergen,et al.  A theoretical basis for the use of co-occurrence data in information retrieval , 1977 .

[222]  Shahar Mendelson,et al.  Random Subclass Bounds , 2003, COLT.

[223]  William Stafford Noble,et al.  Learning kernels from biological networks by maximizing entropy , 2004, ISMB/ECCB.

[224]  Geoffrey E. Hinton,et al.  A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[225]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[226]  B. Alberts,et al.  An Introduction to the Molecular Biology of the Cell , 1998 .

[227]  J. Heinonen,et al.  Nonlinear Potential Theory of Degenerate Elliptic Equations , 1993 .

[228]  Elie Bienenstock,et al.  Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.

[229]  Vladimir Vapnik,et al.  On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[230]  Boaz Leskes,et al.  The Value of Agreement, a New Boosting Algorithm , 2005, COLT.

[231]  Sanjoy Dasgupta,et al.  PAC Generalization Bounds for Co-training , 2001, NIPS.

[232]  Geoffrey C. Fox,et al.  A deterministic annealing approach to clustering , 1990, Pattern Recognit. Lett..

[233]  Bernhard Schölkopf,et al.  Feature selection and transduction for prediction of molecular bioactivity for drug design , 2003, Bioinform..

[234]  Daphne Koller,et al.  Support Vector Machine Active Learning with Applications to Text Classification , 2000, J. Mach. Learn. Res..

[235]  G J Barton,et al.  Evaluation and improvement of multiple sequence methods for protein secondary structure prediction , 1999, Proteins.

[236]  Risi Kondor,et al.  Diffusion kernels on graphs and other discrete structures , 2002, ICML 2002.

[237]  Karen Sparck Jones A statistical interpretation of term specificity and its application in retrieval , 1972 .

[238]  Rayid Ghani,et al.  Combining Labeled and Unlabeled Data for MultiClass Text Categorization , 2002, ICML.

[239]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[240]  Santosh S. Vempala,et al.  Optimal outlier removal in high-dimensional spaces , 2004, J. Comput. Syst. Sci..

[241]  Naonori Ueda,et al.  Deterministic Annealing Variant of the EM Algorithm , 1994, NIPS.

[242]  Yoshua Bengio,et al.  Semi-supervised Learning by Entropy Minimization , 2004, CAP.

[243]  Yoshihiro Yamanishi,et al.  Supervised Graph Inference , 2004, NIPS.

[244]  G. McLachlan,et al.  Small sample results for a linear discriminant function estimated from a mixture of normal populations , 1979 .

[245]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[246]  Pascal Vincent,et al.  Non-Local Manifold Parzen Windows , 2005, NIPS.

[247]  Miguel F. Anjos,et al.  New Convex Relaxations for the Maximum Cut and VLSI Layout Problems , 2001 .

[248]  Alessandro Vespignani,et al.  Global protein function prediction from protein-protein interaction networks , 2003, Nature Biotechnology.

[249]  Dan Klein,et al.  Spectral Learning , 2003, IJCAI.

[250]  J. Anderson Multivariate logistic compounds , 1979 .

[251]  Nicolas Le Roux,et al.  The Curse of Highly Variable Functions for Local Kernel Machines , 2005, NIPS.

[252]  Vikas Sindhwani,et al.  On Manifold Regularization , 2005, AISTATS.

[253]  Leslie G. Valiant,et al.  A general lower bound on the number of examples needed for learning , 1988, COLT '88.

[254]  Rayid Ghani,et al.  Combining labeled and unlabeled data for text classification with a large number of categories , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[255]  Joachim M. Buhmann,et al.  Clustering with the Connectivity Kernel , 2003, NIPS.

[256]  Eric B. Baum,et al.  Polynomial time algorithms for learning neural nets , 1990, Annual Conference Computational Learning Theory.

[257]  J. Berger Statistical Decision Theory and Bayesian Analysis , 1988 .

[258]  Gene H. Golub,et al.  Matrix computations , 1983 .

[259]  H. O. Hartley,et al.  Classification and Estimation in Analysis of Variance Problems , 1968 .

[260]  Ashok K. Agrawala,et al.  Learning with a probabilistic teacher , 1970, IEEE Trans. Inf. Theory.

[261]  A. N. Tikhonov,et al.  Solutions of ill-posed problems , 1977 .

[262]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[263]  Weiru Liu,et al.  Learning belief networks from data: an information theory based approach , 1997, CIKM '97.

[264]  Tom Minka,et al.  A family of algorithms for approximate Bayesian inference , 2001 .

[265]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[266]  Philip M. Long,et al.  Performance guarantees for hierarchical clustering , 2002, J. Comput. Syst. Sci..

[267]  G. McLachlan Estimating the Linear Discriminant Function from Initial Samples Containing a Small Number of Unclassified Observations , 1977 .

[268]  Yishay Mansour,et al.  An Information-Theoretic Analysis of Hard and Soft Assignment Methods for Clustering , 1997, UAI.

[269]  L. Csató Gaussian processes: iterative sparse approximations , 2002 .

[270]  Mikhail Belkin,et al.  Regularization and Semi-supervised Learning on Large Graphs , 2004, COLT.

[271]  David J. Miller,et al.  A Mixture of Experts Classifier with Learning Based on Both Labelled and Unlabelled Data , 1996, NIPS.

[272]  Nir Friedman,et al.  Bayesian Network Classifiers , 1997, Machine Learning.

[273]  Jason Weston,et al.  Multi-class protein fold recognition using adaptive codes , 2005, ICML.

[274]  Cullen Schaffer Overfitting avoidance as bias , 2004, Machine Learning.

[275]  Raymond J. Mooney,et al.  Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.

[276]  Tommi S. Jaakkola,et al.  Information Regularization with Partially Labeled Data , 2002, NIPS.

[277]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[278]  Olivier Chapelle,et al.  Model Selection for Support Vector Machines , 1999, NIPS.

[279]  Andrew McCallum,et al.  Semi-Supervised Clustering with User Feedback , 2003 .

[280]  Peter Sollich Probabilistic interpretations and Bayesian methods for support vector machines , 1999 .

[281]  Nicolas Le Roux,et al.  Learning Eigenfunctions Links Spectral Embedding and Kernel PCA , 2004, Neural Computation.

[282]  Éva Tardos,et al.  Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).

[283]  Nicu Sebe,et al.  Learning Bayesian network classifiers for facial expression recognition using both labeled and unlabeled data , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[284]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[285]  S. Boucheron,et al.  A sharp concentration inequality with applications , 1999, Random Struct. Algorithms.

[286]  Jos F. Sturm,et al.  A Matlab toolbox for optimization over symmetric cones , 1999 .

[287]  Alexander Zien,et al.  Semi-Supervised Classification by Low Density Separation , 2005, AISTATS.

[288]  S. Boucheron,et al.  Theory of classification : a survey of some recent advances , 2005 .

[289]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[290]  Jitendra Malik,et al.  Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[291]  Jean-Philippe Vert,et al.  Graph-Driven Feature Extraction From Microarray Data Using Diffusion Kernels and Kernel CCA , 2002, NIPS.

[292]  Russell Greiner,et al.  Model Selection Criteria for Learning Belief Nets: An Empirical Comparison , 2000, ICML.

[293]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[294]  Fabio Gagliardi Cozman,et al.  Semi-Supervised Learning of Mixture Models , 2003, ICML.

[295]  T. Cover,et al.  The relative value of labeled and unlabeled samples in pattern recognition , 1993, Proceedings. IEEE International Symposium on Information Theory.

[296]  Yoshua Bengio,et al.  Non-Local Manifold Tangent Learning , 2004, NIPS.

[297]  Thomas Hofmann,et al.  Semi-supervised Learning on Directed Graphs , 2004, NIPS.

[298]  Claire Cardie,et al.  Limitations of Co-Training for Natural Language Learning from Large Datasets , 2001, EMNLP.

[299]  Massih-Reza Amini,et al.  Semi Supervised Logistic Regression , 2002, ECAI.

[300]  Joshua B. Tenenbaum,et al.  Global Versus Local Methods in Nonlinear Dimensionality Reduction , 2002, NIPS.

[301]  Bernhard Schölkopf,et al.  Learning from labeled and unlabeled data on a directed graph , 2005, ICML.

[302]  James A. Sethian,et al.  Level Set Methods and Fast Marching Methods , 1999 .

[303]  Bernhard Schölkopf,et al.  Dynamic Alignment Kernels , 2000 .

[304]  David Haussler,et al.  Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[305]  Johan A. K. Suykens,et al.  Learning from General Label Constraints , 2004, SSPR/SPR.

[306]  R. Mooney,et al.  Impact of Similarity Measures on Web-page Clustering , 2000 .

[307]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[308]  Stephen P. Boyd,et al.  Semidefinite Programming , 1996, SIAM Rev..

[309]  Takeo Kanade,et al.  Comprehensive database for facial expression analysis , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition.

[310]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[311]  Ulrike von Luxburg,et al.  Distance-Based Classification with Lipschitz Functions , 2004, J. Mach. Learn. Res..

[312]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[313]  Klaus Obermayer,et al.  Bayesian Transduction , 1999, NIPS.

[314]  Jason Weston,et al.  Transductive Inference for Estimating Values of Functions , 1999, NIPS.

[315]  Yair Weiss,et al.  Segmentation using eigenvectors: a unifying view , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[316]  K. Bennett,et al.  Optimization Approaches to Semi-Supervised Learning , 2001 .

[317]  C. J. Stone,et al.  Optimal Rates of Convergence for Nonparametric Estimators , 1980 .

[318]  David W. Opitz,et al.  Generating Accurate and Diverse Members of a Neural-Network Ensemble , 1995, NIPS.

[319]  Inderjit S. Dhillon,et al.  Information theoretic clustering of sparse cooccurrence data , 2003, Third IEEE International Conference on Data Mining.

[320]  J. M. Hammersley,et al.  Markov fields on finite graphs and lattices , 1971 .

[321]  Mikhail Belkin,et al.  Beyond the point cloud: from transductive to semi-supervised learning , 2005, ICML.

[322]  Mikhail Belkin,et al.  Semi-Supervised Learning on Riemannian Manifolds , 2004, Machine Learning.

[323]  Tom M. Mitchell,et al.  Improving Text Classification by Shrinkage in a Hierarchy of Classes , 1998, ICML.

[324]  David B. Shmoys,et al.  A Best Possible Heuristic for the k-Center Problem , 1985, Math. Oper. Res..

[325]  Inderjit S. Dhillon,et al.  Clustering with Bregman Divergences , 2005, J. Mach. Learn. Res..

[326]  Matthew Brand,et al.  Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction , 1999, Neural Computation.

[327]  Nello Cristianini,et al.  Classification using String Kernels , 2000 .

[328]  John D. Lafferty,et al.  Semi-supervised learning using randomized mincuts , 2004, ICML.

[329]  Arindam Banerjee,et al.  Semi-supervised Clustering by Seeding , 2002, ICML.

[330]  Jean-Michel Renders,et al.  Combining Labelled and Unlabelled Data: A Case Study on Fisher Kernels and Transductive Inference for Biological Entity Recognition , 2002, CoNLL.

[331]  Mikhail Belkin,et al.  Maximum Margin Semi-Supervised Learning for Structured Variables , 2005, NIPS.

[332]  Robert E. Schapire,et al.  Efficient distribution-free learning of probabilistic concepts , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[333]  Nello Cristianini,et al.  Efficiently Learning the Metric with Side-Information , 2003, ALT.

[334]  Dean P. Foster,et al.  The risk inflation criterion for multiple regression , 1994 .

[335]  Wray L. Buntine Operations for Learning with Graphical Models , 1994, J. Artif. Intell. Res..

[336]  N. E. Day Estimating the components of a mixture of normal distributions , 1969 .

[337]  Nello Cristianini,et al.  Spectral Kernel Methods for Clustering , 2001, NIPS.

[338]  O. Mangasarian,et al.  Semi-Supervised Support Vector Machines for Unlabeled Data Classification , 2001 .

[339]  Stephen P. Boyd,et al.  The Fastest Mixing Markov Process on a Graph and a Connection to a Maximum Variance Unfolding Problem , 2006, SIAM Rev..

[340]  N. Linial,et al.  ProtoMap: Automatic classification of protein sequences, a hierarchy of protein families, and local maps of the protein space , 1999, Proteins.

[341]  Yoram Singer,et al.  Log-Linear Models for Label Ranking , 2003, NIPS.

[342]  Christopher K. I. Williams On a Connection between Kernel PCA and Metric Multidimensional Scaling , 2004, Machine Learning.

[343]  Tony Jebara,et al.  Probability Product Kernels , 2004, J. Mach. Learn. Res..