Machine Learning with Squared-Loss Mutual Information
[1] Colin Fyfe,et al. Kernel and Nonlinear Canonical Correlation Analysis , 2000, IJCNN.
[2] M. Kawanabe,et al. Direct importance estimation for covariate shift adaptation , 2008 .
[3] Qing Wang,et al. Divergence estimation of continuous distributions based on data-dependent partitions , 2005, IEEE Transactions on Information Theory.
[4] Dale Schuurmans,et al. Maximum Margin Clustering , 2004, NIPS.
[5] Fraser,et al. Independent coordinates for strange attractors from mutual information. , 1986, Physical review. A, General physics.
[6] M. Kenward,et al. An Introduction to the Bootstrap , 2007 .
[7] R. H. Moore,et al. Regression Graphics: Ideas for Studying Regressions Through Graphics , 1998, Technometrics.
[8] R. Tibshirani,et al. Least angle regression , 2004, math/0406456.
[9] Masashi Sugiyama,et al. Direct Density-Ratio Estimation with Dimensionality Reduction via Hetero-Distributional Subspace Analysis , 2011, AAAI.
[10] J. Friedman,et al. Estimating Optimal Transformations for Multiple Regression and Correlation. , 1985 .
[11] Masashi Sugiyama,et al. Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation , 2012 .
[12] Bernhard Schölkopf,et al. Measuring Statistical Dependence with Hilbert-Schmidt Norms , 2005, ALT.
[13] Jacob Goldberger,et al. Nonparametric Information Theoretic Clustering Algorithm , 2010, ICML.
[14] Mark A. Girolami,et al. Mercer kernel-based clustering in feature space , 2002, IEEE Trans. Neural Networks.
[15] Bernhard Schölkopf,et al. Nonlinear causal discovery with additive noise models , 2008, NIPS.
[16] Yoram Singer,et al. Efficient projections onto the l1-ball for learning in high dimensions , 2008, ICML '08.
[17] V. A. Epanechnikov. Non-Parametric Estimation of a Multivariate Probability Density , 1969 .
[18] M. C. Jones,et al. Robust and efficient estimation by minimising a density power divergence , 1998 .
[19] Robert H. Halstead,et al. Matrix Computations , 2011, Encyclopedia of Parallel Computing.
[20] Robert A. Lordo,et al. Nonparametric and Semiparametric Models , 2005, Technometrics.
[21] Xiangrong Yin,et al. Canonical correlation analysis based on information theory , 2004 .
[22] Masashi Sugiyama,et al. Sufficient Dimension Reduction via Squared-Loss Mutual Information Estimation , 2010, Neural Computation.
[23] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[24] Charles R. Johnson,et al. Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.
[25] Le Song,et al. A Kernel Statistical Test of Independence , 2007, NIPS.
[26] Thomas M. Cover,et al. Elements of information theory (2. ed.) , 2006 .
[27] Takafumi Kanamori,et al. Density-Difference Estimation , 2012, Neural Computation.
[28] Thomas Gärtner,et al. A survey of kernels for structured data , 2003, SKDD.
[29] Andrzej Cichocki,et al. A New Learning Algorithm for Blind Signal Separation , 1995, NIPS.
[30] Masashi Sugiyama,et al. A computationally-efficient alternative to kernel logistic regression , 2010, 2010 IEEE International Workshop on Machine Learning for Signal Processing.
[31] Masashi Sugiyama,et al. Cross-Domain Object Matching with Model Selection , 2011, AISTATS.
[32] Alexander J. Smola,et al. Learning with kernels , 1998 .
[33] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[34] Masashi Sugiyama,et al. Least-Squares Independence Test , 2011, IEICE Trans. Inf. Syst..
[35] Aapo Hyvärinen,et al. A Linear Non-Gaussian Acyclic Model for Causal Discovery , 2006, J. Mach. Learn. Res..
[36] Pietro Perona,et al. Self-Tuning Spectral Clustering , 2004, NIPS.
[37] R. Cook. Save: a method for dimension reduction and graphics in regression , 2000 .
[38] Tony Jebara,et al. Kernelizing Sorting, Permutation, and Alignment for Minimum Volume PCA , 2004, COLT.
[39] S. Saigal,et al. Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.
[40] Larry D. Hostetler,et al. The estimation of the gradient of a density function, with applications in pattern recognition , 1975, IEEE Trans. Inf. Theory.
[41] Takafumi Kanamori,et al. Density Ratio Estimation in Machine Learning , 2012 .
[42] Harold W. Kuhn,et al. The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.
[43] David Barber,et al. Kernelized Infomax Clustering , 2005, NIPS.
[44] Takafumi Kanamori,et al. Relative Density-Ratio Estimation for Robust Distribution Comparison , 2011, Neural Computation.
[45] H. Hotelling. Relations Between Two Sets of Variates , 1936 .
[46] David Heckerman,et al. Learning Gaussian Networks , 1994, UAI.
[47] Takafumi Kanamori,et al. Least-squares two-sample test , 2011, Neural Networks.
[48] M. Patriksson. Nonlinear Programming and Variational Inequality Problems , 1999 .
[49] Takafumi Kanamori,et al. Least-Squares Conditional Density Estimation , 2010, IEICE Trans. Inf. Syst..
[50] Igor Vajda,et al. Estimation of the Information by an Adaptive Partitioning of the Observation Space , 1999, IEEE Trans. Inf. Theory.
[51] N. Aronszajn. Theory of Reproducing Kernels , 1950 .
[52] Ker-Chau Li,et al. Sliced Inverse Regression for Dimension Reduction , 1991 .
[53] Isabelle Guyon,et al. An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..
[54] Johan A. K. Suykens,et al. Kernel Canonical Correlation Analysis and Least Squares Support Vector Machines , 2001, ICANN.
[55] Shotaro Akaho,et al. A kernel method for canonical correlation analysis , 2006, ArXiv.
[56] Michael I. Jordan,et al. Kernel independent component analysis , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[57] Takafumi Kanamori,et al. A Least-squares Approach to Direct Importance Estimation , 2009, J. Mach. Learn. Res..
[58] Motoaki Kawanabe,et al. Dimensionality reduction for density ratio estimation in high-dimensional spaces , 2010, Neural Networks.
[59] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[60] Marc M. Van Hulle,et al. Sequential Fixed-Point ICA Based on Mutual Information Minimization , 2008, Neural Computation.
[61] Seungjin Choi,et al. Independent Component Analysis , 2009, Handbook of Natural Computing.
[62] Takafumi Kanamori,et al. f-Divergence Estimation and Two-Sample Homogeneity Test Under Semiparametric Density-Ratio Models , 2010, IEEE Transactions on Information Theory.
[63] J. MacQueen. Some methods for classification and analysis of multivariate observations , 1967 .
[64] Takafumi Kanamori,et al. Statistical outlier detection using direct density ratio estimation , 2011, Knowledge and Information Systems.
[65] Le Song,et al. A dependence maximization view of clustering , 2007, ICML '07.
[66] Ingo Steinwart,et al. On the Influence of the Kernel on the Consistency of Support Vector Machines , 2002, J. Mach. Learn. Res..
[67] S. M. Ali,et al. A General Class of Coefficients of Divergence of One Distribution from Another , 1966 .
[68] Karl Pearson. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling , 2009 .
[69] Masashi Sugiyama,et al. Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparsity Regularized Estimation , 2009, J. Mach. Learn. Res..
[70] Aapo Hyvärinen. Fast and Robust Fixed-Point Algorithms for Independent Component Analysis , 1999 .
[71] Takafumi Kanamori,et al. Mutual information estimation reveals global associations between stimuli and biological processes , 2009, BMC Bioinformatics.
[72] Le Song,et al. Kernelized Sorting , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[73] Marc M. Van Hulle. Edgeworth approximation of multivariate differential entropy , 2005, Neural Computation.
[74] Robert Tibshirani,et al. The Entire Regularization Path for the Support Vector Machine , 2004, J. Mach. Learn. Res..
[75] Bernhard Schölkopf,et al. Regression by dependence minimization and its application to causal inference in additive noise models , 2009, ICML '09.
[76] Andreas Krause,et al. Discriminative Clustering by Regularized Information Maximization , 2010, NIPS.
[77] Hal Daumé,et al. Kernelized Sorting for Natural Language Processing , 2010, AAAI.
[78] Miguel Á. Carreira-Perpiñán,et al. Fast nonparametric clustering with Gaussian blurring mean-shift , 2006, ICML.
[79] Masashi Sugiyama,et al. Sequential change‐point detection based on direct density‐ratio estimation , 2012, Stat. Anal. Data Min..
[80] Ker-Chau Li,et al. On Principal Hessian Directions for Data Visualization and Dimension Reduction: Another Application of Stein's Lemma , 1992 .
[81] Jitendra Malik,et al. Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[82] Motoaki Kawanabe,et al. Direct density-ratio estimation with dimensionality reduction via least-squares hetero-distributional subspace search , 2011, Neural Networks.
[83] Masashi Sugiyama,et al. Change-point detection in time-series data by relative density-ratio estimation , 2012 .
[84] Masashi Sugiyama,et al. Semi-Supervised Learning of Class Balance under Class-Prior Change by Distribution Matching , 2012, ICML.
[85] Takafumi Kanamori,et al. Computational complexity of kernel-based density-ratio estimation: a condition number analysis , 2012, Machine Learning.
[86] John Riedl,et al. Item-based collaborative filtering recommendation algorithms , 2001, WWW '01.
[87] Huaiyu Zhu. On Information and Sufficiency , 1997 .
[88] Motoaki Kawanabe,et al. Machine Learning in Non-Stationary Environments - Introduction to Covariate Shift Adaptation , 2012, Adaptive computation and machine learning.
[89] Shrikanth S. Narayanan,et al. Universal Consistency of Data-Driven Partitions for Divergence Estimation , 2007, 2007 IEEE International Symposium on Information Theory.
[90] Takafumi Kanamori,et al. Approximating Mutual Information by Maximum Likelihood Density Ratio Estimation , 2008, FSDM.
[91] A. Kraskov,et al. Estimating mutual information. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.
[92] Masashi Sugiyama,et al. On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution , 2011, ICML.
[93] Geoffrey E. Hinton,et al. Self-organizing neural network that discovers surfaces in random-dot stereograms , 1992, Nature.
[94] Alan Edelman,et al. The Geometry of Algorithms with Orthogonality Constraints , 1998, SIAM J. Matrix Anal. Appl..
[95] Masashi Sugiyama,et al. Dependence-Maximization Clustering with Least-Squares Mutual Information , 2011, J. Adv. Comput. Intell. Intell. Informatics.
[96] Michael I. Jordan,et al. On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.
[97] Takafumi Kanamori,et al. Statistical analysis of kernel-based least-squares density-ratio estimation , 2012, Machine Learning.
[98] Masashi Sugiyama,et al. Canonical dependency analysis based on squared-loss mutual information , 2011, Neural Networks.
[99] Zaïd Harchaoui,et al. DIFFRAC: a discriminative and flexible framework for clustering , 2007, NIPS.
[100] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[101] K. Pearson. On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling , 1900 .
[102] Masashi Sugiyama,et al. Superfast-Trainable Multi-Class Probabilistic Classifier by Least-Squares Posterior Fitting , 2010, IEICE Trans. Inf. Syst..
[103] Michael I. Jordan,et al. Kernel dimension reduction in regression , 2009, 0908.1854.
[104] Aapo Hyvärinen,et al. Fast and robust fixed-point algorithms for independent component analysis , 1999, IEEE Trans. Neural Networks.
[105] Masashi Sugiyama,et al. Least-Squares Independent Component Analysis , 2011, Neural Computation.
[106] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[107] Fernando Pérez-Cruz,et al. Kullback-Leibler divergence estimation of continuous distributions , 2008, 2008 IEEE International Symposium on Information Theory.
[108] Christian Jutten,et al. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture , 1991, Signal Process..
[109] Martin J. Wainwright,et al. Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization , 2008, IEEE Transactions on Information Theory.
[110] Shay B. Cohen,et al. Advances in Neural Information Processing Systems 25 , 2012, NIPS 2012.
[111] Masashi Sugiyama,et al. Feature Selection via L1-Penalized Squared-Loss Mutual Information , 2012, IEICE Trans. Inf. Syst..
[112] Masashi Sugiyama,et al. Sufficient Component Analysis , 2011, ACML.
[113] C. E. Shannon. A Mathematical Theory of Communication , 1948 .
[114] Shotaro Akaho,et al. Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold , 2005, Neurocomputing.
[115] Masashi Sugiyama,et al. Dependence Minimizing Regression with Model Selection for Non-Linear Causal Inference under Non-Gaussian Noise , 2010, AAAI.
[116] Jon A. Wellner,et al. Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .
[117] J. Pearl. Causality: Models, Reasoning and Inference , 2000 .