暂无分享,去创建一个
[1] J. Morgan,et al. Problems in the Analysis of Survey Data, and a Proposal , 1963 .
[2] Yee Whye Teh,et al. LieTransformer: Equivariant self-attention for Lie Groups , 2020, ICML.
[3] Yoshua Bengio,et al. Learning a synaptic learning rule , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[4] Fabian B. Fuchs,et al. SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks , 2020, NeurIPS.
[5] Andrew Zisserman,et al. Perceiver: General Perception with Iterative Attention , 2021, ICML.
[6] G. King,et al. What to Do about Missing Values in Time‐Series Cross‐Section Data , 2010 .
[7] Andrew Gordon Wilson,et al. Deep Kernel Learning , 2015, AISTATS.
[8] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[9] Daniel Hernández-Lobato,et al. Deep Gaussian Processes for Regression using Approximate Expectation Propagation , 2016, ICML.
[10] Lukasz Kaiser,et al. Rethinking Attention with Performers , 2020, ArXiv.
[11] Yair Movshovitz-Attias,et al. No Fuss Distance Metric Learning Using Proxies , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[12] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[13] J. L. Hodges,et al. Discriminatory Analysis - Nonparametric Discrimination: Consistency Properties , 1989 .
[14] Sercan O. Arik,et al. TabNet: Attentive Interpretable Tabular Learning , 2019, AAAI.
[15] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[16] Andrew W. Moore,et al. New Algorithms for Efficient High-Dimensional Nonparametric Classification , 2006, J. Mach. Learn. Res..
[17] K. Jarrod Millman,et al. Array programming with NumPy , 2020, Nat..
[18] James Demmel,et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes , 2019, ICLR.
[19] Neil D. Lawrence,et al. Variational Auto-encoded Deep Gaussian Processes , 2015, ICLR.
[20] Xiaojin Zhu,et al. Semi-Supervised Learning , 2010, Encyclopedia of Machine Learning.
[21] Tianqi Chen,et al. XGBoost: A Scalable Tree Boosting System , 2016, KDD.
[22] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[23] R. Schapire. The Strength of Weak Learnability , 1990, Machine Learning.
[24] Yee Whye Teh,et al. Attentive Neural Processes , 2019, ICLR.
[25] Jian Tang,et al. AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks , 2018, CIKM.
[26] V. Vapnik. Estimation of Dependences Based on Empirical Data , 2006 .
[27] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[28] Matthijs Douze,et al. Fixing the train-test resolution discrepancy , 2019, NeurIPS.
[29] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[30] Stef van Buuren,et al. MICE: Multivariate Imputation by Chained Equations in R , 2011 .
[31] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.
[32] Ming-Wei Chang,et al. REALM: Retrieval-Augmented Language Model Pre-Training , 2020, ICML.
[33] Wei-Yin Loh,et al. Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..
[34] Sivaraman Balakrishnan,et al. A Unified View of Label Shift Estimation , 2020, NeurIPS.
[35] R. Zemel,et al. Neural Relational Inference for Interacting Systems , 2018, ICML.
[36] Joan Bruna,et al. Few-Shot Learning with Graph Neural Networks , 2017, ICLR.
[37] Wei-Yin Loh,et al. Fifty Years of Classification and Regression Trees , 2014 .
[38] Tim Salimans,et al. Axial Attention in Multidimensional Transformers , 2019, ArXiv.
[39] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[40] Ralf Krestel,et al. Challenges for Toxic Comment Classification: An In-Depth Error Analysis , 2018, ALW.
[41] Colin Raffel,et al. Extracting Training Data from Large Language Models , 2020, USENIX Security Symposium.
[42] Yoshua Bengio,et al. Bayesian Model-Agnostic Meta-Learning , 2018, NeurIPS.
[43] G. King,et al. Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation , 2001, American Political Science Review.
[44] Dustin Tran,et al. Image Transformer , 2018, ICML.
[45] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[46] Max Welling,et al. Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.
[47] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[48] N. Altman. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression , 1992 .
[49] Tie-Yan Liu,et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree , 2017, NIPS.
[50] Lucas Beyer,et al. Big Transfer (BiT): General Visual Representation Learning , 2020, ECCV.
[51] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[52] Ilya Sutskever,et al. Generating Long Sequences with Sparse Transformers , 2019, ArXiv.
[53] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[54] Yang Hua,et al. Ranked List Loss for Deep Metric Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Max Welling,et al. Semi-supervised Learning with Deep Generative Models , 2014, NIPS.
[56] Lihi Zelnik-Manor,et al. ImageNet-21K Pretraining for the Masses , 2021, NeurIPS Datasets and Benchmarks.
[57] Ismail Elezi,et al. Learning Intra-Batch Connections for Deep Metric Learning , 2021, ICML.
[58] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[59] Yee Whye Teh,et al. Conditional Neural Processes , 2018, ICML.
[60] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.
[61] Arman Cohan,et al. Longformer: The Long-Document Transformer , 2020, ArXiv.
[62] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[63] Colin Raffel,et al. Do Transformer Modifications Transfer Across Implementations and Applications? , 2021, EMNLP.
[64] Pietro Liò,et al. Graph Attention Networks , 2017, ICLR.
[65] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[66] Joshua B. Tenenbaum,et al. Human-level concept learning through probabilistic program induction , 2015, Science.
[67] Peter Bühlmann,et al. MissForest - non-parametric missing value imputation for mixed-type data , 2011, Bioinform..
[68] Percy Liang,et al. A Retrieve-and-Edit Framework for Predicting Structured Outputs , 2018, NeurIPS.
[69] Percy Liang,et al. Generating Sentences by Editing Prototypes , 2017, TACL.
[70] Yee Whye Teh,et al. Set Transformer , 2018, ICML.
[71] Jure Leskovec,et al. How Powerful are Graph Neural Networks? , 2018, ICLR.
[72] Neil D. Lawrence,et al. Deep Gaussian Processes , 2012, AISTATS.
[73] Marta Z. Kwiatkowska,et al. Evaluating Uncertainty Quantification in End-to-End Autonomous Driving Control , 2018, ArXiv.
[74] Ankur Bapna,et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation , 2018, ACL.
[75] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[76] Geoffrey E. Hinton,et al. Lookahead Optimizer: k steps forward, 1 step back , 2019, NeurIPS.
[77] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[78] Anna Veronika Dorogush,et al. CatBoost: unbiased boosting with categorical features , 2017, NeurIPS.
[79] Stefan Schaal,et al. Local dimensionality reduction for locally weighted learning , 1997, Proceedings 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation CIRA'97. 'Towards New Computational Principles for Robotics and Automation'.
[80] Jon Louis Bentley,et al. Multidimensional binary search trees used for associative searching , 1975, CACM.
[81] Yarin Gal,et al. A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks , 2019, ArXiv.
[82] Nikolaos Pappas,et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention , 2020, ICML.
[83] Yi Tay,et al. Efficient Transformers: A Survey , 2020, ArXiv.
[84] John F. Canny,et al. MSA Transformer , 2021, bioRxiv.
[85] Marc Peter Deisenroth,et al. Doubly Stochastic Variational Inference for Deep Gaussian Processes , 2017, NIPS.
[86] A. Gelman,et al. Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box , 2011 .
[87] Georg Heigold,et al. Object-Centric Learning with Slot Attention , 2020, NeurIPS.
[88] J. Friedman. Greedy function approximation: A gradient boosting machine. , 2001 .
[89] J. Biggs. THE ROLE OF METALEARNING IN STUDY PROCESSES , 1985 .