Workshop on Model Mining

Mining the knowledge embedded in pretrained models is important for achieving stronger performance, since practitioners now have easy access to a large number of pretrained models. This Workshop on Model Mining aims to investigate more diverse and advanced ways of mining the knowledge within models, so that pretrained models can be leveraged more wisely, elegantly, and systematically. Many topics relate to this workshop, such as distilling a lightweight model from a well-trained heavy model via the teacher-student paradigm (see the sketch below), and boosting a model's performance by carefully designing predecessor tasks, e.g., pre-training, self-supervised learning, and contrastive learning. Model mining, as a special form of data mining, is relevant to SIGKDD, and its audience of researchers and engineers will benefit greatly when designing more advanced algorithms for their tasks.
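
To make the teacher-student paradigm mentioned above concrete, below is a minimal sketch of a soft-target distillation loss in PyTorch. The temperature and weighting values are illustrative assumptions, not a prescribed recipe, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target distillation term with the usual supervised loss.

    `temperature` and `alpha` are illustrative choices, not prescribed values.
    """
    # Softened teacher distribution and student log-distribution (soft targets).
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between the softened distributions, scaled by T^2 so its
    # gradient magnitude stays comparable to the hard-label term.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Ordinary cross-entropy on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1.0 - alpha) * ce_term
```

In practice the student is trained with this combined loss while the teacher's parameters stay frozen; richer variants (multiple teachers, attention or feature transfer) follow the same pattern with additional matching terms.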
