Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs

Training data attribution (TDA) methods offer to trace a model's prediction on any given example back to specific influential training examples. Existing approaches do so by assigning a scalar influence score to each training example, under a simplifying assumption that influence is additive. But in reality, we observe that training examples interact in highly non-additive ways due to factors such as inter-example redundancy, training order, and curriculum learning effects. To study such interactions, we propose Simfluence, a new paradigm for TDA where the goal is not to produce a single influence score per example, but instead a training run simulator: the user asks, ``If my model had trained on example $z_1$, then $z_2$, ..., then $z_n$, how would it behave on $z_{test}$?''; the simulator should then output a simulated training run, which is a time series predicting the loss on $z_{test}$ at every step of the simulated run. This enables users to answer counterfactual questions about what their model would have learned under different training curricula, and to directly see where in training that learning would occur. We present a simulator, Simfluence-Linear, that captures non-additive interactions and is often able to predict the spiky trajectory of individual example losses with surprising fidelity. Furthermore, we show that existing TDA methods such as TracIn and influence functions can be viewed as special cases of Simfluence-Linear. This enables us to directly compare methods in terms of their simulation accuracy, subsuming several prior TDA approaches to evaluation. In experiments on large language model (LLM) fine-tuning, we show that our method predicts loss trajectories with much higher accuracy than existing TDA methods (doubling Spearman's correlation and reducing mean-squared error by 75%) across several tasks, models, and training methods.

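To make the simulation idea concrete, below is a minimal, hypothetical sketch of a Simfluence-Linear-style simulator. It is not the paper's actual API: all names (`LinearSimulator`, `fit`, `simulate`) are illustrative, and it assumes the simplest form of the model, in which consuming training example $c$ at step $t$ updates the test loss linearly, $L_t = a_c \cdot L_{t-1} + b_c$, with the per-example parameters $(a_c, b_c)$ fit by ridge regression on logged training runs.

```python
# Hypothetical sketch of a Simfluence-Linear-style simulator (illustrative
# names, not the paper's API). Assumed model: training on example c_t at
# step t updates the test loss as L_t = a[c_t] * L_{t-1} + b[c_t], with
# per-example parameters fit by ridge regression on observed runs.

import numpy as np


class LinearSimulator:
    def __init__(self, num_train_examples: int, l2: float = 1e-3):
        self.n = num_train_examples
        self.l2 = l2                            # ridge penalty strength
        self.a = np.ones(num_train_examples)    # multiplicative factors
        self.b = np.zeros(num_train_examples)   # additive factors

    def fit(self, runs):
        """Fit (a, b) from observed training runs.

        Each run is a pair (curriculum, losses): curriculum[t] is the index
        of the training example consumed at step t, and losses is one entry
        longer, with losses[t] the observed test loss *before* step t.
        """
        # Collect all (L_{t-1}, L_t) transition pairs per training example.
        pairs = [[] for _ in range(self.n)]
        for curriculum, losses in runs:
            for t, c in enumerate(curriculum):
                pairs[c].append((losses[t], losses[t + 1]))
        # One small ridge regression per example (closed-form solution).
        for c, obs in enumerate(pairs):
            if not obs:
                continue  # example never observed; keep identity update
            prev = np.array([p for p, _ in obs])
            nxt = np.array([q for _, q in obs])
            X = np.stack([prev, np.ones_like(prev)], axis=1)
            w = np.linalg.solve(X.T @ X + self.l2 * np.eye(2), X.T @ nxt)
            self.a[c], self.b[c] = w

    def simulate(self, curriculum, initial_loss: float):
        """Roll out a simulated test-loss trajectory for a new curriculum."""
        losses = [initial_loss]
        for c in curriculum:
            losses.append(self.a[c] * losses[-1] + self.b[c])
        return np.array(losses)


# Example usage with hypothetical logged runs: fit once, then answer the
# counterfactual "what if I had trained on examples 3, 17, then 5?"
# sim = LinearSimulator(num_train_examples=100)
# sim.fit(runs=[(curr_a, losses_a), (curr_b, losses_b)])
# trajectory = sim.simulate(curriculum=[3, 17, 5], initial_loss=2.3)
```

Because each simulated step conditions on the previous loss, this rollout is sensitive to training order, which is what lets such a simulator capture the non-additive interactions described above; the paper's full formulation differs in its details (e.g., how batches of examples per step and regularization are handled), so this sketch should be read only as the simplest special case.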