Distillation of Weighted Automata from Recurrent Neural Networks using a Spectral Approach

This paper aims to bridge the gap between deep learning and grammatical inference. It provides an algorithm to extract a (stochastic) formal language from any recurrent neural network (RNN) trained for language modelling. The algorithm uses the trained network purely as an oracle, and thus requires no access to the inner representations of the black box; it applies a spectral approach to infer a weighted automaton (WA). Since weighted automata compute linear functions, they are computationally more efficient than neural networks, which makes the approach a form of knowledge distillation. We detail experiments on 62 data sets, both synthetic and drawn from real-world applications, that allow an in-depth study of the abilities of the proposed algorithm. The results show that the extracted WA are good approximations of the RNN, validating the approach. Moreover, we show how the process provides interesting insights into the behavior of RNNs trained on data, extending the scope of this work to the explainability of deep learning models.
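To make the spectral step concrete, the sketch below shows one standard way such an extraction can be set up: query the trained network as a black-box oracle to fill a Hankel matrix over chosen prefixes and suffixes, take a rank-truncated SVD, and read the WA parameters off the factorization. This is a minimal illustration of the general technique, not the paper's actual implementation; the oracle `rnn_score`, the bases `prefixes`/`suffixes`, and the `rank` parameter are all assumed inputs.

```python
# Minimal sketch of Hankel-based spectral extraction of a weighted automaton,
# assuming the trained RNN is exposed only through a hypothetical oracle
# rnn_score(word) returning the weight the network assigns to a sequence.
import numpy as np

def spectral_extract(rnn_score, alphabet, prefixes, suffixes, rank):
    """Infer a WA (alpha, {A_sigma}, omega) of the given rank from the oracle."""
    # Hankel block H[p, s] = f(p s), plus one shifted block per symbol.
    H = np.array([[rnn_score(p + s) for s in suffixes] for p in prefixes])
    H_sig = {a: np.array([[rnn_score(p + (a,) + s) for s in suffixes]
                          for p in prefixes]) for a in alphabet}
    # h_S: row of H for the empty prefix; h_P: column for the empty suffix.
    h_S = np.array([rnn_score(s) for s in suffixes])
    h_P = np.array([rnn_score(p) for p in prefixes])
    # Rank-truncated SVD: H ~ U diag(d) V^T.
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    U, d, V = U[:, :rank], d[:rank], Vt[:rank].T
    Q = U.T / d[:, None]                          # Q = diag(d)^{-1} U^T
    alpha = h_S @ V                               # initial weight vector
    omega = Q @ h_P                               # final weight vector
    A = {a: Q @ H_sig[a] @ V for a in alphabet}   # one transition matrix per symbol
    return alpha, A, omega

def wa_score(alpha, A, omega, word):
    """Evaluate the extracted WA: f(x) = alpha^T A_{x_1} ... A_{x_k} omega."""
    v = alpha
    for a in word:
        v = v @ A[a]
    return float(v @ omega)
```

In this sketch, words are tuples of symbols so that `p + s` concatenates; both bases should include the empty tuple, and in practice they might be taken as all words up to a small length or as frequent prefixes and suffixes of a sample. One can check that if the oracle is itself a rank-`rank` WA whose Hankel matrix is captured by the chosen bases, the construction recovers it up to a change of basis.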
