Distillation of Weighted Automata from Recurrent Neural Networks using a Spectral Approach

This paper aims to bridge the gap between deep learning and grammatical inference. It provides an algorithm to extract a (stochastic) formal language from any recurrent neural network (RNN) trained for language modelling. The algorithm uses the trained network purely as an oracle, and thus requires no access to the inner representations of the black box; it applies a spectral approach to infer a weighted automaton (WA). Since weighted automata compute linear functions, they are computationally more efficient than neural networks, which makes the approach a form of knowledge distillation. We detail experiments on 62 data sets, both synthetic and drawn from real-world applications, that allow an in-depth study of the abilities of the proposed algorithm. The results show that the extracted WA are good approximations of the RNN, validating the approach. Moreover, we show how the process provides interesting insights into the behavior of RNNs trained on data, extending the scope of this work to the explainability of deep learning models.
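To make the spectral step concrete, the sketch below shows one standard way such an extraction can be set up: query the trained network as a black-box oracle to fill a Hankel matrix over chosen prefixes and suffixes, take a rank-truncated SVD, and read the WA parameters off the factorization. This is a minimal illustration of the general technique, not the paper's actual implementation; the oracle `rnn_score`, the bases `prefixes`/`suffixes`, and the `rank` parameter are all assumed inputs.

```python
# Minimal sketch of Hankel-based spectral extraction of a weighted automaton,
# assuming the trained RNN is exposed only through a hypothetical oracle
# rnn_score(word) returning the weight the network assigns to a sequence.
import numpy as np

def spectral_extract(rnn_score, alphabet, prefixes, suffixes, rank):
    """Infer a WA (alpha, {A_sigma}, omega) of the given rank from the oracle."""
    # Hankel block H[p, s] = f(p s), plus one shifted block per symbol.
    H = np.array([[rnn_score(p + s) for s in suffixes] for p in prefixes])
    H_sig = {a: np.array([[rnn_score(p + (a,) + s) for s in suffixes]
                          for p in prefixes]) for a in alphabet}
    # h_S: row of H for the empty prefix; h_P: column for the empty suffix.
    h_S = np.array([rnn_score(s) for s in suffixes])
    h_P = np.array([rnn_score(p) for p in prefixes])
    # Rank-truncated SVD: H ~ U diag(d) V^T.
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    U, d, V = U[:, :rank], d[:rank], Vt[:rank].T
    Q = U.T / d[:, None]                          # Q = diag(d)^{-1} U^T
    alpha = h_S @ V                               # initial weight vector
    omega = Q @ h_P                               # final weight vector
    A = {a: Q @ H_sig[a] @ V for a in alphabet}   # one transition matrix per symbol
    return alpha, A, omega

def wa_score(alpha, A, omega, word):
    """Evaluate the extracted WA: f(x) = alpha^T A_{x_1} ... A_{x_k} omega."""
    v = alpha
    for a in word:
        v = v @ A[a]
    return float(v @ omega)
```

In this sketch, words are tuples of symbols so that `p + s` concatenates; both bases should include the empty tuple, and in practice they might be taken as all words up to a small length or as frequent prefixes and suffixes of a sample. One can check that if the oracle is itself a rank-`rank` WA whose Hankel matrix is captured by the chosen bases, the construction recovers it up to a change of basis.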
