Multi-Element Long Distance Dependencies: Using SPk Languages to Explore the Characteristics of Long-Distance Dependencies

To successfully model long-distance dependencies (LDDs), it is necessary to understand the full range of characteristics of the LDDs exhibited in a target dataset. In this paper, we use Strictly k-Piecewise (SPk) languages to generate datasets with varying properties. We then compute the characteristics of the LDDs in these datasets using mutual information and analyze the impact of factors such as (i) k, (ii) the length of the LDDs, (iii) vocabulary size, (iv) forbidden subsequences, and (v) dataset size. This analysis reveals that the number of interacting elements in a dependency is an important characteristic of LDDs, which leads us to the challenge of modeling multi-element long-distance dependencies. Our results suggest that attention mechanisms in neural networks may aid in modeling datasets with multi-element long-distance dependencies; however, we conclude that more efficient attention mechanisms are needed to address this issue.
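
To make the two steps described above concrete, here is a minimal sketch in Python, assuming a toy four-symbol vocabulary and the forbidden two-element subsequence ('a', 'b') (i.e. an SP2 constraint). It uses simple rejection sampling to build the dataset and a plug-in estimator for the mutual information between symbols at a given distance; both are illustrative stand-ins, and the names VOCAB, FORBIDDEN, and sample_sp2_string are hypothetical rather than taken from the paper's own tooling.

```python
import random
from collections import Counter
from math import log2

# Illustrative assumptions (not from the paper): a 4-symbol vocabulary and
# the forbidden 2-element subsequence ('a', 'b'), giving an SP2 constraint.
VOCAB = list("abcd")
FORBIDDEN = ("a", "b")

def contains_subsequence(s, sub):
    """True if every symbol of `sub` occurs in `s` in order (not necessarily adjacently)."""
    it = iter(s)
    return all(sym in it for sym in sub)

def sample_sp2_string(length, rng):
    """Rejection-sample a string of the given length that avoids FORBIDDEN as a subsequence."""
    while True:
        s = [rng.choice(VOCAB) for _ in range(length)]
        if not contains_subsequence(s, FORBIDDEN):
            return s

def mutual_information_at_distance(strings, d):
    """Plug-in estimate of I(X_i; X_{i+d}), pooled over all positions i in the corpus."""
    joint, left, right, n = Counter(), Counter(), Counter(), 0
    for s in strings:
        for i in range(len(s) - d):
            joint[(s[i], s[i + d])] += 1
            left[s[i]] += 1
            right[s[i + d]] += 1
            n += 1
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((left[x] / n) * (right[y] / n)))
    return mi

if __name__ == "__main__":
    rng = random.Random(0)
    corpus = [sample_sp2_string(12, rng) for _ in range(2000)]
    for d in (1, 2, 4, 8):
        print(f"distance {d}: estimated MI = {mutual_information_at_distance(corpus, d):.4f}")
```

Because the forbidden subsequence couples symbols at arbitrary separations, the mutual information in such a corpus should decay more slowly with distance than in an unconstrained random corpus; this decay profile is the kind of LDD signature the analysis above measures.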
