Paper information - Taming the Length Field in Binary Data: Calc-Regular Languages

Taming the Length Field in Binary Data: Calc-Regular Languages

When binary data are sent over a byte stream, the binary format that sender and receiver use is a "data serialization language", either explicitly specified or implied by the implementations. Security is at risk when sender and receiver disagree on details of this language. If, for example, the receiver fails to reject invalid messages, an adversary may assemble such invalid messages to compromise the receiver's security. Many data serialization languages are length-prefix languages: when sending or storing some F of flexible size, F is encoded at the binary level as a pair (|F|, F), with |F| representing the length of F (typically in bytes). This paper's main contributions and results are as follows. (1) Length-prefix languages are not context-free. This might seem to justify the conjecture that parsing those languages is difficult and inefficient. (2) The class of "calc-regular languages" is proposed, a minimalistic extension of regular languages with the additional property of handling length fields. Calc-regular languages can be specified via "calc-regular expressions", a natural extension of regular expressions. (3) Calc-regular languages are almost as easy to parse as regular languages, using finite-state machines with additional accumulators. This disproves the conjecture from (1).
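The (|F|, F) encoding and the idea of a finite-state machine with an accumulator can be illustrated with a small sketch. This is not the paper's calc-regular expression formalism. It assumes a one-byte length field, and the function name `parse_length_prefixed` and the state names are hypothetical, chosen only for illustration. The parser keeps a two-state finite control plus a single counter, and it rejects input that ends in the middle of a field.

```python
from typing import List


def parse_length_prefixed(data: bytes) -> List[bytes]:
    """Split data into fields, each encoded as (|F|, F) with a one-byte |F|.

    Assumption for this sketch: the length field is exactly one byte,
    so every field holds at most 255 payload bytes.
    """
    fields: List[bytes] = []
    state = "LENGTH"        # finite control: expecting a length byte or payload
    remaining = 0           # accumulator: payload bytes still to be consumed
    current = bytearray()   # payload collected for the field in progress

    for byte in data:
        if state == "LENGTH":
            remaining = byte
            current = bytearray()
            if remaining == 0:
                fields.append(b"")      # empty field, stay in LENGTH
            else:
                state = "PAYLOAD"
        else:  # state == "PAYLOAD"
            current.append(byte)
            remaining -= 1
            if remaining == 0:
                fields.append(bytes(current))
                state = "LENGTH"

    # Reject input that stops before the announced length is reached.
    if state != "LENGTH":
        raise ValueError("truncated field: %d byte(s) missing" % remaining)
    return fields
```

For example, `parse_length_prefixed(b"\x03abc\x00\x01z")` yields the three fields `b"abc"`, `b""` and `b"z"`, while `b"\x02a"` is rejected because the announced length exceeds the available payload. The input is read once, left to right, with constant work per byte, which is the sense in which such formats need not be hard to parse.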

Stefan Lucks | Norina Marie Grosch | Joshua Konig
