ProtoML: A rule-based validation language for Google Protocol Buffers

We live in an information age in which massive amounts of data are transferred across multiple platforms. Cross-platform formats are constantly evolving to aid communication between machines running different backends. Both text-based formats (e.g., XML, JSON) and binary formats (e.g., BSON, bXML, Google Protocol Buffers) can provide the basis of common communication. In this article we focus on a widely used binary format, Google Protocol Buffers, which is supported in a variety of programming languages. One caveat of this format is that, although the structure of the messages is defined by the “.proto” file, the validation and sanitization of the transmitted messages must be implemented by the developer. We introduce a new metalanguage called ProtoML that serves as a solid foundation for defining validation and constraint rules for Protocol Buffer messages. ProtoML rules can easily be extended with additional validation functions and have the added benefit of providing a way to correct messages on the fly prior to transmission. We also present a draft implementation of the ProtoML language to showcase its potential.