DataJoint: A Simpler Relational Data Model

The relational data model offers unrivaled rigor and precision in defining data structure and querying complex data. Yet the use of relational databases in scientific data pipelines remains limited because they are perceived as unwieldy. We propose a simplified and conceptually refined relational data model named DataJoint. The model includes a language for schema definition, a language for data queries, and a diagramming notation for visualizing entities and the relationships among them. The model adheres to the principle of entity normalization, which requires that all data, both stored and derived, be represented by well-formed entity sets. DataJoint's data query language is an algebra on entity sets with five operators that match the capabilities of other relational query languages while offering greater clarity owing to entity normalization. Practical implementations of DataJoint have been adopted in neuroscience labs for fluent interaction with scientific data pipelines.
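
To make the idea of an algebra closed over well-formed entity sets concrete, the following is a minimal Python sketch. It is not the DataJoint implementation or its API: the `EntitySet` class, the operator names `restrict`, `project`, and `join`, and the mouse and session data are all hypothetical and chosen only for illustration. The sketch shows three operators, each of which returns another entity set whose primary key is known and checked for uniqueness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySet:
    """Toy entity set (hypothetical): rows are dicts identified by a primary key."""
    primary_key: tuple
    rows: tuple

    def __post_init__(self):
        # Well-formedness check: primary-key values must be unique.
        keys = [tuple(r[k] for k in self.primary_key) for r in self.rows]
        if len(keys) != len(set(keys)):
            raise ValueError("duplicate primary key: not a well-formed entity set")

    def restrict(self, predicate):
        # Keep a subset of entities; the primary key is unchanged.
        return EntitySet(self.primary_key,
                         tuple(r for r in self.rows if predicate(r)))

    def project(self, *attrs):
        # Select attributes but always retain the primary key,
        # so the result still identifies each entity.
        keep = list(self.primary_key) + [a for a in attrs
                                         if a not in self.primary_key]
        return EntitySet(self.primary_key,
                         tuple({k: r[k] for k in keep} for r in self.rows))

    def join(self, other):
        # Natural join on all shared attribute names (a simplification);
        # the result's primary key is the union of both primary keys.
        pk = self.primary_key + tuple(k for k in other.primary_key
                                      if k not in self.primary_key)
        rows = []
        for a in self.rows:
            for b in other.rows:
                if all(a[k] == b[k] for k in a.keys() & b.keys()):
                    rows.append({**a, **b})
        return EntitySet(pk, tuple(rows))


# Hypothetical example data: mice and their recording sessions.
mice = EntitySet(("mouse_id",), (
    {"mouse_id": 1, "sex": "F"},
    {"mouse_id": 2, "sex": "M"},
))
sessions = EntitySet(("mouse_id", "session"), (
    {"mouse_id": 1, "session": 1, "duration": 30},
    {"mouse_id": 1, "session": 2, "duration": 45},
    {"mouse_id": 2, "session": 1, "duration": 20},
))

# Every intermediate result is itself an entity set, so operators compose.
result = (mice.restrict(lambda r: r["sex"] == "F")
              .join(sessions)
              .project("duration"))
```

In this sketch, `result` has the primary key `("mouse_id", "session")` and contains the two sessions of mouse 1. Because each operator yields an entity set with a well-defined key, expressions can be chained without the reader having to track what each intermediate table represents.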