Language for the definition and exchange of biological data sets

Increasing numbers of biologists and institutes are becoming involved in taxonomic database projects that, for good practical and historical reasons, use different hardware, software and data structures. Biologists also use a large number of diverse application programs for different purposes, mostly with incompatible data formats. Translating data from one program, project or format to another, and developing special-purpose or 'one-off' translation software, is a rapidly growing burden on the biological community. File transfers between different database projects require an exchange medium that can handle diverse classes of biological data. 'XDF' (the Exchange Data Format) is such a medium. Data sets prepared in XDF consist of text files that are effectively independent of any particular project. XDF is a high-level language for describing biological data, with its own syntax and command vocabulary, analogous to the high-level programming languages used to describe software algorithms. XDF files may be generated and read automatically by programs to transfer large amounts of data between sophisticated databases. Alternatively, biologists unfamiliar with the terminology, data rules and syntax of the data format required for a particular application program or database can use a text editor to create an XDF data file. We hope the existence of XDF will encourage the development of more sophisticated general-purpose programs for interactive biological data entry. XDF is being used initially with numeric and structured textual descriptive data, but is designed to be extensible to other classes of biological data, such as images. Provision is made within XDF for predefined standard definitions of the common core elements of biological data sets, such as the taxonomic hierarchy, biological nomenclature, descriptive material and bibliography.
With these standard definitions, XDF can define specialist transfer formats for particular application areas while keeping strict control of the data types and data definitions used.