LINUCS: linear notation for unique description of carbohydrate sequences.

The use of proteomics databases has become indispensable in the daily work of molecular biologists, but carbohydrate research has not yet reached this point. One obvious reason is that existing carbohydrate data collections are only rarely annotated and are not cross-linked to other resources. A generally accepted linear, canonical description of carbohydrates that computers can readily process will enable efficient automatic cross-linking of distributed carbohydrate data collections by serving as a unique and unambiguous database access key. Various approaches to deriving a canonical notation are discussed. They fall into two groups: those that rely on the structure description alone, and those that exploit the fact that carbohydrate structures have a preferred graph direction (from the non-reducing to the reducing end). To open a fruitful discussion among glycoscientists, a possible solution is presented in which the reducing monosaccharide unit is selected as the graph root and linkage information is used to define the priority of the various branches. A Web interface (http://www.dkfz.de/spec/linucs/) has been created that directly converts the commonly used extended representation of complex carbohydrates into the preferred canonical description or into its inverted form.
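The core idea (root the tree at the reducing monosaccharide, then order branches by linkage information) can be illustrated with a short recursive sketch. This is not the LINUCS implementation: the names `Residue`, `add` and `linearize` are hypothetical, the bracket syntax `[(parent+child)][residue]{branches}` only loosely imitates the LINUCS style, and the priority rule used here (sort branches by the linkage position on the parent, breaking ties by comparing the branch strings) is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Residue:
    """One monosaccharide node in a carbohydrate tree (hypothetical model)."""
    name: str                        # residue label, e.g. "b-D-GlcpNAc"
    parent_pos: int | None = None    # carbon on the parent that carries this residue
    child_pos: int | None = None     # carbon on this residue used for the link
    children: list[Residue] = field(default_factory=list)

    def add(self, child: Residue, parent_pos: int, child_pos: int = 1) -> Residue:
        """Attach `child` at `parent_pos` of this residue and return the child."""
        child.parent_pos = parent_pos
        child.child_pos = child_pos
        self.children.append(child)
        return child


def linearize(node: Residue) -> str:
    """Write the subtree rooted at `node` as a canonical string.

    Assumed priority rule: branches are sorted by the linkage position on the
    parent; equal positions are tie-broken by the branch strings themselves,
    so the result does not depend on the order in which branches were added.
    """
    # The root (reducing end) has no incoming linkage, written as "[]".
    if node.parent_pos is None:
        link = "[]"
    else:
        link = f"[({node.parent_pos}+{node.child_pos})]"
    # Linearize every branch first, then order them by (position, text).
    branches = sorted((child.parent_pos, linearize(child)) for child in node.children)
    body = "".join(text for _, text in branches)
    return f"{link}[{node.name}]{{{body}}}"


if __name__ == "__main__":
    # N-glycan core, rooted at the reducing-end GlcNAc; the 6-arm is added
    # before the 3-arm on purpose, yet the 3-arm is written first.
    root = Residue("b-D-GlcpNAc")
    glcnac2 = root.add(Residue("b-D-GlcpNAc"), 4)
    man = glcnac2.add(Residue("b-D-Manp"), 4)
    man.add(Residue("a-D-Manp"), 6)
    man.add(Residue("a-D-Manp"), 3)
    print(linearize(root))
```

Because each subtree is linearized before its parent sorts it, the same structure yields the same string regardless of input order, which is the property a database access key needs. The real notation must also handle cases this sketch ignores, such as uncertain linkages and repeating units.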