Sharing mass spectrometry data in a grid-based distributed proteomics laboratory

Data produced by mass spectrometry (MS) have been used in proteomics experiments to identify proteins or patterns in clinical samples that may be responsible for human diseases. MS-based proteomics is becoming a powerful, widely used technique for identifying molecular targets in different pathological contexts. MS samples contain a huge amount of data; retrieving such information requires access to large volumes of data, so an efficient organization is necessary both to reduce access time and to allow efficient knowledge extraction. Bioinformatics laboratories have been using more than one mass spectrometer to improve efficiency, which greatly increases the volume of data obtained from experiments. Moreover, experimental data are enriched by observations and descriptions added by specialists through metadata. Thus, retrieval of spectra data (and the metadata describing them) within a laboratory and among different laboratories requires large, scalable storage solutions and high-performance computational platforms. We present a software system for managing, sharing, and querying MS data in a distributed laboratory. It uses a spectra data management system, called SpecDB, in which information retrieval is performed using computational grid facilities. Information retrieval can be conducted either locally, by considering portions of spectra data, or in a distributed scenario, by exploiting metadata and annotations about spectra datasets stored on the grid.