Computational proteomics: management and analysis of proteomics data

Proteomics is the study of the proteins expressed in a cell, tissue, or organism. It includes protein identification and quantification (or quantitation), protein–protein interactions, prediction of protein complexes, protein modifications and protein localization in the cell. Mass spectrometry (MS) is one of the main technologies in proteomics and is increasingly used because of its growing precision and because it allows the proteomics analysis pipeline to be automated, leading to large-scale, high-throughput experiments. Since proteins play a central role in the life of an organism, proteomics is instrumental in many biomedical applications, such as biomarker discovery and drug treatment evaluation, as well as in investigating the dynamics of cells in Systems Biology.

Computational Proteomics concerns the computational methods, algorithms, databases and methodologies used to process, manage, analyze and interpret the data produced in proteomics experiments. The broad application of proteomics in different biological and medical fields, together with the spread of high-throughput platforms, leads to increasing volumes of proteomics data that require efficient algorithms, new data management capabilities and novel analysis, inference and visualization techniques. Moreover, high-throughput production and collection of data pose new challenges in data handling and reusability, as well as in tool interoperability and interconnection. On the other hand, the increasing availability of data and tools opens new research directions and opportunities (e.g. annotated spectral libraries) that can be exploited only through the rigorous application of computer science, machine learning, knowledge discovery, statistics and signal processing techniques.

As in many scientific disciplines, the final goal of Computational Proteomics is to infer knowledge models (e.g. to verify a hypothesis or to identify proteins involved in a disease) from the inspection of biological samples.
Such an activity involves different steps, carried out in both the wet and the dry lab, and results from the combination of many instruments, methods, tools, algorithms and databases, according to established or emerging working methodologies and standards.

This special issue explores current state-of-the-art research in different areas of Computational Proteomics, with special emphasis on methodologies and tools for data handling and analysis, machine learning, knowledge discovery, biomarker discovery, data standardization and information quality in MS-based proteomics. A discussion of all the experimental techniques and computational methods used in proteomics would be too large to fit in one journal issue. This special issue is therefore complemented by the March 2008 issue of Briefings in Functional Genomics and Proteomics, which discusses, among other topics, MS-based techniques for improving the study of protein structure and protein folding, as well as some applications and tools in Interactomics and Systems Biology.

MS permits the determination, with high accuracy, of the molecular weight of chemical compounds ranging from small molecules to large, polar biopolymers [1]. The mass spectrometer separates gas-phase ions according to their mass-to-charge ratio. The output of the spectrometer (the spectrum) is a long sequence of value pairs. Each pair contains a measured intensity and a mass-to-charge ratio (m/z), which depend, respectively, on the quantity and the molecular mass of the detected molecule. Although MS can only determine molecular masses, the adoption of advanced sample preparation techniques (such as chromatographic separation or labeling), the use of protein database information (e.g. the theoretical spectrum associated with a protein sequence) and the application of powerful machine learning algorithms make this technique a cornerstone of proteomics applications.

BRIEFINGS IN BIOINFORMATICS. VOL 9. NO 2. 97–101 doi:10.1093/bib/bbn011 Advance Access publication February 29, 2008
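As an illustration of the spectrum described in the text, a sequence of (m/z, intensity) pairs, the following minimal Python sketch stores a spectrum as a list of tuples and derives two simple quantities from it. The peak values are invented for illustration, and the names `Spectrum`, `base_peak` and `neutral_mass` are hypothetical. The mass calculation assumes positive-mode ions carrying `charge` protons and a known charge state, neither of which is stated in the text.

```python
from typing import List, Tuple

# Proton mass in daltons (standard physical constant, not taken from the text).
PROTON_MASS = 1.007276

# A spectrum is modelled as a list of (m/z, intensity) pairs.
Spectrum = List[Tuple[float, float]]


def base_peak(spectrum: Spectrum) -> Tuple[float, float]:
    """Return the (m/z, intensity) pair with the highest intensity."""
    if not spectrum:
        raise ValueError("spectrum is empty")
    return max(spectrum, key=lambda peak: peak[1])


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral molecular mass of an ion observed at the given m/z.

    Assumes the ion carries `charge` protons (positive mode), so that
    m/z = (M + charge * PROTON_MASS) / charge.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


# Toy spectrum with made-up values, for illustration only.
toy_spectrum: Spectrum = [(400.20, 120.0), (500.75, 980.0), (650.30, 310.0)]

strongest_mz, strongest_intensity = base_peak(toy_spectrum)
# Treating the strongest peak as a doubly charged ion is an assumption.
mass_if_doubly_charged = neutral_mass(strongest_mz, charge=2)
```

Real spectra contain thousands of such pairs and usually need noise filtering and charge-state determination before masses can be interpreted. The sketch only shows how the pair representation maps onto a data structure.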