Single-molecule dataset (SMD): a generalized storage format for raw and processed single-molecule data

Background

Single-molecule techniques have emerged as incisive approaches for addressing a wide range of questions arising in contemporary biological research [Trends Biochem Sci 38:30–37, 2013; Nat Rev Genet 14:9–22, 2013; Curr Opin Struct Biol 28C:112–121, 2014; Annu Rev Biophys 43:19–39, 2014]. The analysis and interpretation of raw single-molecule data benefit greatly from the ongoing development of sophisticated statistical analysis tools that enable accurate inference at the low signal-to-noise ratios frequently associated with these measurements. While a number of groups have released analysis toolkits as open-source software [J Phys Chem B 114:5386–5403, 2010; Biophys J 79:1915–1927, 2000; Biophys J 91:1941–1951, 2006; Biophys J 79:1928–1944, 2000; Biophys J 86:4015–4029, 2004; Biophys J 97:3196–3205, 2009; PLoS One 7:e30024, 2012; BMC Bioinformatics 11(8):S2, 2010; Biophys J 106:1327–1337, 2014; Proc Int Conf Mach Learn 28:361–369, 2013], it remains difficult to compare analyses of experiments performed in different labs because data formats are not standardized.

Results

Here we propose a standardized single-molecule dataset (SMD) file format. SMD is designed to accommodate a wide variety of computer programming languages, single-molecule techniques, and analysis strategies. To facilitate adoption of this format, we have made two existing single-molecule data analysis packages compatible with it.

Conclusion

Adoption of a common, standard data file format for sharing raw single-molecule data and analysis outcomes is a critical step for the emerging and powerful single-molecule field. Such a format will benefit both sophisticated users and non-specialists by allowing standardized, transparent, and reproducible analysis practices.
