Chipster: user-friendly analysis software for microarray and other high-throughput data

Background: The growth of high-throughput technologies such as microarrays and next-generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. To enable non-programming biologists to benefit from method development in a timely manner, we have created the Chipster software.

Results: Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select data points and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies.

Conclusions: Chipster is user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available.
