Micro-Analyzer: Automatic preprocessing of Affymetrix microarray data

A current trend in genomics is to investigate cellular mechanisms with several technologies at once, in order to explain the relationships among genes, molecular processes and diseases. For instance, the combined use of gene-expression arrays and genomic arrays has proved effective in clinical practice. Consequently, a single experiment may use different kinds of microarrays, producing different types of binary data (images and textual raw data). The analysis of microarray data requires an initial preprocessing phase that makes raw data suitable for use on existing analysis platforms, such as the TIGR M4 (TM4) Suite. An additional challenge for emerging data analysis platforms is to handle these different microarray formats together with clinical data. The resulting integrated data may include both numerical and symbolic molecular data (e.g. gene expression and SNPs) and temporal clinical data (e.g. the response to a drug, time to progression and survival rate). Raw data preprocessing is a crucial step in the analysis, but it is often performed manually, in an error-prone way, using different software tools. Thus novel, platform-independent and possibly open-source tools enabling the semi-automatic preprocessing and annotation of different microarray data are needed. This paper presents Micro-Analyzer (Microarray Analyzer), a cross-platform tool for the automatic normalization, summarization and annotation of Affymetrix gene expression and SNP binary data. It is the evolution of the μ-CS tool and extends preprocessing to SNP arrays, which μ-CS did not support. Micro-Analyzer is provided as a standalone Java tool and enables users to read, preprocess and analyse binary microarray data (gene expression and SNPs) by invoking the TM4 platform. It avoids: (i) the manual invocation of external tools (e.g. the Affymetrix Power Tools), (ii) the manual loading of preprocessing libraries, and (iii) the management of intermediate files, such as results and metadata. Micro-Analyzer users can manage Affymetrix binary data directly, without having to locate and invoke the proper preprocessing tools and chip-specific libraries. Moreover, they can load the preprocessed data directly into the well-known TM4 platform, which also extends the capabilities of TM4. Consequently, Micro-Analyzer offers the following advantages: (i) it reduces possible errors in the preprocessing and subsequent analysis phases, e.g. those due to an incorrect choice of parameters or to the use of outdated libraries; (ii) it enables the combined and centralized preprocessing of different arrays; (iii) it may enhance the quality of further analysis by storing the workflow, i.e. information about the preprocessing steps; and (iv) it is freely available as a standalone application at the project web site http://sourceforge.net/projects/microanalyzer/.
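To make concrete the kind of manual step that the tool is described as automating, the following minimal sketch (in Python, not the Java code of Micro-Analyzer itself) picks a chip-specific library and assembles a call to the `apt-probeset-summarize` program of the Affymetrix Power Tools. The chip-to-library table, the `libs` directory and the function name are hypothetical, and the command-line options shown (`-a`, `-d`, `-o`) should be checked against the installed version of the Affymetrix Power Tools.

```python
# Hypothetical table mapping a chip type to its library (CDF) file.
# Real entries depend on the arrays used in the experiment.
CHIP_LIBRARIES = {
    "HG-U133_Plus_2": "HG-U133_Plus_2.cdf",
}


def build_summarize_command(chip_type, cel_files, out_dir, lib_dir="libs"):
    """Assemble, without running it, a summarization call for CEL files.

    Raises ValueError when no library is registered for the chip type,
    which is the mistake a user can make when choosing libraries by hand.
    """
    if chip_type not in CHIP_LIBRARIES:
        raise ValueError(f"no library registered for chip type {chip_type!r}")
    library_path = f"{lib_dir}/{CHIP_LIBRARIES[chip_type]}"
    return [
        "apt-probeset-summarize",
        "-a", "rma",           # analysis: RMA normalization and summarization (assumed option value)
        "-d", library_path,    # chip-specific library file
        "-o", str(out_dir),    # directory for results and intermediate files
        *[str(f) for f in cel_files],
    ]


if __name__ == "__main__":
    command = build_summarize_command(
        "HG-U133_Plus_2", ["sample1.CEL", "sample2.CEL"], "out"
    )
    print(" ".join(command))
```

The command is only built, not executed; it could be passed to `subprocess.run` once the paths and options have been verified for the local installation.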
