Statistical Genomics

A next-generation sequencing experiment can generate billions of short reads for each sample, and processing the raw reads adds further information. Various file formats have been developed to store and manipulate this information. This chapter presents an overview of the file formats commonly used in the analysis of next-generation sequencing data, including FASTQ, FASTA, SAM/BAM, GFF/GTF, BED, and VCF.
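As an illustration of the simplest of these formats, a FASTQ record is conventionally four lines: a header starting with `@`, the read sequence, a separator line starting with `+`, and a quality string of the same length as the sequence. The sketch below is not taken from the chapter; it assumes unwrapped four-line records and the common Phred+33 quality encoding, and the names `FastqRecord` and `parse_fastq` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str             # header text without the leading '@'
    sequence: str         # read bases
    qualities: List[int]  # per-base Phred scores


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text.

    Assumes each record occupies exactly four lines and that quality
    characters use the given ASCII offset (33 for Phred+33).
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record: " + header)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


if __name__ == "__main__":
    example = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
    for record in parse_fastq(example):
        print(record.name, record.sequence, record.qualities)
```

The same function accepts an open file handle, since iterating over a text file yields its lines. Real data is often compressed or very large, so production pipelines typically rely on established libraries rather than a hand-written parser.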
