Perspectives on Data Integration in Human Complex Disease Analysis

The identification of causal or predictive variants, genes, and mechanisms for disease-associated traits is characterized by “complex” networks of molecular phenotypes. Present technology and computing power allow large collections of these data types to be built and processed. However, this rapid data generation is offset by the slow pace at which data integration methods are developed. Most currently available integrative analytic tools work on pairs of omics data and focus on relationships between data sources, while making strong assumptions about the architecture within each data source. Only a limited number of initiatives aim to find optimal ways to analyze multiple, possibly related, omics databases while fully acknowledging the specific characteristics of each data type. A thorough understanding of the assumptions underlying integrative methods is needed to draw sound conclusions. In this chapter, the authors discuss how the field of “integromics” has evolved and give pointers towards essential research developments in this context.


[226]  M. Woodward,et al.  Type-II diabetes and pancreatic cancer: a meta-analysis of 36 studies , 2005, British Journal of Cancer.

[227]  M. Stephens,et al.  Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-data Imputation , 2005, American journal of human genetics.

[228]  Nello Cristianini,et al.  A statistical framework for genomic data fusion , 2004, Bioinform..

[229]  Paul T. Groth,et al.  The ENCODE (ENCyclopedia Of DNA Elements) Project , 2004, Science.

[230]  I. Gut,et al.  Future potential of the Human Epigenome Project , 2004, Expert review of molecular diagnostics.

[231]  A. Chakravarti,et al.  Haplotype and missing data inference in nuclear families. , 2004, Genome research.

[232]  M Y Wong,et al.  Estimation of magnitude in gene–environment interactions in the presence of measurement error , 2004, Statistics in medicine.

[233]  Raymond J Carroll,et al.  A New Method for Dealing with Measurement Error in Explanatory Variables of Regression Models , 2004, Biometrics.

[234]  Steven N. Thorsen,et al.  Fusion or Integration: What's the Difference? , 2004 .

[235]  M. Blaser,et al.  Tooth loss, pancreatic cancer, and Helicobacter pylori. , 2003, The American journal of clinical nutrition.

[236]  A. Ziegler,et al.  BRCA2 germline mutations in familial pancreatic carcinoma. , 2003, Journal of the National Cancer Institute.

[237]  N E Day,et al.  The detection of gene-environment interaction for continuous traits: should we deal with measurement error by bigger studies or better measurement? , 2003, International journal of epidemiology.

[238]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[239]  Marcos M. Campos,et al.  O-Cluster: scalable clustering of large high dimensional data sets , 2002, 2002 IEEE International Conference on Data Mining.

[240]  N. Malats,et al.  Family history of cancer and germline BRCA2 mutations in sporadic exocrine pancreatic cancer , 2002, Gut.

[241]  Jason E. Stewart,et al.  Minimum information about a microarray experiment (MIAME)—toward standards for microarray data , 2001, Nature Genetics.

[242]  D B Rubin,et al.  Multiple Imputation for Multivariate Data with Missing and Below‐Threshold Measurements: Time‐Series Concentrations of Pollutants in the Arctic , 2001, Biometrics.

[243]  M. Kenward,et al.  Sensitivity analysis for incomplete contingency tables: the Slovenian plebiscite case , 2001 .

[244]  J. Kaprio,et al.  Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. , 2000, The New England journal of medicine.

[245]  M Y Wong,et al.  Measurement error in epidemiology: the design of validation studies II: bivariate situation. , 1999, Statistics in medicine.

[246]  R J Carroll,et al.  Flexible Parametric Measurement Error Models , 1999, Biometrics.

[247]  D. Bouchez,et al.  Functional genomics in plants. , 1998, Plant physiology.

[248]  Robert P. Goldman,et al.  Imputation of Missing Data Using Machine Learning Techniques , 1996, KDD.

[249]  Jérôme Pagès,et al.  Multiple factor analysis (AFMULT package) , 1994 .

[250]  R. Carroll,et al.  Measurement error, instrumental variables and corrections for attenuation with applications to meta-analyses. , 1994, Statistics in medicine.

[251]  C. la Vecchia,et al.  Family history and the risk of liver, gallbladder, and pancreatic cancer. , 1994, Cancer epidemiology, biomarkers & prevention.

[252]  D. Clayton,et al.  Measurement error: effects and remedies in nutritional epidemiology , 1994, Proceedings of the Nutrition Society.

[253]  A. Andrén-sandberg,et al.  Pancreatitis and the risk of pancreatic cancer , 1993 .

[254]  Stephen Senn,et al.  Covariance analysis in generalized linear measurement error models. , 1989, Statistics in medicine.

[255]  Antonio Ciampi,et al.  Recursive Partition: A Versatile Method for Exploratory-Data Analysis in Biostatistics , 1987 .

[256]  D. Rubin INFERENCE AND MISSING DATA , 1975 .

[257]  K J Rothman,et al.  Synergy and antagonism in cause-effect relationships. , 1974, American journal of epidemiology.

[258]  J. Tukey,et al.  Multiple-Factor Analysis , 1947 .

[259]  International Journal of Knowledge Discovery in Bioinformatics , 2022 .