Network Modeling of Complex Data Sets.

We present a selection of network and machine learning techniques useful for analyzing complex datasets: 2-way similarity networks, Markov clustering, enrichment statistical networks, FCROS differential analysis, and random forests. We apply each technique to the Populus trichocarpa gene expression atlas.
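
To make the first two techniques concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a small expression matrix with genes as rows, builds a 2-way similarity network by thresholding absolute Pearson correlation, and then applies a bare-bones Markov clustering loop (expansion followed by inflation). The function names, the 0.8 threshold, the inflation value of 2.0, and the toy data are all illustrative assumptions.

```python
import numpy as np

def similarity_network(expr, threshold=0.8):
    """Adjacency matrix over genes (rows of expr): edge if |Pearson r| >= threshold.

    The threshold is an assumed, illustrative cutoff.
    """
    corr = np.corrcoef(expr)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-edges in the network itself
    return adj

def markov_cluster(adj, inflation=2.0, max_iter=100, tol=1e-6):
    """Simplified Markov clustering: alternate expansion and inflation."""
    m = adj + np.eye(adj.shape[0])            # add self-loops
    m = m / m.sum(axis=0, keepdims=True)      # make columns sum to 1
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, 2)      # expansion: two-step random walk
        m = m ** inflation                    # inflation: strengthen strong flows
        m = m / m.sum(axis=0, keepdims=True)  # re-normalize columns
        if np.abs(m - prev).max() < tol:
            break
    # Rows that retain mass act as attractors; their nonzero columns are members.
    clusters = set()
    for row in m:
        members = tuple(int(i) for i in np.flatnonzero(row > 1e-6))
        if members:
            clusters.add(members)
    return sorted(clusters)

# Toy data (hypothetical): two groups of three co-varying "genes", eight samples.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
b = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
expr = np.vstack([a, 2 * a + 1, -a, b, 3 * b + 2, -b])

adj = similarity_network(expr)
clusters = markov_cluster(adj)
print(clusters)
```

On real expression data the correlation threshold and the inflation parameter both shape the resulting modules, so they would need to be tuned rather than taken from this sketch.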
