Inferring biosynthetic and gene regulatory networks from Artemisia annua RNA sequencing data on a credit card-sized ARM computer.

Prediction of gene function and gene regulatory networks is one of the most active topics in bioinformatics. The accumulation of publicly available gene expression data for hundreds of plant species, together with advances in bioinformatical methods and affordable computing, makes ingenuity one of the major bottlenecks in understanding gene function and regulation. Here, we show how a credit card-sized computer retailing for <50 USD can be used to rapidly predict gene function and infer regulatory networks from RNA sequencing data. To achieve this, we constructed a bioinformatical pipeline that downloads RNA sequencing data, allows their quality control, and generates a gene co-expression network that can reveal the enzymes participating in, and the transcription factors controlling, a given biosynthetic pathway. We exemplify this by first identifying genes and transcription factors involved in secondary cell wall biosynthesis in the plant Artemisia annua, the main natural source of the anti-malarial drug artemisinin. We then used the networks to dissect the artemisinin biosynthesis pathway, which suggested potential transcription factors regulating artemisinin biosynthesis. We provide the source code of our pipeline (https://github.com/mutwil/LSTrAP-Lite) and envision that the ubiquity of affordable computing, the availability of biological data and the increased bioinformatical training of biologists will transform the field of bioinformatics. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
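
The co-expression idea behind such a pipeline can be illustrated with a minimal sketch. It is not taken from LSTrAP-Lite: the gene names, the expression values and the choice of Pearson correlation as the similarity measure are all assumptions made for illustration. The sketch ranks genes by how closely their expression across samples follows that of a query gene, so that candidate pathway members and regulators appear at the top of the list.

```python
import numpy as np


def coexpression_neighbors(expr, genes, query, top_n=3):
    """Rank genes by Pearson correlation with `query` across samples.

    expr  : 2D array, one row per gene, one column per RNA-seq sample
    genes : list of gene identifiers, in the same order as the rows of expr
    """
    corr = np.corrcoef(expr)      # gene-by-gene correlation matrix
    q = genes.index(query)
    order = np.argsort(-corr[q])  # most strongly correlated first
    ranked = [(genes[i], float(corr[q, i])) for i in order if i != q]
    return ranked[:top_n]


# Hypothetical expression matrix: 4 genes measured in 5 samples.
genes = ["enzyme_A", "enzyme_B", "tf_C", "unrelated_D"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # enzyme_A (query)
    [2.1, 3.9, 6.2, 8.0, 9.9],   # enzyme_B, follows enzyme_A closely
    [0.5, 1.1, 1.4, 2.1, 2.4],   # tf_C, follows enzyme_A closely
    [5.0, 1.0, 4.0, 2.0, 3.0],   # unrelated_D, no shared trend
])

neighbors = coexpression_neighbors(expr, genes, "enzyme_A", top_n=2)
for gene, r in neighbors:
    print(f"{gene}\t{r:.3f}")
```

In this toy example the two genes whose expression rises with the query are returned as its neighbours, while the gene with an unrelated profile is excluded. A real analysis would start from expression estimates across many samples rather than hand-written values.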
