A computational interactome for prioritizing genes associated with complex agronomic traits in rice (Oryza sativa)

Rice (Oryza sativa) is one of the most important staple foods, feeding more than half of the global population. Many rice traits are quantitative and complex, controlled by multiple interacting genes. A full understanding of the relationships among these genes is therefore critical for systematically identifying the genes that control agronomic traits. We developed a genome-wide rice protein-protein interaction (PPI) network (RicePPINet, http://netbio.sjtu.edu.cn/riceppinet) using machine learning that integrates structural relationships and functional information. RicePPINet contains 708 819 predicted interactions among 16 895 non-transposable-element-related proteins. Its power to discover novel protein interactions was demonstrated by comparison with other publicly available PPI prediction methods and against experimentally determined PPI data sets. Furthermore, a global analysis of domain-mediated interactions revealed that RicePPINet accurately reflects PPIs at the domain level. Our studies showed that the RicePPINet-based method prioritized candidate genes involved in complex agronomic traits, such as disease resistance and drought tolerance, approximately 2 to 11 times more efficiently than random prediction. RicePPINet provides an expanded computational interactome for the genetic dissection of agronomically important traits in rice.
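The abstract does not spell out how the network is used to rank candidate genes or how the "times better than random" figure is computed. As an illustration only, the Python sketch below shows one common network-based approach, neighbor counting ("guilt by association"), together with a fold-improvement-over-random metric. The scoring rule (fraction of a candidate's interaction partners that are known trait genes), the helper names `build_adjacency`, `rank_candidates` and `fold_over_random`, and all gene IDs and edges are hypothetical assumptions; this is not presented as the RicePPINet procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (protein, protein) pairs, skipping self-loops."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def rank_candidates(adj: Dict[str, Set[str]], seeds: Set[str],
                    candidates: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical scoring: fraction of a candidate's partners that are seed (known trait) genes."""
    scored = []
    for gene in candidates:
        neighbors = adj.get(gene, set())  # .get avoids inserting isolated genes into the defaultdict
        score = len(neighbors & seeds) / len(neighbors) if neighbors else 0.0
        scored.append((gene, score))
    # Highest score first; ties broken by gene ID so the ranking is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


def fold_over_random(ranked: List[Tuple[str, float]], held_out: Set[str], top_k: int) -> float:
    """Precision in the top-k divided by the precision expected from a random ranking."""
    n = len(ranked)
    positives = sum(1 for gene, _ in ranked if gene in held_out)
    if n == 0 or top_k <= 0 or positives == 0:
        raise ValueError("need a non-empty ranking, top_k > 0 and at least one held-out positive")
    top_k = min(top_k, n)
    hits = sum(1 for gene, _ in ranked[:top_k] if gene in held_out)
    # (hits / top_k) / (positives / n), rearranged into a single division.
    return (hits * n) / (top_k * positives)


# Toy network: S* are known trait genes (seeds), C* are candidates, X* are other proteins.
edges = [("C1", "S1"), ("C1", "S2"), ("C1", "X1"),
         ("C2", "S3"), ("C2", "X1"),
         ("C3", "S1"), ("C3", "X1"), ("C3", "X2"), ("C3", "X3"),
         ("C4", "X1"), ("C4", "X2"),
         ("C5", "X2")]
adj = build_adjacency(edges)
seeds = {"S1", "S2", "S3"}
candidates = ["C1", "C2", "C3", "C4", "C5", "C6"]  # C6 has no known interactions
ranked = rank_candidates(adj, seeds, candidates)
print([gene for gene, _ in ranked])                   # ['C1', 'C2', 'C3', 'C4', 'C5', 'C6']
print(fold_over_random(ranked, {"C1", "C2"}, top_k=2))  # 3.0
```

In this toy case both held-out trait genes land in the top two of six candidates, so top-2 precision is 1.0 against a random expectation of 2/6, a threefold improvement. Real evaluations would use many held-out genes and repeated random rankings rather than a single expected value.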
