Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach

Machine learning and modeling approaches have been used to classify protein sequences for a broad range of tasks, including predicting protein function, structure, expression, and localization. Some recent studies have successfully predicted whether a given gene is expressed as mRNA, or even whether it is translated into protein. However, because not all genes are expressed in every condition and tissue, predicting condition-specific expression remains a challenge. To address this gap, we developed a machine learning approach that predicts tissue-specific gene expression across 23 different maize tissues based solely on DNA promoter and protein sequences. For class labels, we defined high and low expression levels for mRNA and protein abundance, and we optimized classifiers in a two-phase approach by systematically exploring various methods and combinations of k-mer sequences. In the first phase, we developed Markov model classifiers for each tissue and built a feature vector from their predictions. In the second phase, this feature vector was used as input to a Bayesian network for final classification. Our results show that these methods achieve classification accuracy of up to 95% for predicting gene expression in individual tissues. Because it relies on sequence alone, our method works in settings where costly experimental data are unavailable, and it reveals useful insights into the functional, evolutionary, and regulatory characteristics of genes.
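The abstract describes the two-phase method only at a high level. The sketch below shows one way such a pipeline could be wired up, under our own assumptions rather than the authors' exact setup. Phase 1 trains, for each tissue, a pair of order-k Markov chains (one on high-expression sequences, one on low) and labels a sequence by whichever chain gives it the higher log-likelihood. Phase 2 stacks those per-tissue labels into a feature vector. In place of the paper's Bayesian network, whose structure is not given here, it uses Bernoulli naive Bayes, a simple Bayesian network in which the class node is the sole parent of every feature. All class and function names (`MarkovModel`, `TissueMarkovClassifier`, `NaiveBayes`, `two_phase_fit`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


class MarkovModel:
    """Order-k Markov chain over a fixed alphabet with add-one (Laplace) smoothing."""

    def __init__(self, k, alphabet):
        self.k = k
        self.alphabet = alphabet
        # counts[context][next_symbol]; the context is the preceding k-mer.
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, seqs):
        for s in seqs:
            for i in range(self.k, len(s)):
                self.counts[s[i - self.k:i]][s[i]] += 1
        return self

    def log_likelihood(self, seq):
        total = 0.0
        for i in range(self.k, len(seq)):
            # .get avoids inserting unseen contexts into the defaultdict.
            ctx = self.counts.get(seq[i - self.k:i], {})
            n = sum(ctx.values())
            total += math.log((ctx.get(seq[i], 0) + 1) / (n + len(self.alphabet)))
        return total


class TissueMarkovClassifier:
    """Phase 1 (hypothetical): one Markov chain per class (1 = high, 0 = low) for one tissue."""

    def __init__(self, k=3, alphabet="ACGT"):
        self.k, self.alphabet = k, alphabet

    def fit(self, seqs, labels):
        self.models = {
            c: MarkovModel(self.k, self.alphabet).fit(
                [s for s, y in zip(seqs, labels) if y == c])
            for c in (0, 1)
        }
        return self

    def predict(self, seq):
        # Equal class priors assumed; pick the class whose chain explains seq better.
        return int(self.models[1].log_likelihood(seq) > self.models[0].log_likelihood(seq))


class NaiveBayes:
    """Phase 2 stand-in (assumption): Bernoulli naive Bayes over binary features."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, t in zip(X, y) if t == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Laplace-smoothed P(feature_j = 1 | class c).
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(len(X[0]))]
        return self

    def predict(self, x):
        def score(c):
            return self.log_prior[c] + sum(
                math.log(p if v else 1 - p) for p, v in zip(self.p[c], x))
        return max(self.classes, key=score)


def two_phase_fit(seqs, labels_by_tissue, target, k=3, alphabet="ACGT"):
    """Train per-tissue Markov classifiers, then a phase-2 model for `target`."""
    tissues = sorted(labels_by_tissue)
    phase1 = {t: TissueMarkovClassifier(k, alphabet).fit(seqs, labels_by_tissue[t])
              for t in tissues}

    def features(seq):
        # Feature vector: one high/low prediction per tissue.
        return [phase1[t].predict(seq) for t in tissues]

    phase2 = NaiveBayes().fit([features(s) for s in seqs], labels_by_tissue[target])
    return lambda seq: phase2.predict(features(seq))


if __name__ == "__main__":
    # Toy data (hypothetical): A-rich genes are "high" and C-rich genes "low" in both tissues.
    seqs = ["AAAAAAAAAA", "AAAATAAAAA", "CCCCCCCCCC", "CCCCGCCCCC"]
    labels = {"leaf": [1, 1, 0, 0], "root": [1, 1, 0, 0]}
    predict_leaf = two_phase_fit(seqs, labels, target="leaf", k=2)
    print(predict_leaf("AAAAAAAA"), predict_leaf("CCCCCCCC"))  # 1 0
```

This toy trains both phases on the same genes, which lets phase-1 fitting leak into phase 2; a more careful setup would feed phase 2 with cross-validated phase-1 predictions. For protein sequences, the `alphabet` argument would be the 20 amino-acid letters instead of `"ACGT"`, and the choice of `k` corresponds to the k-mer length being explored.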

Kyoung Tak Cho | Carson M. Andorf | T. Sen