Current progress of high-throughput microRNA differential expression analysis and random forest gene selection for model and non-model systems: an R implementation.

MicroRNAs are short non-coding RNA transcripts that act as master cellular regulators, with roles in orchestrating virtually all biological functions. High-throughput microRNA profiling technologies have recently become affordable and widely used, and the bioinformatics tools available for analyzing the mounting data flow have advanced alongside them. While there are many computational resources available for managing data from genome-sequenced animals, researchers are often faced with the challenge of identifying the biological implications of the daunting amount of data generated by these high-throughput technologies. In this article, we review the current state of high-throughput microRNA expression profiling platforms, data analysis processes, and computational tools in the context of comparative molecular physiology. We also present RBioMIR and RBioFS, our R package implementations for differential expression analysis and random forest-based gene selection. Detailed installation guides are available at kenstoreylab.com.
