Sequential Sampling for Optimal Bayesian Classification of Sequencing Count Data

High-throughput technologies have become the standard choice for comparative studies in biomedical applications. The limited number of samples, due to sequencing cost or restricted access to organisms of interest, necessitates efficient sample collection strategies that maximize the power of downstream statistical analyses. We propose a method for sequentially choosing training samples under the Optimal Bayesian Classification framework. Designed specifically for RNA sequencing count data, the proposed method takes advantage of an efficient Gibbs sampling procedure with closed-form updates. Our results show enhanced classification accuracy compared to random sampling.
