Bayesian neural network approaches to ovarian cancer identification from high-resolution mass spectrometry data

MOTIVATION: Classifying high-dimensional data remains a challenge for statistical machine learning. We propose a novel method, shallow feature selection, that assigns each feature a probability of being selected based on the structure of the training data alone. Because it is independent of any particular classifier, the method quickly reduces the high dimensionality of biological data to a size manageable for subsequent processing. Moreover, to improve both the efficiency and the performance of classification, these prior probabilities are further used to specify the distributions of the top-level hyperparameters in hierarchical Bayesian neural network (BNN) models, as well as the parameters in Gaussian process models.

RESULTS: Three BNN approaches were derived and applied to identify ovarian cancer from the NCI's high-resolution mass spectrometry data, yielding excellent performance in 1000 independent k-fold cross-validations (k = 2,...,10). For instance, average sensitivity and specificity of 98.56% and 98.42%, respectively, were achieved in the 2-fold cross-validations. Furthermore, only one control and one cancer sample were misclassified in the leave-one-out cross-validation. Several other popular classifiers were also tested for comparison.

AVAILABILITY: The programs were implemented in MATLAB, R and Neal's fbm.2004-11-10.
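
The evaluation protocol described under RESULTS (repeated k-fold cross-validation scored by average sensitivity and specificity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic rather than the NCI spectra, a nearest-centroid rule stands in for the BNN models, and all function names are hypothetical.

```python
import numpy as np

def stratified_folds(y, k, rng):
    """Split sample indices into k folds with roughly equal class proportions."""
    folds = [[] for _ in range(k)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        for i, sample in enumerate(idx):
            folds[i % k].append(sample)
    return [np.array(f) for f in folds]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Placeholder classifier (NOT a BNN): pick the class with the closer mean."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return (d1 < d0).astype(int)

def repeated_kfold_sens_spec(X, y, k=2, repeats=10, seed=0):
    """Average sensitivity and specificity over independent k-fold runs.

    Labels: 1 = cancer, 0 = control.
    """
    rng = np.random.default_rng(seed)
    sens, spec = [], []
    for _ in range(repeats):
        folds = stratified_folds(y, k, rng)
        tp = fn = tn = fp = 0
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            pred = nearest_centroid_predict(X[train], y[train], X[test])
            truth = y[test]
            tp += int(np.sum((pred == 1) & (truth == 1)))
            fn += int(np.sum((pred == 0) & (truth == 1)))
            tn += int(np.sum((pred == 0) & (truth == 0)))
            fp += int(np.sum((pred == 1) & (truth == 0)))
        sens.append(tp / (tp + fn))  # fraction of cancers correctly identified
        spec.append(tn / (tn + fp))  # fraction of controls correctly identified
    return float(np.mean(sens)), float(np.mean(spec))

# Synthetic stand-in data: 40 controls, 40 cancers, 50 features,
# with the first 5 features shifted for the cancer class.
demo_rng = np.random.default_rng(42)
y_demo = np.array([0] * 40 + [1] * 40)
X_demo = demo_rng.normal(size=(80, 50))
X_demo[y_demo == 1, :5] += 2.0

for k in (2, 5, 10):
    s, p = repeated_kfold_sens_spec(X_demo, y_demo, k=k, repeats=20)
    print(f"k={k}: sensitivity={s:.3f}, specificity={p:.3f}")
```

Counts are pooled across the k folds of one run before the rates are computed, and the rates are then averaged over runs. Whether the paper pools or averages per fold is not stated in the abstract, so this choice is an assumption.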
