Non-parametric and semi-parametric methods for parsimonious statistical learning with complex data

Sayan Dasgupta: Non-parametric and semi-parametric methods for parsimonious statistical learning with complex data (Under the direction of Michael R. Kosorok)

In clinical research, non-parametric and semi-parametric methods are gaining importance as statistical tools for drawing inferences from accumulated data. They require fewer assumptions than the corresponding parametric methods, and their applicability is much wider. Because these methods are robust, some statisticians regard them as leaving less room for improper use and misunderstanding. In this dissertation we study several non-parametric and semi-parametric methods in statistical learning and their applications to various areas of biomedical research.

In the first part of the dissertation, we apply temporal process regression to the study of medication adherence. Adherence refers to the act of conforming to the recommendations made by the provider with respect to the timing, dosage, and frequency of medication taking. Here we assess the effect of drug adherence in a study of viral resistance to antiviral therapy for chronic hepatitis C. We use temporal process regression (Fine, Yan, and Kosorok 2004) to model adherence as a longitudinal predictor of sustained virological response (SVR). We show that adherence has a significant effect on SVR. This analysis can serve as an archetype for more statistically efficient analyses of medication adherence, in a field where the common practice until now has been to report only summary statistics.

In the second part of the dissertation, we develop an approach to feature elimination in support vector machines based on recursive elimination of features. We present
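In temporal process regression (Fine, Yan, and Kosorok 2004), the mean of a response process Y(t) given covariates X(t) is modelled through a link function with a time-varying coefficient β(t), and β(t) is estimated separately at each time t from estimating equations built on the subjects observed at that time. The sketch below illustrates only that pointwise-estimation idea. Every other choice in it is an assumption made for illustration, not taken from the dissertation: a logit link, a binary outcome, a single scalar covariate (for example, a hypothetical adherence indicator at time t), unweighted estimating equations, and no confidence bands or other inference. The function names `fit_logit_newton` and `temporal_logit`, the data format, and the toy data are hypothetical.

```python
import math


def fit_logit_newton(xs, ys, max_iter=50, tol=1e-10):
    """Unpenalized logistic regression P(Y=1 | x) = expit(b0 + b1*x) via Newton-Raphson.

    Solves the score (estimating) equations sum_i (y_i - p_i) * (1, x_i) = 0.
    Assumes the data are not perfectly separable, so a finite solution exists.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score vector
        h00 = h01 = h11 = 0.0    # Fisher information matrix entries
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("singular information matrix (e.g. constant covariate)")
        # Newton step: solve the 2x2 system  I * delta = score  by Cramer's rule
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def temporal_logit(data_by_time):
    """Pointwise estimate of beta(t) = (b0(t), b1(t)) at each observation time t.

    data_by_time: hypothetical format {t: [(x, y), ...]} holding the covariate
    and binary outcome of every subject observed at time t.
    """
    estimates = {}
    for t in sorted(data_by_time):
        xs = [x for x, _ in data_by_time[t]]
        ys = [y for _, y in data_by_time[t]]
        estimates[t] = fit_logit_newton(xs, ys)
    return estimates


if __name__ == "__main__":
    # Toy data (made up for illustration): x = 1 if "adherent" at time t, else 0
    toy = {
        4: [(0, 0), (0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (1, 1)],
        12: [(0, 0), (0, 1), (1, 0), (1, 1)],
    }
    for t, (b0, b1) in temporal_logit(toy).items():
        print(f"t={t}: intercept={b0:.3f}, slope={b1:.3f}")
```

With a binary covariate the model at each t is saturated, so the toy estimates can be checked by hand: at t = 4 the intercept is log(1/3) ≈ -1.099 and the slope is log 9 ≈ 2.197, while at t = 12 both are 0. Because each β(t) is fitted independently, the estimates can be plotted against t to see how the covariate's association with the outcome changes over follow-up; the published method adds inference procedures that this sketch omits.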

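The text stops before describing the proposed elimination procedure, so the sketch below shows instead the classical recursive feature elimination for linear support vector machines used for gene selection in reference [41]: fit a linear SVM, discard the feature whose squared weight w_j^2 is smallest, refit on the remaining features, and repeat. It should not be read as the dissertation's own method. To stay dependency-free, the SVM is fit without an intercept by full-batch subgradient descent with a Pegasos-style step size 1/(λt); that solver, the regularization value `lam`, the epoch count, the function names `train_linear_svm` and `svm_rfe`, and the toy data are all assumptions made for illustration.

```python
def train_linear_svm(X, y, lam=0.01, epochs=500):
    """Linear SVM without intercept, trained by full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)
    with the Pegasos-style step size 1 / (lam * t).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        grad = [lam * wj for wj in w]            # gradient of the ridge term
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:                      # hinge loss is active
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        eta = 1.0 / (lam * t)
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, lam=0.01, epochs=500):
    """Recursive feature elimination: refit, drop the feature with smallest w_j^2, repeat.

    Returns the original feature indices in elimination order, so the last
    entry is the feature judged most important.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y, lam, epochs)
        worst = min(range(len(remaining)), key=lambda i: w[i] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated


if __name__ == "__main__":
    # Toy data (made up): features 0 and 1 separate the classes (feature 0 on a
    # larger scale); feature 2 is pure noise with no class information
    X = [[2, 1, 1], [2, 1, -1], [-2, -1, 1], [-2, -1, -1]]
    y = [1, 1, -1, -1]
    print(svm_rfe(X, y))
```

On the toy data, feature 2 carries no class information, so its weight stays exactly zero and it is eliminated first; features 0 and 1 are collinear, and the larger-scale feature 0 receives the larger weight and is kept longest, giving the elimination order [2, 1, 0]. This scale dependence is one reason features are often standardized before weight-based ranking.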
[1] Michael R. Kosorok, et al. Temporal process regression, 2004.

[2] H. Zou, et al. The F∞-norm support vector machine, 2008.

[3] Robert Tibshirani, et al. 1-norm Support Vector Machines, 2003, NIPS.

[4] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.

[5] Donglin Zeng, et al. Estimating Individualized Treatment Rules Using Outcome Weighted Learning, 2012, Journal of the American Statistical Association.

[6] James M. Robins, et al. Optimal Structural Nested Models for Optimal Sequential Decisions, 2004.

[7] Patrick Haffner, et al. Support vector machines for histogram-based image classification, 1999, IEEE Trans. Neural Networks.

[8] S. Murphy, et al. Optimal dynamic treatment regimes, 2003.

[9] Susan A. Murphy, et al. A-Learning for approximate planning, 2004.

[10] R. Ramlau, et al. Phase III trial comparing vinflunine with docetaxel in second-line advanced non-small-cell lung cancer previously treated with platinum-containing chemotherapy, 2010, Journal of Clinical Oncology.

[11] Xiaodong Lin, et al. Gene selection using support vector machines with non-convex penalty, 2005.

[12] L. Osterberg, et al. Adherence to medication, 2005, The New England Journal of Medicine.

[13] M. Kosorok, et al. Reinforcement learning design for cancer clinical trials, 2009, Statistics in Medicine.

[14] R. Hays, et al. A Comparison Study of Multiple Measures of Adherence to HIV Protease Inhibitors, 2001, Annals of Internal Medicine.

[15] H. Sung, et al. Evaluating multiple treatment courses in clinical trials, 2000, Statistics in Medicine.

[16] L. Radloff. The CES-D Scale, 1977.

[17] Jane Labadin, et al. Feature selection based on mutual information, 2015, 2015 9th International Conference on IT in Asia (CITA).

[18] Hao Helen Zhang. Variable selection for support vector machines via smoothing spline ANOVA, 2006.

[19] Karen A. Robinson, et al. Cystic fibrosis pulmonary guidelines: chronic medications for maintenance of lung health, 2007, American Journal of Respiratory and Critical Care Medicine.

[20] George Kesidis, et al. Margin-Maximizing Feature Elimination Methods for Linear and Nonlinear Kernel-Based Discriminant Functions, 2010, IEEE Transactions on Neural Networks.

[21] H. Zou, et al. The doubly regularized support vector machine, 2006.

[22] Yufeng Liu, et al. Variable Selection via A Combination of the L0 and L1 Penalties, 2007.

[23] Abas Md Said, et al. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics, 2014, The Scientific World Journal.

[24] Fuhui Long, et al. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] Paul S. Bradley, et al. Feature Selection via Concave Minimization and Support Vector Machines, 1998, ICML.

[26] Marko Grobelnik, et al. Feature selection using linear classifier weights: interaction with classification models, 2004, SIGIR '04.

[27] Susan A. Murphy, et al. Efficient A-Learning for Dynamic Treatment Regimes: A Handout, 2005.

[28] Y. Cheng, et al. Uncovering Symptom Progression History from Disease Registry Data with Application to Young Cystic Fibrosis Patients, 2010, Biometrics.

[29] Michael R. Kosorok, et al. Feature Elimination in Empirical Risk Minimization and Support Vector Machines, 2013.

[30] Ree Dawson, et al. Dynamic treatment regimes: practical design considerations, 2004, Clinical Trials.

[31] Sayan Mukherjee, et al. Feature Selection for SVMs, 2000, NIPS.

[32] Yuesheng Xu, et al. Universal Kernels, 2006, J. Mach. Learn. Res.

[33] I. Jolliffe. Principal Component Analysis, 2002.

[34] Peter F. Thall, et al. Bayesian and frequentist two-stage treatment strategies based on sequential failure times subject to interval censoring, 2007, Statistics in Medicine.

[35] Donglin Zeng, et al. New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes, 2015, Journal of the American Statistical Association.

[36] Bernt Schiele, et al. Object Recognition Using Multidimensional Receptive Field Histograms, 1996, ECCV.

[37] S. Murphy, et al. Methodological Challenges in Constructing Effective Treatment Sequences for Chronic Psychiatric Disorders, 2007, Neuropsychopharmacology.

[38] Michael R. Kosorok, et al. Support Vector Regression for Right Censored Data, 2012, arXiv:1202.5130.

[39] S. Murphy, et al. An experimental design for the development of adaptive treatment strategies, 2005, Statistics in Medicine.

[40] H. Sung, et al. Selecting Therapeutic Strategies Based on Efficacy and Death in Multicourse Clinical Trials, 2002.

[41] Jason Weston, et al. Gene Selection for Cancer Classification using Support Vector Machines, 2002, Machine Learning.

[42] D. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies, 1974.

[43] M. Kosorok, et al. Reinforcement Learning Strategies for Clinical Trials in Nonsmall Cell Lung Cancer, 2011, Biometrics.

[44] V. Paulsen, et al. An Introduction to the Theory of Reproducing Kernel Hilbert Spaces, 2016.

[45] R. Tibshirani, et al. Varying-Coefficient Models, 1993.

[46] Michael J. Swain, et al. Indexing via color histograms, 1990, Proceedings of the Third International Conference on Computer Vision.

[47] Jason Weston, et al. Mismatch string kernels for discriminative protein classification, 2004, Bioinformatics.

[48] Yufeng Liu, et al. Support vector machines with adaptive Lq penalty, 2007, Comput. Stat. Data Anal.

[49] Erica E. M. Moodie, et al. Demystifying Optimal Dynamic Treatment Regimes, 2007, Biometrics.

[50] Tong Zhang, et al. Covering Number Bounds of Certain Regularized Linear Function Classes, 2002, J. Mach. Learn. Res.

[51] Bernhard Schölkopf, et al. Use of the Zero-Norm with Linear Models and Kernel Methods, 2003, J. Mach. Learn. Res.

[52] Yunqian Ma, et al. Practical selection of SVM parameters and noise estimation for SVM regression, 2004, Neural Networks.

[53] Ingo Steinwart, et al. Fast rates for support vector machines using Gaussian kernels, 2007, arXiv:0708.1838.

[54] Bernhard Schölkopf, et al. Entropy Numbers of Linear Function Classes, 2000, COLT.

[55] Yaman Aksu. A Fast SVM-based Feature Selection Method, Combining MFE (Margin-Maximizing Feature Elimination) and Upper Bound on Misclassification Risk, 2012.

[56] Marti A. Hearst. Trends & Controversies: Support Vector Machines, 1998, IEEE Intell. Syst.

[57] S. Zeger, et al. Longitudinal data analysis using generalized linear models, 1986.

[58] Chris Watkins, et al. Learning from delayed rewards, 1989.

[59] J. Robins. A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect, 1986.

[60] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[61] Sayan Mukherjee, et al. Choosing Multiple Parameters for Support Vector Machines, 2002, Machine Learning.

[62] Lee-Jen Wei, et al. Confidence bands for survival curves under the proportional, 1994.

[63] H. Conjeevaram, et al. Peginterferon and ribavirin treatment in African American and Caucasian American patients with hepatitis C genotype 1, 2006, Gastroenterology.

[64] Richard Bellman, et al. Dynamic Programming and the Smoothing Problem, 1956.

[65] J. Ramsay, et al. Some Tools for Functional Data Analysis, 1991.

[66] Alain Rakotomamonjy, et al. Variable Selection Using SVM-based Criteria, 2003, J. Mach. Learn. Res.

[67] Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.

[68] Philip W. Lavori, et al. A design for testing clinical strategies: biased adaptive within-subject randomization, 2000.

[69] C. Golin, et al. Adherence to PEG/ribavirin treatment for chronic hepatitis C: prevalence, patterns, and predictors of missed doses and nonpersistence, 2013, Journal of Viral Hepatitis.

[70] Nuno Vasconcelos, et al. Direct convex relaxations of sparse SVM, 2007, ICML '07.

[71] M. Socinski, et al. Considerations for second-line therapy of non-small cell lung cancer, 2008, The Oncologist.

[72] J. M. Robins, et al. Marginal Mean Models for Dynamic Regimes, 2001, Journal of the American Statistical Association.