Targeted Local Support Vector Machine for Age-Dependent Classification

We develop methods to accurately predict whether presymptomatic individuals are at risk of a disease based on their marker profiles, which offers an opportunity for early intervention well before definitive clinical diagnosis. For many diseases, the clinical literature suggests that the risk of disease varies with markers of biological and etiological importance, for example, age. To identify effective prediction rules using nonparametric decision functions, standard statistical learning approaches treat markers with clear biological importance (e.g., age) and markers with no known role in disease etiology interchangeably as input variables. These approaches may therefore fail to single out and preserve the effects of the biologically important variables, especially in the presence of potential noise markers. Using age as an example of a salient marker that warrants special treatment in the analysis, we propose a local smoothing large margin classifier, implemented with a support vector machine (SVM), to construct effective age-dependent classification rules. The method adaptively adjusts for the age effect and tunes age separately from the other markers to achieve optimal performance. We derive the asymptotic risk bound of the local smoothing SVM and perform extensive simulation studies to compare it with standard approaches. We apply the proposed method to two studies of premanifest Huntington’s disease (HD) subjects and controls to construct age-sensitive predictive scores for the risk of HD and the risk of receiving an HD diagnosis during the study period. Supplementary materials for this article are available online.
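
The local smoothing idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the classifier at a target age is obtained by minimizing a hinge loss in which each subject is weighted by a Gaussian kernel of the distance between that subject's age and the target age. The function names, the linear decision function, the subgradient solver, the bandwidth, and the toy data are all hypothetical choices made for illustration.

```python
import math

def gaussian_kernel(u):
    # Unnormalized Gaussian kernel; an assumed choice of smoothing kernel.
    return math.exp(-0.5 * u * u)

def fit_local_linear_svm(X, y, ages, target_age, bandwidth,
                         lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM localized at target_age (illustrative sketch).

    Minimizes sum_i w_i * max(0, 1 - y_i * (beta . x_i + b)) + (lam / 2) * |beta|^2
    by subgradient descent, where w_i is the normalized kernel weight of subject i.
    """
    p = len(X[0])
    raw = [gaussian_kernel((a - target_age) / bandwidth) for a in ages]
    total = sum(raw)
    w = [r / total for r in raw]
    beta, b = [0.0] * p, 0.0
    for epoch in range(epochs):
        grad_beta = [lam * bj for bj in beta]  # gradient of the ridge penalty
        grad_b = 0.0
        for xi, yi, wi in zip(X, y, w):
            margin = yi * (sum(bj * xj for bj, xj in zip(beta, xi)) + b)
            if margin < 1:  # subject violates the margin: hinge loss is active
                for j in range(p):
                    grad_beta[j] -= wi * yi * xi[j]
                grad_b -= wi * yi
        step = lr / math.sqrt(epoch + 1)  # decaying step size
        beta = [bj - step * gj for bj, gj in zip(beta, grad_beta)]
        b -= step * grad_b
    return beta, b

def predict(beta, b, x):
    score = sum(bj * xj for bj, xj in zip(beta, x)) + b
    return 1 if score >= 0 else -1

# Toy data (made up): one marker whose association with the label
# reverses between younger (age 30) and older (age 70) subjects.
marker = [-2.0, -1.0, 1.0, 2.0]
X = [[m] for m in marker] + [[m] for m in marker]
ages = [30.0] * 4 + [70.0] * 4
y = [-1, -1, 1, 1] + [1, 1, -1, -1]

beta_young, b_young = fit_local_linear_svm(X, y, ages, target_age=30.0, bandwidth=5.0)
beta_old, b_old = fit_local_linear_svm(X, y, ages, target_age=70.0, bandwidth=5.0)

print("coefficient at age 30:", beta_young[0])
print("coefficient at age 70:", beta_old[0])
```

In this sketch the bandwidth controls how strongly the rule is localized in age, while `lam` regularizes the other markers, so the two can be tuned separately. A single SVM fit to the pooled toy data could not separate the classes with this marker, whereas the age-localized fits recover opposite-signed coefficients at the two target ages.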
