The Aristotle Classifier: Using the Whole Glycomic Profile To Indicate a Disease State.

"The totality is not, as it were, a mere heap, but the whole is something besides the parts." (Aristotle) We built a classifier that uses the totality of the glycomic profile, rather than a few selected glycoforms, to differentiate samples from two different sources. This approach, which relies on thousands of features, is a radical departure from current strategies, where most of the glycomic profile is ignored in favor of a few features, or even a single feature, meant to capture the differences between sample types. The classifier can be used to differentiate the source of the material; applicable sources may be different species of animals, different protein production methods, or, most importantly, different biological states (diseased vs. healthy). It can be applied to glycomic data in any form, including derivatized monosaccharides, intact glycans, or glycopeptides. It takes advantage of the fact that changing the source material can change the glycomic profile in many subtle ways: some glycoforms can be upregulated, some downregulated, and some may appear unchanged even though their proportion, relative to the other forms present, is altered to a detectable degree. By classifying samples using the entirety of their glycan abundances, along with the glycans' proportions relative to each other, the "Aristotle Classifier" is more effective at capturing the underlying trends than standard classification procedures used in glycomics, including principal components analysis (PCA). It also outperforms workflows where a single, representative glycomic biomarker is used to classify samples. We describe the Aristotle Classifier and provide several examples of its utility for biomarker studies and other classification problems using glycomic data from several sources.
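The idea of combining glycan abundances with their proportions relative to each other can be illustrated with a minimal sketch. This is not the Aristotle Classifier itself, whose algorithm is not given in the abstract; it only shows one plausible way to build such a feature set. The function name `profile_features`, the `pseudocount` parameter, the use of log-ratios, and the abundance values are all assumptions made for illustration.

```python
from itertools import combinations
from math import log


def profile_features(abundances, pseudocount=1e-9):
    """Build features from a whole glycomic profile (hypothetical helper).

    Returns each glycoform's relative abundance plus the log-ratio of
    every pair of glycoforms. The pseudocount avoids log(0) and division
    by zero when a glycoform is absent.
    """
    names = sorted(abundances)
    total = sum(abundances.values())
    # Express each glycoform as a fraction of the whole profile.
    rel = {n: abundances[n] / total for n in names}
    features = {f"abund:{n}": rel[n] for n in names}
    # Add one feature per pair: how the two glycoforms relate to each other.
    for a, b in combinations(names, 2):
        features[f"ratio:{a}/{b}"] = log(
            (rel[a] + pseudocount) / (rel[b] + pseudocount)
        )
    return features


# Made-up abundances for three glycoforms in two hypothetical samples.
healthy = {"G0F": 40.0, "G1F": 35.0, "G2F": 25.0}
disease = {"G0F": 50.0, "G1F": 35.0, "G2F": 15.0}

fh = profile_features(healthy)
fd = profile_features(disease)

# G1F looks unchanged on its own, but its ratio to G2F differs.
print(fh["abund:G1F"], fd["abund:G1F"])
print(fh["ratio:G1F/G2F"], fd["ratio:G1F/G2F"])
```

In this toy example the G1F abundance is identical in both samples, yet its ratio to G2F shifts, which is the kind of change a single-feature comparison would miss. With n glycoforms the pairwise step adds n(n-1)/2 features, so a profile of modest size quickly yields thousands of features.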
