Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes

Motivation: Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology and pose problems in their investigation. Unsupervised hybrid-two-phase (H2P) procedures, specifically dimension reduction (DR) coupled with clustering, provide valuable assistance, not only for unsupervised data classification, but also for visualization of the patterns hidden in high-dimensional feature space.

Methods: ‘Minimum Curvilinearity’ (MC) is a principle that, for small datasets, suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering.

Results: Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin.

Conclusion: MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal.
Availability: https://sites.google.com/site/carlovittoriocannistraci/home

Contact: kalokagathos.agon@gmail.com; massimo.alessio@hsr.it

Supplementary information: Supplementary data are available at Bioinformatics online.
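
The MC principle stated in the Methods (sample distances approximated by path lengths over the minimum spanning tree) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `minimum_curvilinear_distances`, the choice of Euclidean distance as the base metric, and the use of Prim's algorithm to build the MST are all assumptions made for illustration. The sketch covers only the distance estimation step, not the MCE embedding or MCAP clustering built on top of it.

```python
import math


def minimum_curvilinear_distances(points):
    """Return pairwise distances measured along the minimum spanning tree.

    Hypothetical helper: Euclidean base metric and Prim's algorithm are
    illustrative assumptions, not details taken from the paper.
    """
    n = len(points)
    # Dense Euclidean distance matrix between all samples.
    d = [[math.dist(p, q) for q in points] for p in points]

    # Prim's algorithm: grow the MST starting from sample 0.
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge connecting each sample to the tree
    parent = [-1] * n
    best[0] = 0.0
    adjacency = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            w = d[u][parent[u]]
            adjacency[u].append((parent[u], w))
            adjacency[parent[u]].append((u, w))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                parent[v] = u

    # Paths in a tree are unique, so one traversal per source sample
    # yields every tree path length without any tuning parameter.
    mc = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, -1, 0.0)]
        while stack:
            node, prev, dist = stack.pop()
            mc[s][node] = dist
            for nxt, w in adjacency[node]:
                if nxt != prev:
                    stack.append((nxt, node, dist + w))
    return mc


# Five samples lying on an L-shaped curve in two dimensions.
curve = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
mc = minimum_curvilinear_distances(curve)
print(mc[0][4])  # 4.0 along the tree, versus about 2.83 in a straight line
```

In this toy case the tree follows the bend of the curve, so the two end samples are separated by the length of the curve rather than by the straight-line shortcut across it. A matrix of this kind could then be passed to an embedding or clustering step in place of the plain Euclidean distances.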
