Mutual Information Optimization for Mass Spectra Data Alignment

"Signal” alignments play critical roles in many clinical setting. This is the case of mass spectrometry (MS) data, an important component of many types of proteomic analysis. A central problem occurs when one needs to integrate (MS) data produced by different sources, e.g., different equipment and/or laboratories. In these cases, some form of "data integration” or "data fusion” may be necessary in order to discard some source-specific aspects and improve the ability to perform a classification task such as inferring the "disease classes” of patients. The need for new high-performance data alignments methods is therefore particularly important in these contexts. In this paper, we propose an approach based both on an information theory perspective, generally used in a feature construction problem, and the application of a mathematical programming task (i.e., the weighted bipartite matching problem). We present the results of a competitive analysis of our method against other approaches. The analysis was conducted on data from plasma/ethylenediaminetetraacetic acid of "control” and Alzheimer patients collected from three different hospitals. The results point to a significant performance advantage of our method with respect to the competing ones tested.

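As a rough illustration of how mutual information and weighted bipartite matching can fit together, the sketch below scores every pair of features from two sources by a histogram-based mutual information estimate and then picks the one-to-one pairing with the largest total score. It is not the method evaluated in the paper: the toy data, the assumption that both sources measured the same samples, the equal-width binning, and the helper names (`discretize`, `mutual_information`, `best_matching`) are all hypothetical choices made for the example.

```python
import math
from collections import Counter
from itertools import permutations


def discretize(values, bins=3):
    """Map real-valued intensities to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def best_matching(weights):
    """Exhaustive maximum-weight perfect matching (only viable for tiny inputs)."""
    n = len(weights)
    best = max(permutations(range(n)),
               key=lambda p: sum(weights[i][p[i]] for i in range(n)))
    return list(best)


# Toy data (assumed): rows are features, columns are the same six samples.
source_a = [
    [1, 1, 2, 2, 3, 3],
    [1, 2, 3, 1, 2, 3],
    [1, 2, 1, 2, 1, 2],
]
# Source B holds the same features, reordered and rescaled (assumed).
source_b = [
    [10 * v + 5 for v in source_a[1]],
    [2 * v for v in source_a[2]],
    [100 * v for v in source_a[0]],
]

a_bins = [discretize(f) for f in source_a]
b_bins = [discretize(f) for f in source_b]

# weights[i][j] = estimated mutual information between feature i of A and feature j of B.
weights = [[mutual_information(a, b) for b in b_bins] for a in a_bins]

# matching[i] is the index of the feature in B paired with feature i of A.
matching = best_matching(weights)
print(matching)  # [2, 0, 1]
```

The exhaustive search over permutations is used only to keep the sketch dependency-free; for realistic numbers of features, a polynomial-time assignment algorithm such as the Hungarian method would be the natural replacement.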