The low sensitivity and specificity of current diagnostic methodologies lead to frequent misdiagnosis of Alzheimer’s disease and other dementias, causing an additional economic and social burden. We aim to compare real-world data with the largest public databases in order to derive new diagnostic models for earlier and more accurate diagnosis of cognitive impairment. We analyzed neuropsychological, neurological, and physical assessments together with transcriptomic data from biosamples. We used machine learning approaches and biostatistical methods to analyze the transcriptome from the large-scale international ADNI and AddNeuroMed projects: we selected a set of genes as potential transcriptomic biomarkers and highlighted the affected cellular processes. Furthermore, machine learning analysis of real-world data provided by European clinical dementia centres identified a small subset of comorbidities able to discriminate between diagnostic classes with good classifier performance.