Handling data imperfection - False data inputs in applications for Alzheimer's patients

Abstract. Handling data imperfection is a crucial issue in many application domains, and particularly in applications for Alzheimer's patients, where data inputs are often imperfect. In this paper, we first propose a typology of imperfection for data entered by Alzheimer's patients or their caregivers in such applications; the imperfection is mainly due to the memory discordance caused by the disease. This typology includes nine direct and three indirect imperfection types. The direct types are deduced from the data inputs themselves (e.g., uncertainty and uselessness), while the indirect types are deduced from the direct ones (e.g., redundancy). We then propose an approach, called DBE_ALZ, that handles false data entry by estimating the believability of each data input. According to the proposed typology, the falsity of a data input is related to five imperfection types: uncertainty, confusion, typing error, wrong knowledge and inconsistency. DBE_ALZ includes a believability model that defines a set of dimensions and sub-dimensions allowing a qualitative estimation of the believability of a given data input. Believability is estimated from the reasonableness of the data input and the reliability of its author. Unlike related work, data input reasonableness is measured against not only common-sense standards but also a set of personalized assertions. The reliability of a patient is estimated from the progression of the disease and the state of the patient's memory at the moment of entry, whereas the reliability of a caregiver is estimated from the caregiver's age and knowledge of the data input's field. Building on the believability model, we estimate the believability of a data input quantitatively by defining a set of metrics associated with the proposed dimensions and sub-dimensions. The measurement methods rely on probability theory and fuzzy set theory (Bayesian networks and Mamdani fuzzy inference systems) to reason about uncertain and imprecise knowledge. Three languages are supported: English, French and Arabic. Based on the generated believability degrees, a set of decisive actions is proposed to guarantee the quality of the data inputs, e.g., whether or not to draw inferences from a given data input. We illustrate the usefulness of our approach in the context of the Captain Memo memory prosthesis. Finally, we discuss the encouraging results of the evaluation step.
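
As an illustration of how a Mamdani fuzzy inference system could combine reasonableness and author reliability into a single believability degree, the following is a minimal sketch in plain Python. It is not the DBE_ALZ implementation: the triangular membership functions, the three labels (low, medium, high), the rule base ("believability is the weaker of the two inputs") and the centroid defuzzification are all assumptions made for this example, and the Bayesian network part of the approach is not shown.

```python
# Minimal Mamdani-style sketch (illustrative assumptions, not the DBE_ALZ model).

def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets shared by both inputs and the output, on [0, 1].
SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
ORDER = ["low", "medium", "high"]

def fuzzify(x):
    """Degree of membership of x in each fuzzy set."""
    return {name: tri(x, *SETS[name]) for name in ORDER}

def believability(reasonableness, reliability, steps=101):
    """Return a crisp believability degree in [0, 1] for inputs in [0, 1]."""
    mu_r = fuzzify(reasonableness)
    mu_a = fuzzify(reliability)

    # Hypothetical rule base: IF reasonableness is X AND reliability is Y
    # THEN believability is the weaker of X and Y. AND is modelled with min,
    # and rules sharing an output label are combined with max.
    strength = {name: 0.0 for name in ORDER}
    for i, label_r in enumerate(ORDER):
        for j, label_a in enumerate(ORDER):
            out = ORDER[min(i, j)]
            fire = min(mu_r[label_r], mu_a[label_a])
            strength[out] = max(strength[out], fire)

    # Clip each output set at its firing strength, aggregate with max,
    # then defuzzify with a discretized centroid.
    num = den = 0.0
    for k in range(steps):
        y = k / (steps - 1)
        mu = max(min(strength[name], tri(y, *SETS[name])) for name in ORDER)
        num += y * mu
        den += mu
    return num / den if den else 0.0

if __name__ == "__main__":
    # A plausible entry by a reliable author versus the same entry
    # by an author whose reliability is low.
    print(believability(0.9, 0.9))
    print(believability(0.9, 0.1))
```

With this rule base, a data input is rated as believable only when it is both reasonable and entered by a reliable author, which mirrors the two dimensions named in the abstract; any real rule base and membership functions would have to be taken from the paper itself.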
