Sharing Practices for Datasets Related to Accessibility and Aging

Datasets sourced from people with disabilities and older adults play an important role in innovation, benchmarking, and mitigating bias for both assistive and inclusive AI-infused applications. However, they are scarce. We conduct a systematic review of 137 accessibility datasets manually located across different disciplines over the last 35 years. Our analysis highlights how researchers navigate tensions between benefits and risks in data collection and sharing. We uncover patterns in data collection purpose, terminology, sample size, data types, and data sharing practices across communities of focus. We conclude by critically reflecting on challenges and opportunities related to locating and sharing accessibility datasets, calling for technical, legal, and institutional privacy frameworks that are more attuned to concerns from these communities.
