Statistical considerations for crowdsourced perceptual ratings of human speech productions
Daniel Fernández | Daphna Harel | Panos Ipeirotis | Tara McAllister