Between-Subjects Elicitation Studies: Formalization and Tool Support

Elicitation studies, where users supply proposals meant to effect system commands, have become a popular method for system designers. To date, however, the method has assumed a within-subjects procedure and statistics. Despite the benefits of examining the relative agreement of independent groups (e.g., men versus women, children versus adults, novices versus experts), the lack of appropriate tools for between-subjects agreement rate analysis has so far prevented such comparative investigations. In this work, we expand the elicitation method to between-subjects designs. We introduce a new measure for evaluating coagreement between groups and a new statistical test for agreement rate analysis that reports the exact p-value for the difference between agreement rates calculated for independent groups. We show the usefulness of our tools by re-examining previously published gesture elicitation data, for which we discuss significant differences in agreement between technical and non-technical participants, between men and women, and across different acquisition technologies. Our new tools will enable practitioners to properly analyze user-elicited data resulting from complex experimental designs with multiple independent groups and, consequently, will help them understand agreement data and verify hypotheses about agreement at more sophisticated levels of analysis.
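To make the quantity being compared concrete, the following is a minimal sketch of computing an agreement rate separately for two independent groups. The abstract does not define the measure, so the sketch assumes the pairwise agreement rate from the earlier within-subjects formalization (Vatavu and Wobbrock, CHI 2015): the fraction of participant pairs that proposed the same thing for a referent. The function name, group labels, and proposals are hypothetical, and the sketch covers neither the between-group coagreement measure nor the exact test introduced in this work.

```python
from collections import Counter


def agreement_rate(proposals):
    """Pairwise agreement rate for one referent within one group.

    Assumed formula (from the earlier within-subjects formalization,
    not stated in this abstract):
        AR = sum_i |P_i| * (|P_i| - 1) / (|P| * (|P| - 1))
    where P is the list of all proposals and each P_i is a subset of
    identical proposals.
    """
    n = len(proposals)
    if n < 2:
        raise ValueError("need at least two proposals to form a pair")
    counts = Counter(proposals)
    # Ordered pairs of participants who made the same proposal.
    agreeing_pairs = sum(c * (c - 1) for c in counts.values())
    return agreeing_pairs / (n * (n - 1))


# Hypothetical proposals for a single referent from two independent groups.
novices = ["swipe", "swipe", "tap", "swipe", "pinch", "tap"]
experts = ["swipe", "swipe", "swipe", "swipe", "tap", "swipe"]

ar_novices = agreement_rate(novices)
ar_experts = agreement_rate(experts)
print(f"novices: {ar_novices:.3f}, experts: {ar_experts:.3f}")
```

Whether a gap between two such rates is statistically significant is the question the between-subjects test described above is meant to answer; the sketch only produces the two rates.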
