Mining problem-solving strategies from HCI data

Can we learn about users' problem-solving strategies by observing their actions? This article introduces a data mining system that extracts complex behavioral patterns from logged user actions to discover users' high-level strategies. Our application domain is an HCI study aimed at revealing users' strategies in an end-user debugging task and understanding how the strategies relate to gender and to success. We cast this problem as a sequential pattern discovery problem, where user strategies are manifested as sequential behavior patterns. However, we found that the patterns discovered by standard data mining algorithms were difficult to interpret and provided limited information about high-level strategies. To help interpret the patterns as strategies, we examined multiple ways of clustering the patterns into meaningful groups. Taken together, these clusterings led to interesting findings about users' behavior in terms of both gender differences and debugging success. The common behavioral patterns constituted novel HCI findings about differences in males' and females' behavior with software, and were verified by a parallel study with an independent data set on strategies. As a research endeavor into the interpretability issues faced by data mining techniques, our work also highlights important research directions for making data mining more accessible to non-data-mining experts.
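To make the sequential pattern discovery framing concrete, the following is a minimal sketch, not the system described in this article. It assumes each user session is logged as an ordered list of action names, and it performs a simple level-wise search for action subsequences (gaps allowed) that occur in at least `min_support` sessions. The action names (`edit`, `test`, `tooltip`, `check`) and the functions `is_subsequence` and `frequent_patterns` are hypothetical and chosen only for illustration.

```python
def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `action in it` advances the iterator, so order is enforced.
    return all(action in it for action in pattern)


def frequent_patterns(sessions, min_support, max_len=3):
    """Level-wise search for patterns found in at least `min_support` sessions."""
    alphabet = sorted({action for session in sessions for action in session})
    frequent = {}
    frontier = [()]  # start from the empty pattern
    for _ in range(max_len):
        next_frontier = []
        for prefix in frontier:
            for action in alphabet:
                candidate = prefix + (action,)
                support = sum(is_subsequence(candidate, s) for s in sessions)
                # Only frequent patterns are extended, since any extension of
                # an infrequent pattern is infrequent too.
                if support >= min_support:
                    frequent[candidate] = support
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frequent


# Hypothetical logged sessions, one list of actions per user.
sessions = [
    ["edit", "test", "edit", "test"],
    ["tooltip", "edit", "test"],
    ["tooltip", "tooltip", "check"],
]

patterns = frequent_patterns(sessions, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(" -> ".join(pattern), support)
```

Even on this toy input the output is a flat list of short action sequences, which illustrates why mined patterns alone say little about high-level strategy and why grouping them is a natural next step.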
