Human Learning in Data Science

As machine learning becomes ever more important in Data Science, bringing with it growing abstractness and complexity, the desire for explainability grows as well. In this work we aim to gain explainability with a focus on correlation clustering, and to pursue the original goal of many Data Science tasks: extracting knowledge from data. As well-known tools like Fold-It or GeoTime show, gamification is a powerful approach, and not only for solving tasks that prove more difficult for machines than for humans: we can also gain knowledge from how players proceed when trying to solve those difficult tasks. That is why we developed Straighten it up!, a game in which users try to find the best linear correlations in high-dimensional datasets. Finding arbitrarily oriented subspaces in high-dimensional data is an exponentially complex task because of the number of potential subspaces relative to the number of dimensions. Nevertheless, linearly correlated points form a simple pattern that is easy for the human eye to track. Straighten it up! gives users an overview of two-dimensional projections of a dataset of their choice. Users decide which subspace they want to examine first and can draw arbitrarily many lines fitting the data. For every line, users can independently choose an offset within which points are assigned to that line, and they can switch between projections at any time. We developed a scoring system, based on the density of each cluster, its minimum spanning tree, the size of the offset, and coverage, that serves not only as an incentive but primarily as a basis for further examination. By tracking every step a user takes, we are able to detect common mechanisms and examine differences from state-of-the-art correlation and subspace clustering algorithms, resulting in greater comprehensibility.
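To make the line-and-offset mechanism more concrete, the following is a minimal sketch of how points in one two-dimensional projection might be assigned to a user-drawn line, together with two of the quantities the scoring system is said to build on (coverage and the minimum spanning tree). All function names, the representation of a line by two points, and the use of perpendicular Euclidean distance are assumptions made for illustration; the abstract does not specify how the game computes these values or how they are combined into a score.

```python
import math


def distance_to_line(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("a and b must be distinct points")
    # |cross product| / |direction vector| is the perpendicular distance.
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def assign_points(points, a, b, offset):
    """Indices of points lying within `offset` of the line through a and b."""
    return [i for i, p in enumerate(points)
            if distance_to_line(p, a, b) <= offset]


def coverage(points, assigned):
    """Fraction of all points captured by a line (assumed definition)."""
    return len(assigned) / len(points) if points else 0.0


def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree (Prim, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge connecting each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to connect.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


# Hypothetical projection: four roughly collinear points plus two outliers.
points = [(0, 0), (1, 1.05), (2, 1.9), (3, 3.0), (0, 3), (3, 0)]
assigned = assign_points(points, a=(0, 0), b=(3, 3), offset=0.2)
cluster = [points[i] for i in assigned]
print(assigned, coverage(points, assigned), mst_length(cluster))
```

In this sketch a smaller offset yields a tighter cluster but lower coverage, which illustrates the kind of trade-off a score built from offset size, density, and coverage would have to balance.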
