A New Approach for Cluster Detection for Large Datasets with High Dimensionality

The study of how people use computers through human-computer interfaces (HCI) is essential to improving productivity in any computer application environment. HCI analysts use a number of techniques to build models that are faithful to actual computer use. A key technique is eye tracking, in which the region of the screen being examined is recorded in order to determine key areas of use. Clustering techniques allow these regions to be grouped, which facilitates usability analysis. Historically, approaches such as the Expectation Maximization (EM) and K-Means algorithms have performed well. Unfortunately, these approaches require the number of clusters k to be known beforehand; in many real-world situations, this hampers the effectiveness of the data analysis. We propose a novel algorithm that is well suited for cluster discovery in HCI data: it does not require the number of clusters to be specified a priori, and it scales very well for both large datasets and high dimensionality. Experiments demonstrate that our approach works well on real data from HCI applications.
