Fuzzy clustering of human activity patterns

In the context of human activity pattern analysis, we adopt a fuzzy clustering around medoids approach to classify ordered sequences (paths). These sequences represent patterns of individual behavior in an actual or virtual space-time domain. A fuzzy approach suits path data because sequences of human activities are typically characterized by switching behaviors, which are likely to produce overlapping clusters. We adopt a partitioning around medoids strategy because, in human activity pattern analysis, it is useful to represent each cluster by an observed (not fictitious) prototype, the medoid. To measure the distance between every pair of sequences we use the Levenshtein distance, which allows sequences of different lengths to be compared and explicitly takes into account the sequential nature of the data. We also consider two robust versions of the fuzzy clustering algorithm, based respectively on the noise cluster and on the trimming technique. Robust algorithms handle noisy observations, which are likely to occur in this setting, and may therefore improve on the standard model. We show several applications to sequence data from different research areas, such as Web usage mining, travel behavior, and tourist and shopping paths.
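
The two building blocks named in the abstract, the Levenshtein distance and fuzzy membership in clusters represented by medoids, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes unit costs for insertion, deletion and substitution, and it assumes the commonly used fuzzy c-medoids membership update, with unsquared distances and a fuzziness parameter m = 2. The function names, the example paths and the choice of medoids are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences, assuming unit costs for
    insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_memberships(dist_to_medoids, m=2.0):
    """Membership degrees of one sequence, given its distances to the medoids.

    Assumed update: u_c proportional to d_c ** (-1 / (m - 1)), normalised so
    that the memberships sum to 1.
    """
    zeros = [c for c, d in enumerate(dist_to_medoids) if d == 0]
    if zeros:
        # The sequence coincides with one or more medoids: share the full
        # membership among them and avoid a division by zero.
        return [1.0 / len(zeros) if c in zeros else 0.0
                for c in range(len(dist_to_medoids))]
    weights = [d ** (-1.0 / (m - 1.0)) for d in dist_to_medoids]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical activity paths of different lengths.
paths = [
    ["home", "work", "shop", "home"],
    ["home", "work", "home"],
    ["home", "shop", "shop", "home", "gym"],
]
medoids = [paths[1], paths[2]]  # two observed paths chosen as prototypes

distances = [levenshtein(paths[0], med) for med in medoids]
memberships = fuzzy_memberships(distances)
print(distances, memberships)
```

Because the memberships depend only on pairwise distances, the same update works for sequences of unequal length. A full algorithm would alternate this step with a medoid update that picks, for each cluster, the observed sequence minimising the membership-weighted sum of distances. The noise cluster and trimming variants mentioned in the abstract are not shown here.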
