Profiling users in a 3g network using hourglass co-clustering

With widespread popularity of smart phones, more and more users are accessing the Internet on the go. Understanding mobile user browsing behavior is of great significance for several reasons. For example, it can help cellular (data) service providers (CSPs) to improve service performance, thus increasing user satisfaction. It can also provide valuable insights about how to enhance mobile user experience by providing dynamic content personalization and recommendation, or location-aware services. In this paper, we try to understand mobile user browsing behavior by investigating whether there exists distinct "behavior patterns" among mobile users. Our study is based on real mobile network data collected from a large 3G CSP in North America. We formulate this user behavior profiling problem as a "co-clustering" problem, i.e., we group both users (who share similar browsing behavior), and browsing profiles (of like-minded users) simultaneously. We propose and develop a scalable co-clustering methodology, Phantom, using a novel hourglass model. The proposed hourglass model first reduces the dimensions of the input data and performs divisive hierarchical co-clustering on the lower dimensional data; it then carries out an expansion step that restores the original dimensions. Applying Phantom to the mobile network data, we find that there exists a number of prevalent and distinct behavior patterns that persist over time, suggesting that user browsing behavior in 3G cellular networks can be captured using a small number of co-clusters. For instance, behavior of most users can be classified as either homogeneous (users with very limited set of browsing interests) or heterogeneous (users with very diverse browsing interests), and such behavior profiles do not change significantly at either short (30-min) or long (6 hour) time scales.

[1]  M. Fiedler Algebraic connectivity of graphs , 1973 .

[2]  Christos Makris,et al.  XML Filtering Using Dynamic Hierarchical Clustering of User Profiles , 2008, DEXA.

[3]  Inderjit S. Dhillon,et al.  Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.

[4]  Roded Sharan,et al.  Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[5]  Roded Sharan,et al.  Discovering statistically significant biclusters in gene expression data , 2002, ISMB.

[6]  A. Hoffman,et al.  Lower bounds for the partitioning of graphs , 1973 .

[7]  R. M. Mattheyses,et al.  A Linear-Time Heuristic for Improving Network Partitions , 1982, 19th Design Automation Conference.

[8]  G. Getz,et al.  Coupled two-way clustering analysis of gene microarray data. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Kenneth M. Hall An r-Dimensional Quadratic Placement Algorithm , 1970 .

[10]  Inderjit S. Dhillon,et al.  A fast kernel-based multilevel algorithm for graph clustering , 2005, KDD '05.

[11]  Christos Faloutsos,et al.  Fully automatic cross-associations , 2004, KDD.

[12]  Brian W. Kernighan,et al.  An efficient heuristic procedure for partitioning graphs , 1970, Bell Syst. Tech. J..

[13]  Inderjit S. Dhillon,et al.  Information-theoretic co-clustering , 2003, KDD '03.

[14]  Joseph T. Chang,et al.  Spectral biclustering of microarray data: coclustering genes and conditions. , 2003, Genome research.

[15]  George M. Church,et al.  Biclustering of Expression Data , 2000, ISMB.

[16]  Aleksandar Kuzmanovic,et al.  Measuring serendipity: connecting people, locations and interests in a mobile 3G network , 2009, IMC '09.

[17]  Sven Bergmann,et al.  Iterative signature algorithm for the analysis of large-scale gene expression data. , 2002, Physical review. E, Statistical, nonlinear, and soft matter physics.