Optimization-based User Group Management: Discovery, Analysis, Recommendation

User data is becoming increasingly available in multiple domains, ranging from phone usage traces to data on the social Web. User data is a special type of data described by user demographics (e.g., age, gender, occupation) and user activities (e.g., rating, voting, watching a movie). The analysis of user data appeals to scientists who work on population studies, online marketing, recommendations, and large-scale data analytics. However, analysis tools for user data are still lacking. In this thesis, we argue that there is a unique opportunity to analyze user data in the form of user groups, in contrast with both individual user analysis and statistical analysis of the whole population. A group is defined as a set of users whose members have either common demographics or common activities. Group-level analysis reduces sparsity and noise in the data and leads to new insights. We propose a user group management framework consisting of the following components: user group discovery, analysis, and recommendation. The first step in our framework is group discovery: given raw user data, obtain user groups by optimizing one or more quality dimensions. The second component, analysis, is necessary to tackle the problem of information overload: the output of a user group discovery step often contains millions of user groups, and skimming all of them is a tedious task for an analyst. We therefore need analysis tools that provide valuable insights into this huge space of user groups. The final question in the framework is how to use the discovered groups. We investigate one such application, user group recommendation, by considering affinities between group members. We evaluate all contributions of the proposed framework through an extensive set of experiments on both quality and performance.
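To make the group definition concrete, the following is a minimal sketch, not the method developed in the thesis. It assumes each user is a dictionary of demographic attributes and describes a group by the attribute-value pairs its members share. The function name `discover_groups`, the parameters `min_size` and `max_attrs`, and the toy data are all hypothetical. The sketch enumerates groups exhaustively, whereas the thesis obtains groups by optimizing quality dimensions.

```python
from collections import defaultdict
from itertools import combinations


def discover_groups(users, min_size=2, max_attrs=2):
    """Enumerate groups of users sharing one or more attribute=value pairs.

    Illustrative brute-force enumeration (an assumption of this sketch);
    it does not optimize any quality dimension.
    """
    groups = defaultdict(set)
    for user_id, attrs in users.items():
        pairs = sorted(attrs.items())  # canonical order for descriptions
        for k in range(1, max_attrs + 1):
            for description in combinations(pairs, k):
                groups[description].add(user_id)
    # Keep only descriptions shared by at least min_size users.
    return {d: m for d, m in groups.items() if len(m) >= min_size}


# Hypothetical toy data, not taken from the thesis.
users = {
    "u1": {"gender": "F", "occupation": "student"},
    "u2": {"gender": "F", "occupation": "engineer"},
    "u3": {"gender": "M", "occupation": "student"},
    "u4": {"gender": "F", "occupation": "student"},
}

groups = discover_groups(users)
for description, members in sorted(groups.items()):
    print(description, sorted(members))
```

Even on four users with two attributes, three overlapping groups survive the size filter. The number of candidate descriptions grows combinatorially with the number of attributes, which hints at why discovery output can reach millions of groups and why analysis tools are needed.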
