New Frontiers in Mining Complex Patterns

At its heart, the act of reviewing is very subjective, but in reality many factors influence a user’s decision. This can be called social influence bias. We pick two factors, “Who” and “When”, and discuss which is more influential when a user posts his or her own rating after reading the past review scores in an online review system. We show that a simple model can learn the factor metric quite efficiently from the vast amount of data available in many online review systems, and clarify that there is no universal solution: the influential factor depends on the dataset. We use a weighted multinomial generative model that takes account of each user’s influence over other users. We consider two kinds of users, real and virtual, in accordance with the two factors, and assign an influence metric to each. In the former, each user has its own metric; in the latter, the metric is assigned to the order of review posting actions (ratings). Both metrics can be learned quite efficiently, within a few tens of iterations, by log-likelihood maximization. The goodness of a metric is evaluated by its generalization capability. The proposed method was evaluated on five review datasets and confirmed to be effective. Different datasets give different results. Some datasets clearly indicate that user influence is more dominant than order influence, for others the result is the other way around, and yet others indicate that neither factor is relevant. This third case indicates that the decision is very subjective, i.e., independent of others’ reviews. We tried to characterize the datasets, but were only partially successful.
For datasets where user influence is dominant, we often observe that high-metric users show strong positive correlations with three more basic metrics: 1) the number of reviews the user made, 2) the number of the user’s followers who rate the same item, and 3) the fraction of the user’s followers who gave a similar rating; however, this is not always true. We also observe that the majority of users are normal (average), and that there are two small groups of users, one with high metric values and one with low metric values. Early adopters are not necessarily influential.
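
The abstract does not give the model's equations, so the following is only a minimal sketch of what a weighted multinomial rating model could look like, not the authors' formulation. It assumes that the probability of the next rating is proportional to a smoothing constant plus the summed influence weights of past reviewers who gave that rating. The names `predictive_distribution`, `log_likelihood`, `smoothing`, and `by_order` are hypothetical. With `by_order=False` the weights are keyed by user ("Who"); with `by_order=True` they are keyed by posting position ("When").

```python
import math


def predictive_distribution(past, weights, num_ratings, smoothing=1.0):
    """Return the assumed probability of each rating value 0..num_ratings-1.

    past    -- list of (key, rating) pairs for earlier reviews of the item
    weights -- dict mapping a key (user id or posting position) to its
               influence weight; missing keys default to 1.0 (assumption)
    """
    counts = [smoothing] * num_ratings
    for key, rating in past:
        counts[rating] += weights.get(key, 1.0)
    total = sum(counts)
    return [c / total for c in counts]


def log_likelihood(sequence, weights, num_ratings, by_order=False, smoothing=1.0):
    """Sum of log-probabilities of the observed ratings for one item.

    sequence -- list of (user, rating) pairs in posting order
    by_order -- if True, weights are looked up by posting position
                instead of by user
    """
    ll = 0.0
    past = []
    for position, (user, rating) in enumerate(sequence):
        probs = predictive_distribution(past, weights, num_ratings, smoothing)
        ll += math.log(probs[rating])
        key = position if by_order else user
        past.append((key, rating))
    return ll


# Toy data (made up for illustration): three reviews on a 5-level scale.
reviews = [("a", 4), ("b", 4), ("c", 0)]
uniform = log_likelihood(reviews, {}, num_ratings=5)
user_weighted = log_likelihood(reviews, {"a": 3.0}, num_ratings=5)
order_weighted = log_likelihood(reviews, {0: 3.0}, num_ratings=5, by_order=True)
```

Under these assumptions, learning the metric would mean choosing the weights that maximize this log-likelihood summed over all items, and the two keying schemes could then be compared on held-out reviews. The optimizer itself is not specified in the abstract and is left out here.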
