Diverse Demands Estimation and Ranking Based on User Behaviors

In the big data era, users can obtain massive amounts of information from the Internet, but its value density is very low. To help users find the information they need more quickly, this paper presents a mechanism for estimating and ranking diverse demands based on user behaviors. First, we propose a classification system for user query intent. Second, to mine the documents on websites belonging to a specific classification, we use the LDA (Latent Dirichlet Allocation) model to cluster and annotate the websites. To speed up LDA inference, we use hybrid MPI and OpenMP parallelism to reduce both inter-node and intra-node communication cost. Lastly, based on users' historical behaviors and the results returned by the search engine, we rank the classifications on a MapReduce platform and present the top-ranked ones to users.
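
To make the final ranking step concrete, the following is a minimal single-machine sketch, not the paper's method. The abstract does not give the scoring formula, so the linear blend of two normalized frequencies, the function name rank_categories, the history_weight parameter, and the category labels are all assumptions made for illustration.

```python
from collections import Counter


def rank_categories(click_history, result_categories, history_weight=0.5, top_k=3):
    """Rank intent categories by blending two normalized frequency signals.

    click_history: category labels of items the user interacted with before.
    result_categories: category labels of the results returned for the query.
    history_weight: hypothetical blending weight in [0, 1] (an assumption).
    """
    history = Counter(click_history)
    results = Counter(result_categories)
    # Guard against division by zero when either signal is empty.
    n_history = sum(history.values()) or 1
    n_results = sum(results.values()) or 1

    scores = {}
    for category in set(history) | set(results):
        scores[category] = (
            history_weight * history[category] / n_history
            + (1 - history_weight) * results[category] / n_results
        )

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical category labels, for illustration only.
clicks = ["video", "video", "video", "shopping"]
returned = ["shopping", "shopping", "video", "encyclopedia"]
top = rank_categories(clicks, returned, top_k=2)
print(top)
```

In a MapReduce setting, the two Counter passes would correspond to map and reduce stages keyed by user and category; the sketch keeps everything in memory to show only the scoring and sorting.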
