Scalable data analytics using crowdsourced repositories and streams

The scalable analysis of crowdsourced data repositories and streams has quickly become a critical experimental asset in multiple fields. It enables the systematic aggregation of otherwise dispersed data sources and their efficient processing with significant computational resources. However, the sheer volume of crowdsourced social data and the number of criteria to consider can limit both off-line and on-line analytical processing because of the intrinsic computational complexity. This paper demonstrates the efficient parallelisation of profiling and recommendation algorithms over crowdsourced tourism data repositories and streams. Using the Yelp data set for restaurants, we have explored two profiling approaches, entity-based and feature-based, both of which use ratings, comments, and location. For recommendation, we use a collaborative filter based on singular value decomposition with stochastic gradient descent (SVD-SGD). To compute the final recommendations accurately, we have applied post-recommendation filters based on venue suitability, value for money, and sentiment. Additionally, we have built a social graph for enrichment. Our master–worker implementation shows super-linear scalability for 10, 20, 30, 40, 50, and 60 concurrent instances.
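
To make the SVD-SGD step concrete, the following is a minimal single-process sketch of a biased matrix factorisation trained with stochastic gradient descent. It is not the implementation evaluated in the paper, which is not given here. The function names (`train_svd_sgd`, `predict`, `rmse`), the hyperparameter values, and the toy ratings are all assumptions chosen for illustration. The ratings are invented and are not taken from the Yelp data set.

```python
import math
import random


def train_svd_sgd(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD (hypothetical helper)."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}  # user biases
    bi = {i: 0.0 for i in items}  # item biases
    # Latent factors start as small random values.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    samples = list(ratings)
    for _ in range(n_epochs):
        rng.shuffle(samples)  # visit ratings in random order each epoch
        for u, i, r in samples:
            dot = sum(a * b for a, b in zip(p[u], q[i]))
            err = r - (mu + bu[u] + bi[i] + dot)
            # Gradient steps with L2 regularisation.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return mu, bu, bi, p, q


def predict(model, user, item):
    """Predict a rating; unknown users or items fall back to the bias terms."""
    mu, bu, bi, p, q = model
    dot = 0.0
    if user in p and item in q:
        dot = sum(a * b for a, b in zip(p[user], q[item]))
    return mu + bu.get(user, 0.0) + bi.get(item, 0.0) + dot


def rmse(model, ratings):
    """Root mean squared error of the model on (user, item, rating) triples."""
    se = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


# Invented toy ratings: (user, venue, stars).
toy_ratings = [
    ("u1", "a", 5), ("u1", "b", 4), ("u2", "a", 4), ("u2", "c", 1),
    ("u3", "b", 5), ("u3", "c", 2), ("u4", "a", 5), ("u4", "c", 1),
]
model = train_svd_sgd(toy_ratings)
print("training RMSE:", rmse(model, toy_ratings))
print("predicted rating for u1 on venue c:", predict(model, "u1", "c"))
```

In a master–worker setting such as the one the abstract describes, one plausible arrangement is for each worker to run this kind of update over its own partition of the ratings. How the paper actually partitions the work is not stated here, so that remains an assumption.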
