Performance Analysis of RDBMS and Hadoop Components with Their File Formats for the Development of Recommender Systems

A recommender system is software that makes suggestions to users by predicting their preferences from their previous usage data, ideally with minimal response time. Present-day recommender systems are built with complex techniques such as collaborative filtering and content-based filtering, but a similar system can be built by running complex queries with different query tools. The performance of these query tools depends on factors such as data size, the file format of the dataset, and aggregate search. In this paper, we compare four query tools (Hive, Impala, SparkSQL and MySQL) with the aim of designing a fast and efficient recommender system. The tools are analyzed by comparing the execution time of complex queries on data stored in different file formats: text, CSV, AVRO, PARQUET, RC and ORC. The results indicate that a fast recommender system can be built using a query tool such as Impala on a dataset saved in the AVRO file format.
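As a rough illustration of the query-based approach and of timing a single query, the sketch below runs an aggregate "users who chose the same items also chose" query and measures its execution time. It is not taken from the paper: SQLite stands in for the engines being compared, and the `purchases` table, its columns, the sample rows, and the helper names `build_demo_db` and `recommend` are all assumptions made for this example.

```python
import sqlite3
import time

def build_demo_db():
    """Create a tiny in-memory dataset (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (user_id INTEGER, item TEXT)")
    rows = [
        (1, "A"), (1, "B"), (1, "C"),
        (2, "A"), (2, "B"),
        (3, "A"), (3, "C"),
        (4, "B"), (4, "D"),
    ]
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    return conn

# Find peers who share at least one item with the target user, then rank
# the peers' other items by how many distinct peers hold them.
RECOMMEND_SQL = """
SELECT other.item AS item, COUNT(DISTINCT other.user_id) AS score
FROM purchases AS mine
JOIN purchases AS peer
  ON peer.item = mine.item AND peer.user_id <> mine.user_id
JOIN purchases AS other
  ON other.user_id = peer.user_id
WHERE mine.user_id = ?
  AND other.item NOT IN (SELECT item FROM purchases WHERE user_id = ?)
GROUP BY other.item
ORDER BY score DESC, item ASC
"""

def recommend(conn, user_id):
    """Return (recommendations, elapsed_seconds) for one user."""
    start = time.perf_counter()
    rows = conn.execute(RECOMMEND_SQL, (user_id, user_id)).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed

if __name__ == "__main__":
    conn = build_demo_db()
    recs, seconds = recommend(conn, 2)
    print(recs, f"{seconds:.6f}s")
```

For user 2 in this toy dataset the query returns item C with a score of 2 and item D with a score of 1. A comparison along the lines described above would presumably run the same kind of query on each engine and file format and compare the elapsed times; the timing wrapper here only shows the measurement idea on one engine.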
