Web Scraper Revealing Trends of Target Products and New Insights in Online Shopping Websites

Trillions of Facebook posts, Twitter tweets, Instagram photos and e-mails on exchange servers are flooding the Internet with big data. This calls for tools that can detect frequent updates and select the required information instantly. This research work implements scraper software capable of collecting updated information about target products hosted on popular e-commerce websites. The software is implemented using the Scrapy and Django frameworks, and is configured and evaluated across different e-commerce websites. Each website generates a large amount of data about the products that need to be scraped. The proposed software lets users search for a target product in a single consolidated place instead of searching across various websites, such as amazon.com, alibaba.com and daraz.pk. Furthermore, a scheduling mechanism enables the scraper to execute at a required frequency within a specified time frame.
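The scheduling idea, running the scraper at a required frequency within a specified time frame, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `planned_runs` and `run_schedule` and the `job` callback are hypothetical, and the sketch only computes and iterates over run times using the Python standard library rather than invoking Scrapy or Django.

```python
from datetime import datetime, timedelta
from typing import Callable, List


def planned_runs(start: datetime, end: datetime, every: timedelta) -> List[datetime]:
    """Return all run times in [start, end], spaced `every` apart (hypothetical helper)."""
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    current = start
    while current <= end:
        runs.append(current)
        current += every
    return runs


def run_schedule(job: Callable[[datetime], None],
                 start: datetime, end: datetime, every: timedelta) -> int:
    """Call `job` once per planned run time and return the number of runs.

    Assumption: a real scheduler would wait until each run time before
    starting the scraper; this sketch only iterates over the planned times.
    """
    runs = planned_runs(start, end, every)
    for when in runs:
        job(when)
    return len(runs)
```

For example, calling `run_schedule` with a window from 09:00 to 12:00 and an interval of one hour would invoke the job at 09:00, 10:00, 11:00 and 12:00. In practice the `job` callback might start a crawl for one configured website, but that wiring is left out here because the abstract does not describe it.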
