Leveraging national tourist offices through data analytics

Purpose: This study proposes a data-driven approach, based on open-source tools, for understanding customer satisfaction with the accommodation offer of a whole country.

Design/methodology/approach: The method starts by extracting information on all hotels in Portugal listed on TripAdvisor through Web scraping. A support vector machine is then used to model the TripAdvisor score, which is treated as a proxy for customer satisfaction. Finally, knowledge is extracted from the model using sensitivity analysis, which reveals how much each feature influences the score.

Findings: The model of the TripAdvisor score achieved a mean absolute percentage error of around 5 per cent, supporting the value of modeling the extracted data. The number of rooms in the unit and the minimum price are the two most relevant features, indicating that customers appreciate smaller and more expensive units, whereas the location of the hotel does not hold significant relevance.

Originality/value: National tourist offices can use the proposed approach to understand what drives tourists' satisfaction, helping to shape a country's strategy. For example, the licensing of new hotels may take into account unit size and other characteristics that make a unit more attractive to tourists. Furthermore, the procedure can be replicated at any time and in any country, making it a valuable tool for data-driven decision support on a national scale.
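The two quantitative steps named in the abstract, the mean absolute percentage error and the sensitivity analysis of a fitted model, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `toy_predict` stand-in for a fitted support vector machine, and the three example hotels are all hypothetical and chosen only to make the code runnable. The sensitivity measure assumed here is a simple one-dimensional variant, in which each feature is swept over its observed range while the others are held at their mean.

```python
from statistics import mean


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in per cent."""
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))


def sensitivity_importance(predict, rows, levels=5):
    """One-dimensional sensitivity analysis (an assumed, simplified variant).

    Each feature is varied over `levels` evenly spaced values between its
    observed minimum and maximum, with all other features fixed at their
    mean. The spread of the predictions is that feature's raw sensitivity;
    the result is normalised so that the importances sum to 1.
    """
    features = list(rows[0].keys())
    baseline = {f: mean(r[f] for r in rows) for f in features}
    spread = {}
    for f in features:
        lo = min(r[f] for r in rows)
        hi = max(r[f] for r in rows)
        outputs = []
        for i in range(levels):
            probe = dict(baseline)
            probe[f] = lo + (hi - lo) * i / (levels - 1)
            outputs.append(predict(probe))
        spread[f] = max(outputs) - min(outputs)
    total = sum(spread.values())
    return {f: (s / total if total else 0.0) for f, s in spread.items()}


# Hypothetical stand-in for a fitted model; any callable that maps a
# feature dict to a score could be used instead.
def toy_predict(hotel):
    return 4.0 - 0.002 * hotel["rooms"] + 0.003 * hotel["min_price"]


# Made-up illustrative records, not data from the study.
hotels = [
    {"rooms": 20, "min_price": 150, "latitude": 38.7},
    {"rooms": 120, "min_price": 80, "latitude": 41.1},
    {"rooms": 300, "min_price": 60, "latitude": 37.0},
]

importance = sensitivity_importance(toy_predict, hotels)
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.2f}")
```

Because the sensitivity function only needs a callable, the same sketch would apply to any fitted regressor wrapped in a function that accepts a feature dictionary.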
