Entropic Statistical Description of Big Data Quality in Hotel Customer Relationship Management

Customer Relationship Management (CRM) is a fundamental tool in today's hospitality industry, and it can be regarded as a big-data scenario given the large number of records that managers handle every year. Data quality is crucial to the success of these systems, and one of the main issues that businesses in general, and hospitality businesses in particular, must solve in this setting is the identification of duplicated customers. This problem has received little attention in the recent literature, possibly in part because it is not easy to state in statistical terms. In the present work, we formulate duplicated-customer identification as a large-scale data analysis problem, and we propose and benchmark a general-purpose solution for it. Our system consists of four basic elements: (a) a generic feature representation for the customer fields in a simple table-shaped database; (b) an efficient distance for comparing feature values, namely the Levenshtein distance computed with the Wagner-Fischer algorithm; (c) a big-data implementation using basic map-reduce techniques, which readily supports the comparison of strategies; and (d) an X-from-M criterion to identify the possible neighbors of a duplicated-customer candidate. We analyze the probability mass function of the distances in the text-based CRM fields and characterize their behavior and consistency in terms of the entropy and the mutual information of these fields. Our experiments on a large CRM from a multinational hospitality chain show that the distance distributions are statistically consistent for each feature, and that the neighborhood thresholds are adjusted automatically by the system in a first step and can subsequently be tuned more finely according to the manager's experience.
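The distance in element (b) and the entropy-based characterization of the distance distributions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the Levenshtein distance uses the two-row form of the Wagner-Fischer dynamic program, and entropy and mutual information are computed with plug-in (relative-frequency) estimates from paired samples of per-field distances, which is only one of several possible estimators.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance computed with the Wagner-Fischer dynamic program.

    Only two rows of the (len(a)+1) x (len(b)+1) table are kept in memory.
    """
    prev = list(range(len(b) + 1))  # row 0: distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)  # column 0: distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def entropy(samples: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy, in bits, of the empirical distribution of samples."""
    n = len(samples)
    return sum((c / n) * log2(n / c) for c in Counter(samples).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in mutual information, in bits, between two paired samples."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))  # p(x,y) log2[p(x,y) / (p(x) p(y))]
        for (x, y), c in joint.items()
    )
```

In this reading, `entropy` would be applied to the list of distances observed for one field (for example, the name field) over many record pairs, and `mutual_information` to the distances of two fields measured on the same pairs.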
The entropy distributions for the different variables, as well as the mutual information between pairs of variables, show multimodal profiles, often with a wide gap between close and far fields. This motivates the proposed X-from-M strategy, which is shown to be computationally affordable and can provide the expert with a reduced number of duplicated candidates to supervise, with low values of X being enough to ensure the sensitivity required at the automatic detection stage. The proposed system further supports the benefits of big-data technologies in CRM scenarios for hotel chains, and it promotes the research and development of theoretically principled approaches rather than ad hoc heuristic rules.
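The X-from-M criterion is not defined in detail in this abstract, so the following Python sketch assumes one plausible reading: a pair of records becomes a duplicate candidate when at least X of its M compared fields have a Levenshtein distance within a per-field threshold. The function names, field names, thresholds, and records are all hypothetical, and the thresholds are fixed by hand instead of being adjusted automatically from the distance distributions as the system described above does.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Two-row Wagner-Fischer computation of the Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def close_fields(r1: dict, r2: dict, thresholds: dict) -> int:
    """Per-pair step: number of fields whose distance is within that field's threshold."""
    return sum(levenshtein(r1[f], r2[f]) <= t for f, t in thresholds.items())


def x_from_m_candidates(records: list, thresholds: dict, x: int) -> list:
    """Assumed X-from-M rule: flag pairs (i, j) with at least x of the
    M = len(thresholds) fields close. Exhaustive, so O(n^2) comparisons."""
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(records), 2)
        if close_fields(r1, r2, thresholds) >= x
    ]


# Hypothetical records and hand-picked thresholds, for illustration only.
customers = [
    {"name": "John Smith", "email": "jsmith@example.com", "phone": "600123456"},
    {"name": "Jon Smith", "email": "jsmith@example.com", "phone": "600123465"},
    {"name": "Maria Lopez", "email": "mlopez@example.org", "phone": "611987654"},
]
thresholds = {"name": 2, "email": 1, "phone": 2}
print(x_from_m_candidates(customers, thresholds, x=2))  # prints [(0, 1)]
```

The exhaustive pairwise loop is quadratic in the number of records; in a big-data setting it is the per-pair step (`close_fields` here) that would be distributed, for example with the map-reduce techniques mentioned in element (c).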