Characterizing and Learning Representation on Customer Contact Journeys in Cellular Services

Corporations spend billions of dollars annually caring for customers across multiple contact channels. A customer journey is the complete sequence of contacts that a given customer has with a company across these channels of communication. While each contact is important and contains rich information, studying customer journeys provides better context for understanding customers' behavior, improving customer satisfaction and loyalty, and reducing care costs. However, journey sequences have a complex format due to the heterogeneity of user behavior: they are variable-length, multi-attribute, and exhibit large categorical cardinality (e.g., in contact reasons). How to characterize and learn representations of customer journeys has not been studied in the literature. We propose to learn journey embeddings using a sequence-to-sequence framework that converts each customer journey into a fixed-length latent embedding. To improve the disentanglement and distributional properties of the embeddings, the model is further modified by incorporating a Wasserstein-autoencoder-inspired regularization on the distribution of embeddings. Experiments conducted on an enterprise-scale dataset demonstrate the effectiveness of the proposed model and reveal significant improvements due to the regularization, both in distinguishing journey pattern characteristics and in predicting future customer engagement.
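The following is a minimal sketch of the idea the abstract describes, not the authors' actual model or code: an LSTM sequence-to-sequence autoencoder that compresses a variable-length journey of categorical contact events into a fixed-length embedding, with a Wasserstein-autoencoder-style MMD penalty pulling the embedding distribution toward a Gaussian prior. All names, dimensions, and the biased RBF-kernel MMD estimator are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JourneyAutoencoder(nn.Module):
    """Seq2seq autoencoder over categorical contact events (hypothetical)."""
    def __init__(self, num_contact_types: int, emb_dim: int = 64, hid_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(num_contact_types, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, num_contact_types)

    def encode(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, time) int64 contact-type ids; return fixed-length embedding
        _, (h, _) = self.encoder(self.embed(seq))
        return h[-1]                              # (batch, hid_dim)

    def forward(self, seq: torch.Tensor):
        z = self.encode(seq)
        # Teacher forcing: the decoder re-reads the sequence, conditioned on z
        # through its initial hidden state, and predicts the next event.
        h0 = z.unsqueeze(0)
        c0 = torch.zeros_like(h0)
        dec, _ = self.decoder(self.embed(seq), (h0, c0))
        return self.out(dec), z


def mmd_rbf(z: torch.Tensor, z_prior: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased RBF-kernel MMD^2 between embeddings and samples from the prior."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(z, z).mean() + k(z_prior, z_prior).mean() - 2 * k(z, z_prior).mean()


# One illustrative training step on a toy batch of padded journeys.
V = 500                                           # assumed contact-type vocabulary
model = JourneyAutoencoder(num_contact_types=V)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

seq = torch.randint(1, V, (32, 20))               # 32 journeys, 20 events each
logits, z = model(seq)
recon = nn.functional.cross_entropy(              # next-event reconstruction loss
    logits[:, :-1].reshape(-1, V), seq[:, 1:].reshape(-1))
penalty = mmd_rbf(z, torch.randn_like(z))         # match embeddings to N(0, I)

opt.zero_grad()
(recon + 10.0 * penalty).backward()               # penalty weight is a guess
opt.step()
```

Unlike a VAE's per-sample KL term, the MMD penalty here is computed once per batch on the aggregate embedding distribution, which is what lets the individual embeddings stay deterministic while the population is still regularized toward the prior.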
