Modeling Customer Engagement from Partial Observations

Companies have a strong interest in identifying the customers expected to bring the largest profit in the upcoming period. Such predictions depend on knowing as much as possible about each customer, yet demographic data, preferences, and other information useful for building loyalty programs are often missing. Modeling relations among customers as a network can also improve predictions at the individual level, since similar customers tend to have similar purchasing patterns. We address this problem by proposing a robust framework for structured regression on deficient data in evolving networks, with supervised representation learning based on neural feature embedding. The new method is compared to several unstructured and structured alternatives for predicting customer behavior (e.g., purchasing frequency and customer ticket) on user networks generated from the customer databases of two companies from different industries. The results show a 4% to 130% improvement in accuracy over the alternatives when all customer information is known. The method also remains robust when up to 80% of demographic information is missing: in that setting it is up to several times more accurate than alternatives that either ignore cases with missing values or learn their feature representation in an unsupervised manner.
