Attention Fusion Networks: Combining Behavior and E-mail Content to Improve Customer Support

Customer support is a central objective at Square because it helps us build and maintain great relationships with our sellers. To provide the best experience, we strive to deliver accurate and near-instantaneous responses to questions about our products. In this work, we introduce the Attention Fusion Network, a model that combines signals extracted from seller interactions across the Square product ecosystem with submitted email questions to predict the most relevant solution to a seller's inquiry. We show that combining these two very different data sources, which are rarely used together, with state-of-the-art deep learning systems outperforms candidate models trained on only a single source.
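To make the fusion idea concrete, below is a minimal sketch of an attention-fusion classifier in TensorFlow/Keras: an LSTM branch encodes the email text, a dense branch encodes behavioral (product-interaction) features, and a learned attention weighting fuses the two representations before the answer is predicted. This is not the authors' exact architecture; all layer sizes, input names, feature counts, and the specific fusion scheme are assumptions for illustration.

```python
# Hedged sketch of an attention-fusion model (assumed sizes and names, not the
# published architecture): text branch + behavior branch fused by soft attention.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 20_000      # assumed vocabulary size
MAX_TOKENS = 200         # assumed max email length in tokens
NUM_BEHAVIOR_FEATS = 64  # assumed number of seller-interaction features
NUM_CLASSES = 30         # assumed number of support-answer classes

# Text branch: embed the email tokens and encode the sequence with an LSTM.
text_in = layers.Input(shape=(MAX_TOKENS,), name="email_tokens")
x = layers.Embedding(VOCAB_SIZE, 128, mask_zero=True)(text_in)
text_repr = layers.LSTM(128)(x)

# Behavior branch: dense encoding of the seller's product-interaction features.
behavior_in = layers.Input(shape=(NUM_BEHAVIOR_FEATS,), name="behavior_feats")
behavior_repr = layers.Dense(128, activation="relu")(behavior_in)

# Attention fusion: score each source, softmax the scores, and take the
# weighted sum so the model can lean on text or behavior per example.
stacked = layers.Lambda(lambda t: tf.stack(t, axis=1))([text_repr, behavior_repr])  # (batch, 2, 128)
scores = layers.Dense(1)(stacked)                    # (batch, 2, 1)
weights = layers.Softmax(axis=1)(scores)             # attention over the two sources
fused = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([stacked, weights])  # (batch, 128)

out = layers.Dense(NUM_CLASSES, activation="softmax", name="answer")(fused)
model = Model(inputs=[text_in, behavior_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```

In this sketch the attention weights let the classifier emphasize whichever source is more informative for a given inquiry, which is the intuition behind fusing behavioral signals with email content rather than training on either source alone.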
