Classifying and Grouping Narratives with Convolutional Neural Networks, PCA and t-SNE

Each week, the Consumer Financial Protection Bureau (CFPB) receives thousands of consumer complaints about financial products and services. These complaints must be forwarded to the responsible company and are posted on the site after 15 days or when the company responds to the complaint, whichever comes first. Published complaints and their resolutions help consumers solve their problems and serve as a repository that other consumers can use to avoid or solve problems on their own. Every complaint provides information about the problems people are having, which helps identify inappropriate practices and stop them before they become major problems, leading to better results for consumers and a better financial market for everyone. Each complaint contains, among other fields, the submission date, the company to which the complaint is sent, and the complaint narrative. However, complaints carry no information about the department to which they should be forwarded. Therefore, this work analyzes each complaint with three approaches: (i) a convolutional neural network (CNN) to classify the narratives; (ii) principal component analysis (PCA); and (iii) t-distributed stochastic neighbor embedding (t-SNE) to create a three-dimensional embedding for clustering. Embeddings trained from scratch, pre-trained word2vec vectors, and Global Vectors (GloVe) are used and compared across six different CNN models. The results add to the evidence that pre-trained word vectors are important and that convolutional neural networks and t-SNE can perform remarkably well on real text classification data.
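As a rough illustration of the pipeline described above, the following NumPy sketch applies a bank of convolutional filters with max-over-time pooling to embedded narratives and then projects the pooled features to three dimensions with PCA. It is not the authors' architecture: the toy sizes, the random (untrained) embedding table and filters, and the helper names `conv_max_pool` and `pca_project` are all assumptions made for this example, and a real model would learn the filters and initialize the embedding table from word2vec or GloVe. A t-SNE embedding could be obtained in place of the PCA step with, for instance, scikit-learn's `TSNE(n_components=3)`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 6 narratives of 10 tokens each, 8-dimensional word vectors.
vocab_size, embed_dim, seq_len, n_docs = 50, 8, 10, 6
# Random table standing in for learned, word2vec, or GloVe vectors (assumption).
embedding = rng.normal(size=(vocab_size, embed_dim))
token_ids = rng.integers(0, vocab_size, size=(n_docs, seq_len))


def conv_max_pool(x, filters):
    """Convolve filters over a token sequence, apply ReLU, then max-pool over time.

    x:       (seq_len, embed_dim) embedded narrative
    filters: (n_filters, width, embed_dim) convolution kernels
    returns: (n_filters,) pooled feature vector
    """
    n_filters, width, _ = filters.shape
    n_windows = x.shape[0] - width + 1
    # Stack every window of `width` consecutive word vectors.
    windows = np.stack([x[i:i + width] for i in range(n_windows)])
    # Dot product of each window with each filter: (n_windows, n_filters).
    feature_maps = np.einsum("nwd,fwd->nf", windows, filters)
    feature_maps = np.maximum(feature_maps, 0.0)  # ReLU
    return feature_maps.max(axis=0)  # max-over-time pooling


def pca_project(features, n_components=3):
    """Project the rows of `features` onto their top principal components via SVD."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# 16 untrained filters spanning 3 tokens each (sizes chosen arbitrarily).
filters = rng.normal(size=(16, 3, embed_dim))
doc_features = np.stack(
    [conv_max_pool(embedding[ids], filters) for ids in token_ids]
)
coords_3d = pca_project(doc_features, n_components=3)
print(doc_features.shape, coords_3d.shape)
```

In a classifier, the pooled feature vector would typically feed a dense softmax layer over the target classes; here it is only used as input to the three-dimensional projection that a clustering step could consume.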
