Interpretable Machine Learning Based on Integration of NLP and Psychology in Peer-to-Peer Lending Risk Evaluation

With the rapid development of Peer-to-Peer (P2P) lending in the financial field, a large amount of data on lending agencies has become available. At the same time, P2P agencies suffer from problems such as operators absconding with funds and going out of business. It is therefore urgent to apply interpretable AI in Fintech to evaluate lending risk effectively. In this paper, we use machine learning and deep learning methods to model and analyze the unstructured natural language text of P2P agencies, and we propose an interpretable machine learning method to evaluate the fraud risk of P2P agencies, which enhances the credibility of the AI model. First, this paper explains model behavior based on the psychological theory of interpersonal fraud from the social sciences. At the same time, NLP techniques and influence functions are used to verify that the machine learning model actually learns the part-of-speech details described by the fraud theory, which provides psychological support for the interpretability of the P2P risk evaluation model. In addition, we propose "style vectors" to describe the overall differences between the text styles of P2P agencies and to understand model behavior. Experiments show that describing text style differences with style vectors and influence functions agrees with human intuitive perception. This demonstrates that the machine learning model indeed learns text style differences and uses them for risk evaluation, which further shows that the model has a certain degree of interpretability.
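
For reference, the influence functions mentioned above follow the standard formulation of Koh and Liang (ICML 2017): they estimate how the loss on a test point z_test would change if a training point z were upweighted, which is how individual training texts can be traced as evidence for a prediction. The abstract itself does not restate the formula; the usual form, given the empirical risk minimizer \hat{\theta} and loss L, is

\[
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) \;=\; -\,\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top} \, H_{\hat{\theta}}^{-1} \, \nabla_\theta L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat{\theta}),
\]

where a large positive value indicates a training text whose removal would most increase the loss on z_test, i.e. a text the model relies on for that prediction.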
