A financial ticket image intelligent recognition system based on deep learning

Abstract With the rapid growth in the issuance of financial tickets (e.g., bills and invoices), traditional manual invoice reimbursement and financial accounting place an increasing burden on financial accountants and consume excessive manpower. To solve this problem, we propose the Financial Ticket Intelligent Recognition System (FTIRS), an iterative self-learning framework that supports iterative updating and extension of the algorithm model, both of which are fundamental requirements for a practical financial accounting system. In addition, we design a simple yet efficient Financial Ticket Faster Detection Network (FTFDNet) and an intelligent data warehouse of financial tickets to strengthen the system's efficiency and performance. The system currently recognizes 482 types of financial tickets and has an automatic iterative optimization mechanism, so the longer it is in use, the more ticket types it will support and the more accurate its recognition will become. Experimental results show that the system's average recognition accuracy is 97.41% and its average running time for a single ticket is 173.72 ms. The practical value of the system has been verified in business use: it can greatly improve the efficiency of financial accounting and reduce the labor cost of accounting staff.
