An Earth Mover's Distance Based Graph Distance Metric For Financial Statements

Quantifying the similarity between companies has proven useful for several purposes, including company benchmarking, fraud detection, and searching for investment opportunities. Such similarity can be computed from a variety of data sources, such as company activity data and financial data. Among these, ledger account data is widely available and largely standardized. The ledger accounts within a financial statement can be represented as a tree, i.e., a special type of graph, that captures both the values of the ledger accounts and the relationships between them. Given their broad availability and rich information content, financial statements are a prime data source from which company similarities or distances can be computed. In this paper, we present a graph distance metric that enables one to compute the similarity between the financial statements of two companies. We conduct a comprehensive experimental study using real-world financial data to demonstrate the usefulness of our proposed distance metric. The results are promising across a number of use cases. This method may be useful for investors looking for investment opportunities, government officials attempting to identify fraudulent companies, and accountants looking to benchmark a group of companies based on their financial statements.
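To make the tree representation concrete, the following is a minimal illustrative sketch, not the metric proposed in the paper. It assumes that both companies share the same account hierarchy, that leaf values are normalized to shares of the total, and that every tree edge has unit length. Under those assumptions the Earth Mover's Distance on a tree reduces to a sum, over edges, of the absolute difference in the mass lying below each edge. The hierarchy `PARENT`, the account names, and the helpers `subtree_shares` and `tree_emd` are all hypothetical.

```python
# Hypothetical account hierarchy (an assumption for illustration):
# maps each account to its parent; None marks the root.
PARENT = {
    "total_assets": None,
    "current_assets": "total_assets",
    "fixed_assets": "total_assets",
    "cash": "current_assets",
    "receivables": "current_assets",
    "property": "fixed_assets",
}


def subtree_shares(leaf_values):
    """Normalize leaf values to shares of the total and sum them up the tree."""
    total = sum(leaf_values.values())
    shares = {node: 0.0 for node in PARENT}
    for leaf, value in leaf_values.items():
        node = leaf
        while node is not None:  # add the leaf's share to every ancestor
            shares[node] += value / total
            node = PARENT[node]
    return shares


def tree_emd(leaves_a, leaves_b):
    """EMD between two leaf distributions on the same tree, unit edge lengths.

    Each non-root node stands for the edge to its parent; the cost of that
    edge is the absolute difference in mass contained in the node's subtree.
    """
    a = subtree_shares(leaves_a)
    b = subtree_shares(leaves_b)
    return sum(abs(a[n] - b[n]) for n in PARENT if PARENT[n] is not None)


# Made-up leaf values for two fictional companies.
company_a = {"cash": 50.0, "receivables": 30.0, "property": 20.0}
company_b = {"cash": 10.0, "receivables": 30.0, "property": 60.0}
distance = tree_emd(company_a, company_b)
```

In this toy example, 0.4 of the mass has to travel from `cash` to `property` across four edges, so the distance is about 1.6. Statements with differing hierarchies or non-unit edge weights would need a more general formulation than this sketch covers.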
