Analysis of Companies' Non-financial Disclosures: Ontology Learning by Topic Modeling

Prior studies highlight the merits of integrating Linked Data to aid investors' analyses of company financial disclosures. Non-financial disclosures, including reporting on a company's environmental footprint (corporate sustainability), remain an unexplored area of research. One reason cited by investors is the need for earth science knowledge to interpret such disclosures. To address this challenge, we propose an automated system that employs Latent Dirichlet Allocation (LDA) to discover earth science topics in corporate sustainability text. A vocabulary of terms retrieved via a SPARQL endpoint is seeded into the LDA model as lexical priors. An ensemble tree combines the resulting topic probabilities and classifies the quality of sustainability disclosures using domain expert ratings published by Google Finance. From an application standpoint, our results may be of interest to investors seeking to integrate corporate sustainability considerations into their investment decisions.
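To make the idea of seeding terms as lexical priors concrete, the following is a minimal sketch, not the system described above. It assumes a simplified stand-in for seeding: an asymmetric Dirichlet prior over topic-word distributions, in which seed terms receive extra pseudo-count mass in their topic. The function name `build_seeded_prior`, the parameters `base` and `boost`, the topic labels, and the term lists are all hypothetical; in practice the term lists would come from the SPARQL query results.

```python
from typing import Dict, List


def build_seeded_prior(
    vocab: List[str],
    seed_terms: Dict[str, List[str]],
    base: float = 0.01,
    boost: float = 1.0,
) -> List[List[float]]:
    """Return a (topics x vocabulary) matrix of Dirichlet pseudo-counts.

    Every word starts at `base`; a word listed as a seed term for a topic
    gets `base + boost` in that topic's row. Seed terms that are absent
    from the corpus vocabulary are ignored.
    """
    index = {word: i for i, word in enumerate(vocab)}
    prior = []
    for terms in seed_terms.values():  # one row per seeded topic
        row = [base] * len(vocab)
        for term in terms:
            i = index.get(term)
            if i is not None:
                row[i] = base + boost
        prior.append(row)
    return prior


# Hypothetical corpus vocabulary and seed lists (assumptions for illustration).
vocab = ["emissions", "carbon", "water", "drought", "revenue"]
seed_terms = {
    "climate": ["emissions", "carbon"],
    "hydrology": ["water", "drought", "aquifer"],  # "aquifer" is out of vocabulary
}

prior = build_seeded_prior(vocab, seed_terms)
for topic, row in zip(seed_terms, prior):
    print(topic, row)
```

A matrix of this shape could then be supplied as the topic-word prior of an LDA implementation that accepts one, so that each topic is biased toward its seed vocabulary while remaining free to absorb other words from the corpus.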
