Business Taxonomy Construction Using Concept-Level Hierarchical Clustering

Business taxonomies are indispensable tools for investors conducting equity research and making professional decisions. However, identifying the structure of industry sectors in an emerging market is challenging for two reasons. First, existing taxonomies are designed for mature markets and may not classify small companies with innovative business models appropriately. Second, emerging markets develop quickly, so static business taxonomies cannot promptly reflect new features. In this article, we propose a new method to construct business taxonomies automatically from the content of corporate annual reports. Extracted concepts are hierarchically clustered using greedy affinity propagation. Our method requires less supervision and is able to discover new terms. Experiments and evaluation on the Chinese National Equities Exchange and Quotations (NEEQ) market show several advantages of the business taxonomy we build. Our results provide an effective tool for understanding and investing in new growth companies.
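To make the clustering step more concrete, the following is a minimal sketch of layered affinity propagation: points are clustered, then the resulting exemplars are clustered again, until no further merging occurs. It is not the greedy variant used in the article, whose details are not given in this abstract. The function names (`similarity_matrix`, `affinity_propagation`, `build_hierarchy`), the toy 2-D vectors standing in for concept representations, the negative squared Euclidean similarity, and the median preference are all assumptions made for illustration.

```python
from statistics import median


def similarity_matrix(points):
    """Negative squared Euclidean distance; the diagonal holds the preference.
    Using the median off-diagonal similarity as preference is an assumption."""
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
          for j in range(n)] for i in range(n)]
    off_diag = [S[i][j] for i in range(n) for j in range(n) if i != j]
    pref = median(off_diag) if off_diag else 0.0
    for k in range(n):
        S[k][k] = pref
    return S


def affinity_propagation(S, damping=0.7, iters=200):
    """Standard message passing (responsibilities R, availabilities A).
    Returns (exemplars, labels), where labels[i] is the exemplar of point i."""
    n = len(S)
    if n == 1:
        return [0], [0]
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(v for j, v in enumerate(scores) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        for k in range(n):
            total = R[k][k] + sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    if not exemplars:
        # Fallback if no exemplar emerged: keep the single best candidate.
        exemplars = [max(range(n), key=lambda k: score[k])]
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


def build_hierarchy(points):
    """Cluster points, then re-cluster the exemplars, layer by layer.
    Each level maps a node index to the index of its parent exemplar."""
    levels = []
    current = list(range(len(points)))
    while len(current) > 1:
        S = similarity_matrix([points[i] for i in current])
        exemplars, labels = affinity_propagation(S)
        if len(exemplars) == len(current):
            break  # nothing merged, so stop growing the hierarchy
        levels.append({current[i]: current[labels[i]]
                       for i in range(len(current))})
        current = [current[k] for k in exemplars]
    return levels


# Hypothetical 2-D concept vectors, for illustration only.
concept_vectors = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1),
                   (5.0, 5.0), (5.1, 5.2), (5.3, 5.0), (9.0, 0.0)]
for depth, level in enumerate(build_hierarchy(concept_vectors)):
    print(depth, level)
```

Each printed level maps a node to its parent exemplar, so reading the levels from first to last walks from the leaves toward the root. In practice the similarity function and the preference value would strongly influence how many clusters appear at each layer.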
