Knowledge discovery out of text data: a systematic review via text mining

Purpose: The aim of this work is to increase awareness of the potential of text mining to discover knowledge, and to further promote research collaboration between the knowledge management and information technology communities. Since its emergence, text mining has involved multidisciplinary studies focused primarily on database technology, Web-based collaborative writing, text analysis, machine learning and knowledge discovery. However, owing to the large amount of research in this field, it is becoming increasingly difficult to identify existing studies and therefore to suggest new topics.

Design/methodology/approach: This article offers a systematic review of 85 academic outputs (articles and books) focused on knowledge discovery derived from text mining. The systematic review is conducted by applying “text mining at the term level, in which knowledge discovery takes place on a more focused collection of words and phrases that are extracted from and label each document” (Feldman et al., 1998, p. 1).

Findings: The keywords extracted as associated with the main labels, that is, knowledge discovery and text mining, can be grouped into two periods. From 1998 to 2009, the terms knowledge and text were always used. From 2010 to 2017, in addition to these terms, sentiment analysis, review manipulation, microblogging data and knowledgeable users were frequently used. The terms in the first period are technical and engineering-oriented in nature, whereas a diverse range of fields such as business, marketing and finance emerged from 2010 to 2017, owing to a greater interest in the online environment.

Originality/value: This is a first comprehensive systematic review of knowledge discovery and text mining conducted with a text mining technique at the term level, which helps to reduce redundant research and to avoid missing relevant publications.
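The term-level approach described in the abstract, in which each document is labelled by extracted words and phrases and the labels are then compared across the two periods, can be illustrated with a minimal sketch. This is not the authors' pipeline: the stop-word list, the helper names `extract_terms` and `terms_by_period`, and the use of titles as stand-ins for full documents are all assumptions made for illustration. Only the split at 2010 follows the two periods named in the findings.

```python
from collections import Counter

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "and", "at", "for", "from", "in", "of", "on",
              "the", "to", "using", "with"}

def extract_terms(title):
    """Return the set of terms labelling one document.

    Terms are lowercase single words plus pairs of words that are
    adjacent once stop words have been removed.
    """
    words = [w.strip(".,:;()").lower() for w in title.split()]
    words = [w for w in words if w and w not in STOP_WORDS]
    pairs = [" ".join(pair) for pair in zip(words, words[1:])]
    return set(words + pairs)

def terms_by_period(records, split_year=2010):
    """Count, for each period, how many documents carry each term."""
    counts = {"early": Counter(), "late": Counter()}
    for year, title in records:
        period = "early" if year < split_year else "late"
        counts[period].update(extract_terms(title))
    return counts

# Toy input: four (year, title) pairs taken from the reference list.
records = [
    (1998, "Text Mining at the Term Level"),
    (2004, "Text mining with information extraction"),
    (2014, "Tweet sentiment analysis with classifier ensembles"),
    (2016, "Stock market sentiment lexicon acquisition using microblogging data"),
]

counts = terms_by_period(records)
print(counts["early"].most_common(3))
print(counts["late"].most_common(3))
```

Counting each term once per document (a set rather than a list) keeps one long title from dominating a period, which matches the idea of terms acting as document labels rather than raw word frequencies.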

[1]  Runze Li,et al.  Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery , 2006, math/0602133.

[2]  Raymond Y. K. Lau,et al.  Social analytics: Learning fuzzy product ontologies for aspect-oriented sentiment analysis , 2014, Decis. Support Syst..

[3]  Yonghong Peng,et al.  Text mining for traditional Chinese medical knowledge discovery: A survey , 2010, J. Biomed. Informatics.

[4]  Ling Liu,et al.  Manipulation of online reviews: An analysis of ratings, readability, and sentiments , 2012, Decis. Support Syst..

[5]  Paul Bookhamer,et al.  Knowledge Management in a Global Context: A Case Study , 2016, Inf. Resour. Manag. J..

[6]  Yung-Ming Li,et al.  Deriving market intelligence from microblogs , 2013, Decis. Support Syst..

[7]  Usman Qamar,et al.  TOM: Twitter opinion mining framework using hybrid classification scheme , 2014, Decis. Support Syst..

[8]  Raymond J. Mooney,et al.  Text mining with information extraction , 2004 .

[9]  Jenny A. Harding,et al.  Textual data mining for industrial knowledge management and text classification: A business oriented approach , 2012, Expert Syst. Appl..

[10]  Said A. Salloum,et al.  A survey of text mining in social media facebook and twitter perspectives , 2017 .

[11]  Estevam R. Hruschka,et al.  Tweet sentiment analysis with classifier ensembles , 2014, Decis. Support Syst..

[12]  Xiaoming Zhang,et al.  Kernel Discriminant Learning for Ordinal Regression , 2010, IEEE Transactions on Knowledge and Data Engineering.

[13]  Rahul C. Basole,et al.  IT innovation adoption by enterprises: Knowledge discovery through text analytics , 2013, Decis. Support Syst..

[14]  Uzay Kaymak,et al.  Multi-lingual support for lexicon-based sentiment analysis guided by semantics , 2014, Decis. Support Syst..

[16]  Jiawei Han,et al.  MetaPAD: Meta Pattern Discovery from Massive Text Corpora , 2017, KDD.

[17]  Rashmi Data Mining: A Knowledge Discovery Approach , 2012 .

[18]  Ramakrishnan Srikant,et al.  Discovering Trends in Text Databases , 1997, KDD.

[19]  D. Vrontis,et al.  Ambidexterity, external knowledge and performance in knowledge-intensive firms , 2017 .

[20]  Jianguo Lu,et al.  Bias Correction in a Small Sample from Big Data , 2013, IEEE Transactions on Knowledge and Data Engineering.

[21]  Heiko Paulheim,et al.  Semantic Web in data mining and knowledge discovery: A comprehensive survey , 2016, J. Web Semant..

[22]  Atika Mustafa,et al.  Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization , 2009 .

[23]  Huang Yuan,et al.  Web mining: knowledge discovery on the Web , 1999, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028).

[24]  Yen-Ting Chen,et al.  Exploring the continuance intentions of consumers for B2C online shopping: perspectives of fairness and trust , 2012, Online Inf. Rev..

[25]  Hsinchun Chen,et al.  Evaluating sentiment in financial news articles , 2012, Decis. Support Syst..

[26]  Mark Warschauer,et al.  Web-Based Collaborative Writing in L2 Contexts: Methodological Insights from Text Mining. , 2017 .

[27]  Carlos Iván Chesñevar,et al.  A novel approach for classifying customer complaints through graphs similarities in argumentative dialogues , 2009, Decis. Support Syst..

[28]  David Cornforth,et al.  Ranking of high-value social audiences on Twitter , 2016, Decis. Support Syst..

[29]  William R. Hersh,et al.  A survey of current work in biomedical text mining , 2005, Briefings Bioinform..

[30]  Veda C. Storey,et al.  Business Intelligence and Analytics: From Big Data to Big Impact , 2012, MIS Q..

[31]  G. Nagamallika,et al.  To Characterize The Contents Of The Documents Through Pattern Discovery In Text Mining , 2017 .

[32]  Ming Zhou,et al.  Joint Inference of Named Entity Recognition and Normalization for Tweets , 2012, ACL.

[33]  Jan Vanthienen,et al.  Information mining - Reflections on recent advancements and the road ahead in data, text, and media mining , 2011, Decis. Support Syst..

[34]  Byungun Yoon,et al.  A text-mining-based patent network: Analytical tool for high-technology trend , 2004 .

[35]  Yung-Ming Li,et al.  A social appraisal mechanism for online purchase decision support in the micro-blogosphere , 2014, Decis. Support Syst..

[36]  Ingo Mierswa,et al.  YALE: rapid prototyping for complex data mining tasks , 2006, KDD '06.

[37]  Paulo Cortez,et al.  Stock market sentiment lexicon acquisition using microblogging data and statistical measures , 2016, Decis. Support Syst..

[39]  M. Wimmer,et al.  Kwant: a software package for quantum transport , 2013, 1309.2926.

[40]  William R. King Knowledge Management and Organizational Learning , 2009 .

[41]  Hongyun Zhang,et al.  Rough set based hybrid algorithm for text classification , 2009, Expert Syst. Appl..

[42]  JoongHo Ahn,et al.  Helpfulness of Online Consumer Reviews: Readers' Objectives and Review Cues , 2012, Int. J. Electron. Commer..

[44]  Praveen Pathak,et al.  Making words work: Using financial text as a predictor of financial events , 2010, Decis. Support Syst..

[45]  Vlado Keselj,et al.  Financial Forecasting Using Character N-Gram Analysis and Readability Scores of Annual Reports , 2009, Canadian Conference on AI.

[46]  Paolo Rosso,et al.  Making objective decisions from subjective data: Detecting irony in customer reviews , 2012, Decis. Support Syst..

[47]  Tomer Geva,et al.  Empirical evaluation of an automated intraday stock recommendation system incorporating both market data and textual news , 2014, Decis. Support Syst..

[48]  Anthony J. T. Lee,et al.  Mining perceptual maps from consumer reviews , 2016, Decis. Support Syst..

[49]  Pei-Chann Chang,et al.  Using a contextual entropy model to expand emotion words and their intensity for the sentiment classification of stock market news , 2013, Knowl. Based Syst..

[50]  Jochen Dörre,et al.  Text mining: finding nuggets in mountains of textual data , 1999, KDD '99.

[51]  Gurpreet Singh Lehal,et al.  A Survey of Text Mining Techniques and Applications , 2009 .

[52]  Carla O'Dell,et al.  If Only We Knew What We Know: Identification and Transfer of Internal Best Practices , 1998 .

[53]  Albert Bifet,et al.  Sentiment Knowledge Discovery in Twitter Streaming Data , 2010, Discovery Science.

[54]  Ying Wah Teh,et al.  Text mining for market prediction: A systematic review , 2014, Expert Syst. Appl..

[55]  Ali Balaid,et al.  Knowledge maps: A systematic literature review and directions for future research , 2016, Int. J. Inf. Manag..

[56]  Knut Blind,et al.  Extending the knowledge base of foresight: The contribution of text mining , 2017 .

[57]  Qing Cao,et al.  Exploring determinants of voting for the "helpfulness" of online user reviews: A text mining approach , 2011, Decis. Support Syst..

[58]  Ling Liu,et al.  Manipulation in digital word-of-mouth: A reality check for book reviews , 2011, Decis. Support Syst..

[59]  Dirk Neumann,et al.  Automated news reading: Stock price prediction based on financial news using context-capturing features , 2013, Decis. Support Syst..

[60]  Yehuda Lindell,et al.  Text Mining at the Term Level , 1998, PKDD.

[61]  Desheng Dash Wu,et al.  Using text mining and sentiment analysis for online forums hotspot detection and forecast , 2010, Decis. Support Syst..

[62]  M. Polanyi The Logic of Tacit Inference , 1966, Philosophy.

[63]  S. Ananiadou,et al.  Using text mining for study identification in systematic reviews: a systematic review of current approaches , 2015, Systematic Reviews.

[64]  Chinatsu Aone,et al.  Fast and effective text mining using linear-time document clustering , 1999, KDD '99.

[65]  Ah-Hwee Tan,et al.  Text Mining: The state of the art and the challenges , 2000 .

[66]  Hsinchun Chen,et al.  Textual analysis of stock market prediction using breaking financial news: The AZFin text system , 2009, TOIS.

[67]  Naohiko Uramoto,et al.  A text-mining system for knowledge discovery from biomedical documents , 2004, IBM Syst. J..

[68]  Nan Hu,et al.  Ratings lead you to the product, reviews help you clinch it? The mediating role of online review sentiments on product sales , 2014, Decis. Support Syst..

[69]  J. March,et al.  Organizational Learning , 2008 .

[70]  Mark M. Kornfein,et al.  A Comparison of Classification Techniques for Technical Text Passages , 2007, World Congress on Engineering.

[71]  Wesley W. Chu,et al.  Path Knowledge Discovery: Multilevel Text Mining as a Methodology for Phenomics , 2014 .