Text structuring methods based on complex network: a systematic review

Currently, a large amount of text is shared over the Internet. These texts are available in different forms: structured, unstructured, and semi-structured. There are different ways of analyzing texts, but domain experts usually divide the process into steps such as pre-processing, feature extraction, and a final step that, depending on the purpose of the analysis, may be classification, clustering, summarization, or keyword extraction. For this processing, several approaches have been proposed in the literature, based on variations of methods such as artificial neural networks and deep learning. In this paper, we conduct a systematic review of papers that use complex network approaches to analyze text. The main results show that complex network topological properties, measures, and modeling can capture and identify text structures for different purposes, such as text analysis, classification, topic and keyword extraction, and summarization. We conclude that complex network topological properties provide promising strategies for processing texts, considering their different aspects and structures.
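
To make the idea of modeling text as a complex network concrete, the following is a minimal sketch of one common representation, a word co-occurrence network, together with two standard topological measures (node degree and local clustering coefficient). It is an illustration only and is not taken from any of the reviewed papers: the function names, the whitespace tokenizer, and the `span` parameter are assumptions made for this example.

```python
from collections import defaultdict


def build_cooccurrence_network(text, span=1):
    """Build an undirected word co-occurrence network.

    Each distinct word is a node; two words are linked if one appears
    within `span` positions after the other. The tokenizer below is a
    deliberately naive assumption (lowercase, strip punctuation).
    """
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [w for w in tokens if w]
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + span, len(tokens))):
            other = tokens[j]
            if other != word:  # ignore self-loops
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


def degree(graph, node):
    """Number of distinct words that co-occur with `node`."""
    return len(graph[node])


def local_clustering(graph, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if neighbors[b] in graph[neighbors[a]]
    )
    return 2.0 * links / (k * (k - 1))


if __name__ == "__main__":
    network = build_cooccurrence_network("The cat sat on the mat.", span=2)
    for word in sorted(network):
        print(word, degree(network, word), round(local_clustering(network, word), 3))
```

In a pipeline of the kind described above, measures such as these could serve as features for a later step (for example, ranking words by degree as keyword candidates), although the specific measures and network models vary across studies.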
