Topological Signature of 19th Century Novelists: Persistence Homology in Context-Free Text Mining

Topological Data Analysis (TDA) refers to a collection of methods for finding structure in the shape of data. Although TDA methods have recently been used in many areas of data mining, they have not been widely applied to text mining tasks. Most text processing algorithms discard the order in which different entities appear or co-appear. If this lost ordering is an informative feature of the data, TDA may help close the resulting gap in the state of the art in text processing. The topology of different entities across a textual document may reveal additional information about the document that is not reflected in the features produced by conventional text processing methods. In this paper, we introduce a novel approach that employs TDA in text processing to capture and use the topology of same-type entities in textual documents. First, we show how to extract topological signatures from text using persistent homology, a TDA tool that captures the topological signature of a point cloud. Then we show how to use these signatures for text classification.
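
To make the idea of a topological signature concrete, the following is a minimal sketch of the simplest case: the 0-dimensional persistence barcode of a point cloud under a Vietoris-Rips filtration. It is not the pipeline used in this paper. The function name `h0_barcode` and the toy points are hypothetical, and the sketch assumes that entities have already been embedded as points in Euclidean space. Every connected component is born at scale 0 and dies at the scale where it merges with another component, so the finite death scales are the edge lengths of a minimum spanning tree.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return the death scales of the finite H0 bars, in increasing order.

    All bars are born at scale 0. One further bar never dies (the component
    that survives every merge) and is not included in the returned list.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Find the root of i, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pair of points becomes an edge at scale = Euclidean distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for scale, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge here, so one H0 bar dies at this scale.
            parent[root_i] = root_j
            deaths.append(scale)
    return deaths


# Hypothetical toy data: two well-separated clusters of points.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
deaths = h0_barcode(points)
for scale in deaths:
    print(f"bar: [0, {scale:.3f})")
```

In this toy example, three short bars reflect merges inside the clusters and one long bar reflects the gap between the two clusters. A fixed-length summary of such bars (for instance, sorted bar lengths) is one way a barcode could be passed to a classifier, although that choice is an assumption here rather than something the abstract specifies.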
