Unsupervised and Supervised Learning of Graph Domains

In this chapter, we describe a method for extracting an underlying graph structure from an unstructured text document. The resulting structure is a symmetric undirected graph. An unsupervised learning approach is applied to cluster the documents of a given text corpus into groups of similarly structured graphs. Moreover, if labels are available for some of the documents in the corpus, a supervised learning approach can be applied to learn the input-output mapping between the symmetric undirected graph structures and a real-valued vector. The approach is illustrated on a standard benchmark problem in text processing, namely a subset of the Reuters text corpus. Some observations and directions for further research are given.
