Data-driven encoding of structures and link predictions in large XML document collections

In recent years there has been significant research into processing related data, in particular the relatedness between atomic elements in one structure and those in another structure. A number of approaches have been developed, with varying degrees of success. This chapter provides an overview of machine learning approaches for encoding related atomic elements in one structure together with those in other structures. The chapter briefly reviews a number of unsupervised approaches for such data structures, which can be used to solve generic classification, regression, and clustering problems. We apply this approach to a particularly interesting and challenging problem: predicting both the number and the locations of the in-links and out-links of a set of XML documents. In this problem, we are given a set of XML pages, which may represent web pages on the Internet, together with their in-links and out-links. Based on this training dataset, we wish to predict the number and locations of the in-links and out-links of a set of XML documents that are not yet linked to other existing XML documents. To the best of our knowledge, this is the only known data-driven unsupervised machine learning approach for the prediction of in-links and out-links of XML documents.
