Data mining in metadata repositories

Metadata are data about data and can describe a wide range of information categories: journals, digital libraries, structured and semistructured documents, etc. Our approach mainly addresses the problem of discovering association rules in metadata repositories associated with semistructured documents. Extensions to heterogeneous documents and a possible application to unstructured documents are also considered. The metadata stored in a repository are processed by translating them into a table similar to the well-known market-basket table of the association rule discovery problem. Slightly modified versions of the Apriori and AprioriAll algorithms are used to discover association rules among the values of metadata attributes. Experimental results on a selected collection of metadata stored in a repository are presented.
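The translation of metadata into a basket-like table and the subsequent rule discovery can be illustrated with a short Python sketch. It rests on stated assumptions: the `records` data, the `attribute=value` item encoding, and the helper names `to_transactions`, `apriori`, and `association_rules` are hypothetical. The code implements the standard Apriori algorithm, not the authors' modified version, whose changes are not detailed here, and it does not cover the sequential patterns handled by AprioriAll.

```python
from itertools import combinations

# Hypothetical metadata records: each document is described by attribute/value pairs.
records = [
    {"type": "article", "lang": "en", "format": "xml"},
    {"type": "article", "lang": "en", "format": "html"},
    {"type": "thesis", "lang": "en", "format": "xml"},
    {"type": "article", "lang": "en", "format": "xml"},
]


def to_transactions(records):
    """Translate each metadata record into a 'basket' of attribute=value items."""
    return [frozenset(f"{attr}={value}" for attr, value in rec.items()) for rec in records]


def apriori(transactions, min_support):
    """Standard Apriori: return {itemset: support} for itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    frequent = {}
    level = {frozenset([i]) for t in transactions for i in t}  # candidate 1-itemsets
    k = 1
    while level:
        # Count candidates and keep those meeting the minimum support.
        survivors = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(survivors)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets that yield exactly k items.
        joined = {a | b for a, b in combinations(list(survivors), 2) if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        level = {c for c in joined
                 if all(frozenset(s) in survivors for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules lhs => rhs with confidence = supp(lhs | rhs) / supp(lhs)."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                lhs = frozenset(antecedent)
                conf = supp / frequent[lhs]  # every subset of a frequent itemset is frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


if __name__ == "__main__":
    frequent = apriori(to_transactions(records), min_support=0.5)
    rules = association_rules(frequent, min_confidence=0.8)
    for lhs, rhs, supp, conf in sorted(rules, key=lambda r: (sorted(r[0]), sorted(r[1]))):
        print(sorted(lhs), "=>", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

With these toy records, a minimum support of 0.5, and a minimum confidence of 0.8, the sketch finds rules such as `type=article => lang=en` and `format=xml => lang=en`. Encoding items as `attribute=value` strings keeps values of different attributes distinct, which is what lets a basket-oriented algorithm operate on attribute-value metadata.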
