Semantic-based Merging of RSS Items

Merging XML documents is important in several applications. For instance, merging RSS news items from the same or different sources and providers can benefit end users in various scenarios. In this paper, we address this issue and explore a relatedness measure between RSS elements. We show how to define and compute exclusive relations between any two elements, and we provide several predefined merging operators that can be extended and adapted to users' needs. We also report a set of experiments conducted to validate our approach.
