Ranking Tagged Resources Using Social Semantic Relevance

XML has become the standard way of representing and transforming data over the World Wide Web. The problem with XML documents is their very high degree of redundancy, which makes them demand large storage capacity and high network bandwidth for transmission. This study designs XMLCQ, a system for compressing and querying XML documents. XMLCQ compresses an XML document without needing its schema or DTD, which minimizes the number of technologies associated with these documents. XMLCQ first separates the document's data into containers according to the path of each data item from the root to the leaf, then compresses these containers using a back-end compression technique. The compressed file can then be queried with different kinds of queries; only the required information is decompressed and returned to the user. In several experiments, the query processor part of the system showed the ability to answer different kinds of queries, ranging from simple exact-match queries to complex ones. Furthermore, this paper introduces the idea of retrieving information from more than one compressed XML document.
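The container idea described above can be illustrated with a minimal sketch. It is not the XMLCQ implementation: the function names (`build_containers`, `compress_containers`, `query_path`) and the sample document are hypothetical, `zlib` merely stands in for the unspecified back-end compressor, and attributes and mixed content are ignored for brevity. Text values are grouped by their root-to-leaf path, each group is compressed on its own, and an exact path lookup decompresses only the one container it needs.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

# NUL cannot occur in XML 1.0 character data, so it is a safe value separator.
SEPARATOR = "\0"

# Hypothetical sample document, used only for illustration.
SAMPLE = (
    "<library>"
    "<book><title>XML Basics</title><year>2001</year></book>"
    "<book><title>Data Compression</title><year>2006</year></book>"
    "</library>"
)


def build_containers(xml_text):
    """Group element text values by their root-to-leaf path."""
    root = ET.fromstring(xml_text)
    containers = defaultdict(list)

    def walk(node, parent_path):
        path = f"{parent_path}/{node.tag}"
        text = (node.text or "").strip()
        if text:
            containers[path].append(text)
        for child in node:
            walk(child, path)

    walk(root, "")
    return dict(containers)


def compress_containers(containers):
    """Compress each container independently (zlib is an assumed back end)."""
    return {
        path: zlib.compress(SEPARATOR.join(values).encode("utf-8"))
        for path, values in containers.items()
    }


def query_path(compressed, path):
    """Decompress only the container that matches the requested path."""
    blob = compressed.get(path)
    if blob is None:
        return []
    return zlib.decompress(blob).decode("utf-8").split(SEPARATOR)
```

Calling `query_path(compress_containers(build_containers(SAMPLE)), "/library/book/title")` touches only the title container and leaves the year container compressed, which is the property the abstract attributes to partial decompression.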
