MapReduce implementation of an improved XML keyword search algorithm

Extensible Markup Language (XML) is commonly used to represent and transmit information over the Internet, so searching massive XML data efficiently by keyword has become an important problem. In this paper, we first present four properties that improve the classical Indexed Lookup Eager (ILE) algorithm. We then propose a parallel XML keyword search algorithm that uses intelligent grouping to compute smallest lowest common ancestors (SLCAs), and implement it under the MapReduce programming model. Finally, we conduct a series of experiments on 7 datasets of different sizes. The results indicate that the proposed algorithm executes efficiently and is applicable to keyword search over massive XML data.
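
To make the SLCA notion concrete, the following is a minimal brute-force sketch in Python. It is not the improved ILE algorithm or the MapReduce implementation described above. It assumes each XML node is identified by a Dewey label (a tuple of child positions from the root) and that each keyword comes with the list of labels of nodes containing it; the function names `lca` and `slca` are illustrative only.

```python
from itertools import product


def lca(a, b):
    """Lowest common ancestor of two Dewey labels: their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def slca(keyword_lists):
    """Brute-force SLCA over Dewey labels (illustrative, exponential in keywords).

    keyword_lists holds one list of Dewey labels per query keyword.
    """
    if not keyword_lists:
        return []

    # Collect the LCA of every combination that picks one node per keyword.
    candidates = set()
    for combo in product(*keyword_lists):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        candidates.add(node)

    def is_proper_ancestor(a, b):
        return len(a) < len(b) and b[:len(a)] == a

    # An SLCA is a candidate with no other candidate strictly below it.
    return sorted(
        c for c in candidates
        if not any(is_proper_ancestor(c, d) for d in candidates)
    )


# Hypothetical labels: "xml" occurs at two nodes, "search" at two others.
xml_nodes = [(0, 0, 1), (0, 1, 0)]
search_nodes = [(0, 0, 2), (0, 2)]
print(slca([xml_nodes, search_nodes]))  # prints [(0, 0)]
```

The sketch enumerates every combination of keyword nodes, which is only practical for tiny inputs; algorithms such as ILE avoid this enumeration, and a parallel version would additionally have to partition the label lists across workers.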
