Parallel Pre-processing for XML mining using Graphic Processor

With the ever-increasing availability of data on the internet, mining this information and converting it into knowledge has become an important and challenging task for the data mining community. Association rule mining is an important research direction within data mining. XML is used extensively as a markup language on the web, which makes it an interesting source for data extraction from large data sets. There is a growing demand for tools and technologies that can handle such large data efficiently. This paper proposes a collaborative approach to extracting association rules from structured XML data with the help of cost-effective, affordable, and energy-efficient graphics processors (GPUs). Our proposed framework applies parallelism at two levels. First, the XML data is deserialized using a parallel approach. Second, the built-in multithreaded structure of the GPU sorts the converted XML data in the pre-processing stage to make the dataset favorable for mining. By using the GPU as a hardware-based parallel framework, we try to address the scalability issue to a large extent.
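
The two-level pre-processing described above can be illustrated with a minimal sketch. This is not the paper's implementation: the XML layout (`transactions`, `t`, `item`), the `CHUNKS` sample data, and the function names `deserialize` and `preprocess` are all hypothetical, and the GPU sorting stage is replaced by an ordinary CPU sort as a stand-in. A thread pool is used only to show the structure of the parallel deserialization step; a real system would likely use processes or native code for actual speedup.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: each chunk is a self-contained XML fragment
# holding one or more transactions (lists of items).
CHUNKS = [
    "<transactions><t><item>milk</item><item>bread</item></t></transactions>",
    "<transactions><t><item>eggs</item><item>bread</item>"
    "<item>milk</item></t></transactions>",
]


def deserialize(chunk):
    """Parse one XML chunk into a list of transactions (lists of item names)."""
    root = ET.fromstring(chunk)
    return [[item.text for item in t.findall("item")] for t in root.findall("t")]


def preprocess(chunks, workers=2):
    """Deserialize chunks concurrently, then sort the items of each transaction."""
    # Level 1: deserialize the chunks concurrently; map() keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parsed = list(pool.map(deserialize, chunks))
    transactions = [t for part in parsed for t in part]
    # Level 2: sort items within each transaction.
    # Assumption: this CPU sort stands in for the GPU sorting stage.
    return [sorted(t) for t in transactions]


if __name__ == "__main__":
    for transaction in preprocess(CHUNKS):
        print(transaction)
```

Sorting the items within each transaction gives every transaction a canonical item order, which is one common way to make a dataset easier to scan during frequent itemset mining.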
