Data Mining and Data Pre-processing for Big Data

Big Data is a term used to describe the massive amounts of data generated by digital sources and the internet, usually characterized by the three V's: Volume, Velocity and Variety. Over the past few years, data has grown exponentially due to the use of connected devices such as smartphones, tablets, laptops and desktop computers. Moreover, e-commerce (also known as the online market), internet services and social networking sites generate tremendous amounts of user data in the form of documents, emails and web pages. The volume of this data is so vast that processing and analyzing it with traditional software systems is complex and time-consuming. This paper presents a pre-processing algorithm to extract real-time user-accessed data from the Windows operating system environment, and an approach based on the Apache Hadoop framework, using the Hadoop Distributed File System (HDFS) and MapReduce, to mine and analyze this large dataset. The ability to mine and analyze Big Data gives organizations richer and deeper insights into business patterns and trends. The performance of the proposed system can be evaluated in terms of execution time, data heterogeneity, scalability, flexibility and the mining algorithm used.
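The abstract does not spell out the pre-processing algorithm or the MapReduce job, so the following is only a minimal single-process sketch of the general pattern, assuming a hypothetical access-log format `timestamp,user,path`. The functions `preprocess`, `map_phase`, `shuffle`, `reduce_phase` and `run` are illustrative names, not the authors' code or any Hadoop API, and counting accesses per file extension is an assumed example task. Pre-processing drops malformed records and normalizes Windows paths; the map, shuffle and reduce steps mirror the stages that Hadoop distributes across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical record format: "timestamp,user,path" (one file access per line).
# This is an illustrative assumption, not the paper's actual log format.


def preprocess(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Clean raw access records: skip malformed or incomplete lines and
    normalize Windows paths (case-insensitive, backslash-separated)."""
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not all(parts):
            continue  # drop malformed or incomplete records
        _timestamp, user, path = parts
        yield user.lower(), path.lower().replace("/", "\\")


def map_phase(record: tuple[str, str]) -> Iterator[tuple[str, int]]:
    """Map: emit (file_extension, 1) for each accessed file."""
    _user, path = record
    name = path.rsplit("\\", 1)[-1]  # file name after the last separator
    ext = name.rsplit(".", 1)[-1] if "." in name else "<none>"
    yield ext, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework does
    between the map and reduce stages."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum the per-record counts for one key."""
    return key, sum(values)


def run(lines: Iterable[str]) -> dict[str, int]:
    """Chain pre-processing, map, shuffle and reduce in a single process."""
    intermediate = (kv for rec in preprocess(lines) for kv in map_phase(rec))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


if __name__ == "__main__":
    log = [
        "2024-01-01T09:00,alice,C:\\Users\\alice\\report.DOCX",
        "2024-01-01T09:05,bob,C:/Users/bob/mail.eml",
        "malformed line",
        "2024-01-01T09:07,alice,C:\\Users\\alice\\notes.docx",
    ]
    print(run(log))  # {'docx': 2, 'eml': 1}
```

Here the malformed line is discarded and the two `.docx` accesses are merged despite differing case. On a real cluster, the same map and reduce logic would typically be written against Hadoop's Java MapReduce API or run through Hadoop Streaming, with the input stored in HDFS.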

Shamsuddin S. Khan | Kavita Sonawane | Ashish R. Jagdale
