Advanced Dimensionality Reduction Method for Big Data

The growing glut of data in science, business, and government creates an urgent need to address big data. Big data is a term that describes large volumes of high-velocity, complex, and variable data that require advanced techniques and technologies to capture, store, distribute, manage, and analyze the information. The big data challenge is becoming one of the most exciting opportunities of the coming years. Data mining algorithms such as association rule mining perform an exhaustive search to find all rules that satisfy given constraints, which makes it difficult to identify the most effective rules in big data. This work introduces a novel method for feature selection and extraction in big data based on a genetic algorithm. In machine learning, dimensionality reduction can be treated as a global combinatorial optimization problem: it reduces the number of features and removes irrelevant, noisy, and redundant data, which improves accuracy, saves computation time, and simplifies the results. A genetic-algorithm-based approach was developed that uses a feedback linkage between feature selection and association rule mining, with MapReduce used to handle big data.
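The abstract does not specify the chromosome encoding, the fitness function, or how the MapReduce jobs are organized, so the following is only a minimal single-process sketch under assumed choices. Each chromosome is a bit mask over candidate features; support counts are gathered with a map step and a reduce step (Python's built-in `map` and `functools.reduce` standing in for a MapReduce cluster); and the fitness that closes the feedback loop is a hypothetical mix of the confidence of rules `{feature} -> y` and a penalty on subset size. The toy data, `TARGET`, `penalty`, and every function name here are illustrative assumptions, not the authors' method.

```python
import random
from collections import Counter
from functools import reduce

# Hypothetical toy transactions: each row is the set of items present.
# "f0".."f4" are candidate features; "y" is the assumed rule consequent.
TRANSACTIONS = [
    {"f0", "f2", "y"}, {"f0", "f1", "y"}, {"f0", "f3", "y"}, {"f1", "f2"},
    {"f2", "f4"}, {"f0", "f4", "y"}, {"f1", "f3"}, {"f3", "f4"},
]
FEATURES = ["f0", "f1", "f2", "f3", "f4"]
TARGET = "y"


def map_counts(transaction):
    """Map step: emit a count of 1 for each item and each (item, TARGET) pair."""
    keys = list(transaction)
    if TARGET in transaction:
        keys += [(item, TARGET) for item in transaction if item != TARGET]
    return Counter(keys)


def reduce_counts(a, b):
    """Reduce step: merge partial counts by summing them."""
    return a + b


# Support counts computed once, MapReduce-style, in a single process.
COUNTS = reduce(reduce_counts, map(map_counts, TRANSACTIONS), Counter())


def confidence(item):
    """Confidence of the rule {item} -> TARGET: supp(item, TARGET) / supp(item)."""
    support_item = COUNTS[item]
    return COUNTS[(item, TARGET)] / support_item if support_item else 0.0


def fitness(mask, penalty=0.1):
    """Assumed fitness: mean rule confidence of selected features minus a size penalty."""
    selected = [f for f, bit in zip(FEATURES, mask) if bit]
    if not selected:
        return 0.0
    mean_conf = sum(confidence(f) for f in selected) / len(selected)
    return mean_conf - penalty * len(selected) / len(FEATURES)


def evolve(pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Generational GA over bit-string masks with tournament selection and elitism."""
    rng = random.Random(seed)
    n = len(FEATURES)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def tournament():
        # Pick two distinct individuals from the current population; keep the fitter.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: carry the best mask over
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each bit flips independently with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    mask, score = evolve()
    print("selected:", [f for f, b in zip(FEATURES, mask) if b], "fitness:", round(score, 3))
```

Because the elite mask is copied into every new generation, the best fitness in the population never decreases. On a real cluster, the `map_counts`/`reduce_counts` pair would be distributed across workers, while the GA loop would only read the aggregated counts; richer rules (multi-item antecedents, support thresholds) would slot into `confidence` and `fitness` without changing the GA itself.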
