A Fast & Memory Efficient Technique for Mining Frequent Item Sets from a Data Set

Frequent (periodic) itemset mining is a widely used data mining method in market basket analysis and privacy-preserving data mining, and it remains a favourite topic among researchers. A substantial body of work has been devoted to this problem, and considerable progress has been made in the field. Frequent itemset mining is used to search for and recover relationships within a given data set. This paper introduces a new approach to frequent itemset mining that is more efficient in both time and space. Our method scans the database only once, whereas previous algorithms scan it many times and therefore consume more time and memory. In this way, the new algorithm reduces the time and memory complexity of frequent pattern mining. We present efficient techniques to implement the new approach.

Keywords: Incremental Association Rule Mining, Minimum Support Threshold (MST), Transactional Data Set

I. Introduction

Data mining is the process of discovering and analyzing useful information in large data sets. Its goal is to extract useful information from a data set and transform it into an understandable structure for further use. It allows users to analyze data from various dimensions, categorize it, and summarize the relationships identified. Data mining is applied in areas such as customer relationship management (identifying customers who are likely to leave for a competitor), banking (approving loans or credit cards by predicting good customers from past customers), targeted marketing (identifying likely responders to promotions), and fraud detection (in telecommunications and financial transactions). Data mining is the key step of the Knowledge Discovery in Databases (KDD) process (1) (4). The KDD process consists of data selection, data cleaning, data transformation, data mining, and the presentation, interpretation, and evaluation of the findings. Many different methods and techniques exist for data mining.
Tasks in data mining can be classified as summarization (relevant data is summarized and abstracted, resulting in a smaller set that gives an overview of the data, usually with complete information), classification (determining the class of an object from its attributes), clustering (identifying classes), trend analysis, regression and deviation detection (predictive mining), and association rule discovery (1) (2) and sequential pattern discovery (descriptive mining). Data mining has adopted techniques from various research areas, including statistics (e.g., Bayesian networks), machine learning, database systems, neural networks, rough sets, and visualization. Predictive mining is used to predict unknown or future values of some variables from other variables, while descriptive mining is used to find human-interpretable patterns that describe the data.

Association rule mining is one of the major techniques in data mining. Its most important task is to find frequent (periodic) patterns, associations, correlations, or causal structures among sets of items or objects in transactional or relational databases and other information repositories (13). Given a set of transactions, each consisting of a set of items, an association rule is written P => R, where P and R are item sets with no items in common (P ∩ R = ∅). Association rules are useful for commodity management, marketing, and similar applications. The support of the rule is the percentage of transactions that contain both P and R (that is, P ∪ R), and its confidence is the percentage of transactions containing P that also contain R. In association rule mining, a frequent item set is an item set whose support is at least the Minimum Support Threshold (MST), a user-defined support value that determines which item sets are reported as frequent. Earlier algorithms for discovering frequent patterns are static in nature.
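The support and confidence definitions above can be made concrete with a small Python sketch. The function names (`support`, `rule_support`, `confidence`, `is_frequent`), the sample basket data, and the choice to express the MST as a fraction in [0, 1] compared with `>=` are illustrative assumptions, not part of the original paper.

```python
from typing import FrozenSet, List, Set

Transaction = FrozenSet[str]


def support(itemset: Set[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_support(p: Set[str], r: Set[str], transactions: List[Transaction]) -> float:
    """Support of P => R: fraction of transactions containing both P and R."""
    if p & r:
        raise ValueError("P and R must be disjoint")
    return support(p | r, transactions)


def confidence(p: Set[str], r: Set[str], transactions: List[Transaction]) -> float:
    """Confidence of P => R: share of transactions with P that also contain R."""
    sp = support(p, transactions)
    return rule_support(p, r, transactions) / sp if sp else 0.0


def is_frequent(itemset: Set[str], transactions: List[Transaction], mst: float) -> bool:
    """Assumed convention: frequent when support reaches the MST (a fraction)."""
    return support(itemset, transactions) >= mst


# Hypothetical sample data: four market baskets.
baskets = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]
print(support({"bread", "milk"}, baskets))                # 0.5
print(confidence({"bread"}, {"milk"}, baskets))           # 0.6666666666666666
print(is_frequent({"milk", "butter"}, baskets, mst=0.5))  # False
```

Here {bread, milk} appears in two of the four baskets (support 0.5), bread appears in three (support 0.75), so the rule bread => milk has confidence 0.5 / 0.75 ≈ 0.67.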
These algorithms cannot work efficiently when the original database changes, yet real-world data grows continuously. One solution is to rerun the algorithm on the updated database, but this consumes a great deal of CPU time and is costly when only a small amount of data has been inserted. The efficiency of these algorithms depends on the number of passes (database scans) they require. A new algorithm was introduced to discover frequent items whenever new data is added dynamically to the original database. This algorithm was based on the generate-and-test method, in which all possible candidates are generated and then tested against the minimum support threshold (MST).
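The generate-and-test method can be sketched as a level-wise, Apriori-style search: candidate item sets of size k are generated from the frequent item sets of size k - 1 and then tested against the MST in one full pass over the database. This is a minimal illustration under stated assumptions (the helper names `generate_candidates` and `generate_and_test`, the sample data, and the MST as a fraction are hypothetical), not the specific incremental algorithm referred to above. It counts database scans to show why rerunning such an algorithm after every insertion is costly.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def generate_candidates(level: Dict[Itemset, float], k: int) -> Set[Itemset]:
    """Generate step: join frequent (k-1)-itemsets into k-itemset candidates,
    keeping only candidates whose (k-1)-subsets are all frequent."""
    candidates = set()
    for a, b in combinations(level, 2):
        union = a | b
        if len(union) == k and all(
            frozenset(sub) in level for sub in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def generate_and_test(
    transactions: List[Itemset], mst: float
) -> Tuple[Dict[Itemset, float], int]:
    """Return (frequent itemset -> support, number of database scans)."""
    if not transactions:
        return {}, 0
    n = len(transactions)
    # Scan 1: count every single item.
    scans = 1
    item_counts = Counter(item for t in transactions for item in t)
    level = {frozenset([i]): c / n for i, c in item_counts.items() if c / n >= mst}
    frequent = dict(level)
    k = 2
    while level:
        candidates = generate_candidates(level, k)
        if not candidates:
            break
        # Test step: one more full pass over the database for this level.
        scans += 1
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        level = {c: counts[c] / n for c in candidates if counts[c] / n >= mst}
        frequent.update(level)
        k += 1
    return frequent, scans


# Hypothetical original database and a one-transaction increment.
db = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]
freq, scans = generate_and_test(db, mst=0.5)
print(scans, len(freq))    # 2 5

# Naive approach: rerun everything on the updated database.
freq2, scans2 = generate_and_test(db + [frozenset({"milk", "butter"})], mst=0.5)
print(scans2, len(freq2))  # 2 3
```

The second call repeats every scan over the original four baskets as well as the new one, and its result differs: {bread, milk} falls to 2 of 5 transactions (support 0.4) and is no longer frequent at an MST of 0.5.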