A Novel Integrated Approach for Companion Vehicle Discovery Based on Frequent Itemset Mining on Spark

Companion vehicle discovery has received much attention from the research community. It has been widely adopted by traffic management departments for tasks such as tracking involved vehicles. Because traffic data are massive and contain complex, inaccurate accompanying-vehicle relationships, companion vehicle discovery has become a challenging yet active research topic. Several algorithms have been proposed to solve this problem on transactional datasets. Some of them are based on frequent itemset mining algorithms, which are used to extract knowledge from data in several real-world applications, such as market basket analysis, crime detection and prevention, and crowd mining. However, most of these algorithms fail on large-scale datasets because they need to scan the dataset iteratively, which makes them time-consuming and infeasible for big data. To this end, we propose a novel algorithm, HD-FIM, that extracts companion vehicles from massive traffic data with high execution efficiency on the Spark platform. It combines depth-first and breadth-first search in a hybrid approach to handle big data on distributed clusters. Experimental results on practical companion vehicle extraction tasks show that HD-FIM outperforms typical existing frequent itemset mining algorithms and that it can be applied to other traffic big data.
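To make the framing concrete, the following is a minimal single-machine sketch of how companion vehicle discovery can be cast as frequent itemset mining. It is not HD-FIM: it uses a plain level-wise (breadth-first, Apriori-style) search and does not run on Spark. The function name, the license plates, and the idea that each transaction holds the vehicles captured at one checkpoint within one time window are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_vehicle_sets(transactions, min_support):
    """Return {vehicle set: support count} for all sets meeting min_support.

    Plain level-wise (Apriori-style) search, used here only for illustration.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual vehicles.
    item_counts = Counter(v for t in transactions for v in t)
    current = {frozenset([v]) for v, n in item_counts.items() if n >= min_support}
    frequent = {s: item_counts[next(iter(s))] for s in current}

    k = 2
    while current:
        # Join frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # One pass over the data per level: this repeated scanning is the
        # cost that becomes prohibitive on large datasets.
        level_counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    level_counts[c] += 1
        current = {c for c, n in level_counts.items() if n >= min_support}
        frequent.update({c: level_counts[c] for c in current})
        k += 1
    return frequent


# Hypothetical data: each list is the set of plates seen together at one
# checkpoint within one time window.
observations = [
    ["A1", "B2", "C3"],
    ["A1", "B2"],
    ["A1", "B2", "D4"],
    ["C3", "D4"],
    ["A1", "C3", "B2"],
]

result = frequent_vehicle_sets(observations, min_support=3)
# Companion groups are the frequent sets with at least two vehicles.
companions = {s: n for s, n in result.items() if len(s) >= 2}
print(companions)
```

In this toy input, vehicles A1 and B2 appear together in four of the five observations, so they are the only pair that reaches the support threshold of 3. The loop scans every transaction once per level, which illustrates why purely iterative approaches of this kind scale poorly.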
