Database Systems for Advanced Applications

Finding interesting tree patterns hidden in large datasets is an important research area with many practical applications. Over the years, research has evolved from mining induced patterns to mining embedded patterns. Embedded patterns allow for discovering useful relationships that cannot be captured by induced patterns. Unfortunately, previous contributions have focused almost exclusively on mining patterns from a set of small trees. The problem of mining embedded patterns from large data trees has been neglected, mainly because of the complexity of the task: the unordered tree embedding test is NP-complete. However, mining embedded patterns from large trees is important for many modern applications that arise naturally, particularly with the explosion of big data. In this paper, we address the problem of mining unordered frequent embedded tree patterns from large trees. We propose a novel approach that exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally and avoids the costly enumeration of all pattern matchings required by previous approaches. A further originality of our approach is that matching information of already computed patterns is materialized as bitmaps. This technique not only minimizes memory consumption but also reduces CPU costs by translating pattern evaluation into bitwise operations. An extensive experimental evaluation shows that our approach mines embedded patterns from real datasets up to several orders of magnitude faster than state-of-the-art tree mining algorithms applied to large data trees, and that it scales well, enabling the extraction of patterns from large datasets where previous approaches fail.
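To make the bitmap idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes that data-tree nodes are numbered in preorder, so each node's descendants occupy a contiguous bit range. It also assumes that every pattern edge is an ancestor-descendant (embedded) edge, and that support counts the distinct data nodes to which the pattern root maps; the abstract does not define support. Matching is homomorphic: two sibling pattern nodes may map to the same data node. This lets the bottom-up test run in polynomial time, unlike the NP-complete unordered embedding test. All names (`index_tree`, `label_bitmaps`, `descendant_mask`, `match_bitmap`, `support`) are hypothetical. For brevity, the sketch recomputes bitmaps from scratch. An incremental miner would instead cache each pattern's bitmaps and reuse them when extending the pattern.

```python
# Sketch under stated assumptions: preorder numbering, descendant edges only,
# homomorphic matching, and support = number of distinct root images.
# All names here are hypothetical, not taken from the paper.
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled, unordered tree node (used for both data trees and patterns)."""
    label: str
    children: list = field(default_factory=list)


def index_tree(root):
    """Number data-tree nodes in preorder; return per-node labels and subtree sizes."""
    labels, sizes = [], []

    def visit(node):
        i = len(labels)
        labels.append(node.label)
        sizes.append(0)
        for child in node.children:
            visit(child)
        sizes[i] = len(labels) - i  # node i plus all of its descendants

    visit(root)
    return labels, sizes


def label_bitmaps(labels):
    """Map each label to an int bitmap with bit i set iff data node i has that label."""
    bitmaps = {}
    for i, lab in enumerate(labels):
        bitmaps[lab] = bitmaps.get(lab, 0) | (1 << i)
    return bitmaps


def descendant_mask(i, sizes):
    """In preorder, the proper descendants of node i are nodes i+1 .. i+sizes[i]-1."""
    return ((1 << (sizes[i] - 1)) - 1) << (i + 1)


def match_bitmap(q, sizes, bitmaps):
    """Bitmap of data nodes that pattern node q can map to under a homomorphism,
    treating every pattern edge as an ancestor-descendant edge."""
    candidates = bitmaps.get(q.label, 0)
    for qc in q.children:
        child_bits = match_bitmap(qc, sizes, bitmaps)
        kept, rest = 0, candidates
        while rest:
            low = rest & -rest            # isolate the lowest set bit
            i = low.bit_length() - 1      # its data-node index
            if descendant_mask(i, sizes) & child_bits:
                kept |= low               # node i has a descendant matching qc
            rest ^= low                   # clear that bit and continue
        candidates = kept
    return candidates


def support(pattern, sizes, bitmaps):
    """Assumed support: number of distinct data nodes the pattern root maps to."""
    return bin(match_bitmap(pattern, sizes, bitmaps)).count("1")


# Hypothetical data tree: a(b(c), a(b(c)), d(c))
data = Node("a", [
    Node("b", [Node("c")]),
    Node("a", [Node("b", [Node("c")])]),
    Node("d", [Node("c")]),
])
labels, sizes = index_tree(data)
bitmaps = label_bitmaps(labels)

chain = Node("a", [Node("b", [Node("c")])])      # pattern a//b//c
print(support(chain, sizes, bitmaps))            # 2 (roots at nodes 0 and 3)
twin = Node("d", [Node("c"), Node("c")])         # both c's may map to one node
print(support(twin, sizes, bitmaps))             # 1
```

For the `twin` pattern, homomorphic matching lets both `c` children map to the single `c` below `d`. An injective embedding test would reject that match.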
