An efficient interval query algorithm based on inverted list in cloud environment

Interval overlap queries play an increasingly important role in genomics research and the development of biomedicine. However, traditional query approaches that run on a single computer cannot deliver adequate query speed. CNCList+, an algorithm based on cloud computing technology, was proposed to increase query speed. Nevertheless, CNCList+ must scan the data of its subgroups sequentially in every query, which limits the speedup it achieves. Given the important role of the inverted list in data indexing, this paper combines the concept of the inverted list with cloud computing to form an efficient query algorithm named IQIL that further increases query speed. In addition, detailed comparison experiments between IQIL and CNCList+ show that IQIL achieves higher query speed, demonstrating its ability to address the limited query speed of interval overlap queries.
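The abstract does not describe the internals of IQIL, so the following is only a minimal single-machine sketch of the general idea of answering interval overlap queries through an inverted list. It is not the IQIL algorithm. The class name `BucketInvertedList`, the fixed `bucket_width`, and the bucketing scheme are all assumptions made for illustration: each interval is posted to every fixed-width bucket it touches, and a query only inspects the posting lists of the buckets it covers instead of scanning all intervals.

```python
from collections import defaultdict


def overlaps(a, b):
    """Closed intervals (start, end) overlap if each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


class BucketInvertedList:
    """Hypothetical inverted list: bucket number -> ids of intervals touching it."""

    def __init__(self, bucket_width):
        self.bucket_width = bucket_width    # assumed fixed-width partitioning
        self.intervals = []                 # interval id -> (start, end)
        self.postings = defaultdict(list)   # bucket number -> list of interval ids

    def _buckets(self, start, end):
        # All bucket numbers covered by the closed interval [start, end].
        return range(start // self.bucket_width, end // self.bucket_width + 1)

    def add(self, start, end):
        interval_id = len(self.intervals)
        self.intervals.append((start, end))
        for bucket in self._buckets(start, end):
            self.postings[bucket].append(interval_id)
        return interval_id

    def query(self, start, end):
        # Collect candidates from the relevant posting lists only,
        # then verify each candidate with an exact overlap test.
        candidates = set()
        for bucket in self._buckets(start, end):
            candidates.update(self.postings.get(bucket, ()))
        return sorted(
            i for i in candidates if overlaps(self.intervals[i], (start, end))
        )


index = BucketInvertedList(bucket_width=10)
for start, end in [(1, 5), (8, 12), (30, 40), (11, 35)]:
    index.add(start, end)

print(index.query(4, 9))    # ids of intervals overlapping [4, 9]
print(index.query(36, 50))  # ids of intervals overlapping [36, 50]
```

The exact overlap test after the lookup is needed because two intervals can share a bucket without overlapping. In a cloud setting, posting lists could plausibly be partitioned across nodes, but how IQIL does this is not stated in the abstract.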
