Paper information - Indexing in Big Data

Indexing in Big Data

Today, social media carries communication for almost every activity, including business, knowledge sharing, and personal updates. This generates large amounts of data related to these activities, and social media has become a vital part of our lives. Going through this huge volume of data for analysis, however, is a tedious and complex task. Possible solutions include data reduction, indexing, and sorting, whose results can then be used for visualization, recommendation, etc. Indexing techniques for highly repetitive data groups have become a relevant topic of discussion. These techniques are used to accelerate queries with value and dimension subsetting conditions. Different types of indexing suit different data types, data sizes, dimensions, representations, storage, etc. Indexing is vital because the electronic text collections available are mostly large scale and heterogeneous. Hence, the aim is to find an improved approach to text search, which is used everywhere, starting from the help services built into operating systems to locate files on computers. Tree-based indexing, multidimensional indexing, hashing, etc., are a few of the indexing approaches used, depending on the data structures and big data analysis (BDA). The purpose of indexing is to address the speed of search. The index should therefore be a fraction of the size of the original data and should be built at the speed of data generation to avoid delays in results. Here, a few indexing techniques/search structures are discussed based on data structure, framework, space requirements, simplified implementations, and applications.

Subhash K. Shinde | Madhu M. Nashipudimath
