Indexing in Big Data

Social media is now the channel for almost every kind of communication, including business, knowledge sharing, and personal updates. These activities generate large amounts of data, and social media has become a vital part of our lives. Analyzing data at this scale is a tedious and complex task. Data reduction, indexing, and sorting are among the possible solutions, and their output can then be used for visualization, recommendation, and similar purposes. Indexing techniques for highly repetitive data collections have become a relevant topic of discussion. These techniques accelerate queries with value and dimension subsetting conditions. Different types of indexing suit different data types, data sizes, dimensions, representations, storage, and so on. Indexing is vital because most available electronic text collections are large scale and heterogeneous. The aim is therefore to find an improved approach to text search, which is used everywhere, starting from the help services built into operating systems to locate files on computers. Tree-based indexing, multidimensional indexing, and hashing are a few of the indexing approaches used, depending on the data structures and the big data analysis (BDA) involved. The purpose of indexing is to speed up search. The index should therefore be a fraction of the size of the original data and should be built at the speed of data generation to avoid delays in results. Here, a few indexing techniques and search structures are discussed based on data structure, framework, space requirements, simplified implementations, and applications.
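To make the difference between two of the approaches named above concrete, the following minimal Python sketch contrasts a hash index, which serves exact-match lookups, with a sorted index, which serves value-range (subsetting) queries. It is an illustration under stated assumptions, not a method taken from the chapter: the `records` data and the helpers `build_hash_index`, `build_sorted_index`, and `range_query` are hypothetical, and the sorted list only stands in for the leaf level of a tree-based index.

```python
import bisect
from collections import defaultdict

# Hypothetical toy records as (record_id, value); ids are non-negative integers.
records = [(0, 42), (1, 7), (2, 19), (3, 42), (4, 88), (5, 3)]

def build_hash_index(rows):
    """Map each value to the ids of the records holding it (exact-match lookups)."""
    index = defaultdict(list)
    for rid, value in rows:
        index[value].append(rid)
    return index

def build_sorted_index(rows):
    """Sorted (value, record_id) pairs; a stand-in for the leaves of a tree index."""
    return sorted((value, rid) for rid, value in rows)

def range_query(sorted_index, low, high):
    """Return ids of records whose value lies in [low, high], via binary search."""
    # (low, -1) sorts before every real entry with value == low.
    start = bisect.bisect_left(sorted_index, (low, -1))
    # (high, inf) sorts after every real entry with value == high.
    end = bisect.bisect_right(sorted_index, (high, float("inf")))
    return [rid for _, rid in sorted_index[start:end]]

hash_index = build_hash_index(records)
sorted_index = build_sorted_index(records)

print(hash_index[42])                    # ids of records with value exactly 42
print(range_query(sorted_index, 5, 45))  # ids of records with value in [5, 45]
```

The hash index answers an equality query with a single lookup but cannot answer a range condition without scanning every key, whereas the sorted index finds both ends of the range by binary search. Both structures store only values and record ids, which reflects the goal of keeping the index a fraction of the size of the original data.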
