Improvement in K-Means Clustering for Information Retrieval

In data science and machine learning, clustering is a powerful technique that enables researchers and businesses to extract valuable insights from large and complex datasets. By grouping similar data points together, clustering algorithms help locate patterns, anomalies, and trends that may not be immediately obvious in the raw data. This supports data-driven decision-making in a variety of industries, from marketing and social media to healthcare and banking. Clustering is the process of grouping unlabeled examples by their similarity, using unsupervised machine-learning algorithms, in order to organize the data and gain insight from it. Similarity is measured over the features available in the data; the more features there are, the harder it becomes to measure similarity between examples. K-means clustering is an unsupervised machine-learning algorithm that partitions the data into K distinct groups of similar examples: it represents the samples as points in an N-dimensional feature space and assigns them to clusters using Euclidean distance, converging from randomly chosen starting points. This research work focuses on information retrieval using K-means clustering and other retrieval algorithms. It presents a comparative analysis of traditional K-means, the Ranking and Query Redirection method, and the Term Frequency-Inverse Document Frequency (TF-IDF) method in order to suggest a better approach for data mining and information retrieval.
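As an illustration of the two core techniques named above, the following minimal sketch computes TF-IDF vectors for tokenized documents and clusters points with K-means using random starting centroids and Euclidean distance. It is not the authors' implementation: the function names (`tfidf_vectors`, `euclidean`, `kmeans`), the parameters `max_iter` and `seed`, and the particular TF-IDF variant (term frequency normalized by document length, IDF as log(N/df)) are assumptions chosen for illustration, and documents are assumed to be non-empty token lists.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized, non-empty documents into TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed variant: tf = count / document length, idf = log(N / df).
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means: random initial centroids, Euclidean assignment."""
    rng = random.Random(seed)
    # Random start points: k distinct samples drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: centroids no longer move.
        centroids = new_centroids
    return labels, centroids
```

In a retrieval setting, the vectors returned by `tfidf_vectors` could be passed directly to `kmeans` so that documents with similar term weights fall into the same cluster. Because the starting centroids are random, different seeds can yield different clusterings on less well-separated data.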
