MEDLINE Text Mining: An Enhancement Genetic Algorithm Based Approach for Document Clustering

MEDLINE is the largest biomedical literature database, and it is updated daily with 200–4,000 citations. This continual growth creates a need for effective clustering of MEDLINE abstracts to speed up research and information retrieval. Several works have addressed this problem, but clustering MEDLINE abstracts remains an open area in which researchers continue to propose new approaches to improve clustering quality. Over the last few years, evolutionary algorithms have been widely applied to clustering problems because they are less prone to getting trapped in locally optimal solutions and tend to converge toward a global one. This article proposes a new approach for clustering MEDLINE abstracts, based on an extension of an evolutionary algorithm, namely the genetic algorithm, combined with a Vector Space Model and an agglomerative algorithm.
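To make the combination of a Vector Space Model with a genetic algorithm concrete, the following is a minimal sketch and not the method of the article, whose encoding, operators, and fitness function are not given in the abstract. It assumes TF-IDF weighting, a chromosome that stores one cluster label per document, and a fitness equal to the mean cosine similarity between each document and its cluster centroid. The agglomerative step is omitted because the abstract does not say how it is combined with the genetic algorithm. All function names, parameter values, and the toy documents are hypothetical.

```python
import math
import random
from collections import Counter

def tfidf_vectors(docs):
    """Vector Space Model: one sparse TF-IDF dict per document (assumed weighting)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: c * math.log((1 + n) / (1 + df[t])) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fitness(labels, vectors, k):
    """Assumed fitness: mean cosine similarity of each document to its centroid."""
    centroids = [Counter() for _ in range(k)]
    for label, vec in zip(labels, vectors):
        for t, w in vec.items():
            centroids[label][t] += w  # summed vector; cosine ignores the scale
    total = sum(cosine(v, centroids[l]) for l, v in zip(labels, vectors))
    return total / len(vectors)

def ga_cluster(vectors, k, pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    """Plain genetic algorithm over label vectors (hypothetical parameter values)."""
    rng = random.Random(seed)
    n = len(vectors)
    score = lambda chromosome: fitness(chromosome, vectors, k)
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents, three candidates each.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: reassign a document to a random cluster.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return best, score(best)

# Toy strings for illustration only, not real MEDLINE abstracts.
docs = [
    "gene expression microarray analysis",
    "microarray gene profiling expression",
    "gene expression profiling tumor",
    "retinal image vessel segmentation",
    "optic disc segmentation retinal image",
    "vessel segmentation fundus image",
]
labels, best_score = ga_cluster(tfidf_vectors(docs), k=2)
print(labels, round(best_score, 3))
```

Because the two best chromosomes are carried over unchanged, the best fitness never decreases between generations. This fitness rewards very small clusters, so the number of clusters `k` is fixed in advance here; a real system would need a criterion that penalizes over-fragmentation.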
