A Review on Biomedical Mining

A wide range of biomedical repositories is available in distributed systems to support clinical decision making. One of the most important is PubMed, which provides access to more than 30 million citations, drawn primarily from MEDLINE. Data mining is used to discover hidden and previously unknown patterns in large databases. Because of their size, data in many domains, such as biomedical repositories and databases, web mining, health care systems, education, and technology-intensive companies, is unstructured and uncertain. Extracting information from biomedical repositories and analyzing it experimentally is time-consuming and requires efficient feature selection and classification models. Recent text mining techniques can answer a wide range of research queries over biomedical repositories, from biomarker and gene discovery to gene-disease prediction and drug discovery. As a result, biomedical text mining has emerged as a field in which text mining techniques and machine learning models are integrated on high-performance computing resources. The first contribution of this work is a review of the background on biomedical repositories, document classification, clustering of relevant documents based on MeSH terms, and the methodologies and models used for biomedical document analysis with the Hadoop framework. Its main purpose is to explain the importance of feature extraction and document classification models for finding gene-disease patterns in massive biomedical repositories.
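The pipeline described above can be illustrated with a small sketch. It covers preprocessing, a MapReduce-style document-frequency count, feature extraction and document classification. This is a minimal, single-machine sketch, not the review's method. The following choices are assumptions made for illustration:

- the stop word list and the toy abstracts and labels are hypothetical;
- TF-IDF with the smoothed weight idf = ln((1 + n) / (1 + df)) + 1 is assumed as the feature weighting;
- a nearest-centroid classifier with cosine similarity is a swapped-in choice of classifier;
- `map_phase` and `reduce_phase` only mimic the MapReduce programming model in plain Python and do not use the Hadoop API.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import chain

# Hypothetical stop word list; a real pipeline would use a fuller, domain-aware list.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def map_phase(doc_id, text):
    """Map step (MapReduce style, plain Python): emit (term, doc_id) once per distinct term."""
    for term in set(tokenize(text)):
        yield term, doc_id


def reduce_phase(pairs):
    """Reduce step: group pairs by term and count distinct documents (document frequency)."""
    groups = defaultdict(set)
    for term, doc_id in pairs:
        groups[term].add(doc_id)
    return {term: len(ids) for term, ids in groups.items()}


def fit_idf(docs):
    """Assumed smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1, from {doc_id: text}."""
    df = reduce_phase(chain.from_iterable(map_phase(d, t) for d, t in docs.items()))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + f)) + 1 for term, f in df.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector; terms outside the training vocabulary are ignored."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train_centroids(docs, labels, idf):
    """Average the TF-IDF vectors of each class into one centroid per label."""
    sums, counts = defaultdict(lambda: defaultdict(float)), Counter()
    for doc_id, text in docs.items():
        label = labels[doc_id]
        counts[label] += 1
        for t, w in vectorize(text, idf).items():
            sums[label][t] += w
    return {lab: {t: w / counts[lab] for t, w in vec.items()} for lab, vec in sums.items()}


def classify(text, centroids, idf):
    """Return the label whose centroid is most cosine-similar to the document."""
    vec = vectorize(text, idf)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Hypothetical toy abstracts and labels, for illustration only.
train = {
    "d1": "BRCA1 mutation increases breast cancer risk",
    "d2": "TP53 gene mutation in lung cancer tumors",
    "d3": "LDLR variants and coronary heart disease",
    "d4": "APOE gene linked to cardiovascular disease risk",
}
labels = {"d1": "cancer", "d2": "cancer", "d3": "cardio", "d4": "cardio"}

idf = fit_idf(train)
centroids = train_centroids(train, labels, idf)
print(classify("KRAS mutation in pancreatic cancer", centroids, idf))  # prints: cancer
print(classify("heart disease in elderly patients", centroids, idf))  # prints: cardio
```

The map and reduce steps are kept as separate functions. On a real cluster, the document-frequency count is the part that could be distributed across nodes; in this single-process version the two steps simply run in sequence.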
