Big Data Knowledge Mining

The Big Data (BD) era has arrived. Big data applications have emerged in which accumulated information has grown beyond the ability of current software tools to capture, manage, and process it within a tolerable time. Volume is not the only characteristic that defines big data; velocity, variety, and value define it as well. Many resources contain BD that should be processed. The biomedical research literature is one of many domains that hide rich knowledge. MEDLINE is a huge biomedical research database that remains a significantly underutilized source of biological information. Discovering useful knowledge from such a huge corpus raises many problems related to the type of information, such as the related concepts of the text domain and the semantic relationships associated with them. In this paper, a two-level agent-based system for self-supervised relation extraction from MEDLINE using the Unified Medical Language System (UMLS) knowledge base is proposed. The model uses a self-supervised approach to Relation Extraction (RE), constructing enhanced training examples using information from UMLS with hybrid text features. The model combines the Apache Spark and HBase BD technologies with multiple data mining and machine learning techniques within a Multi-Agent System (MAS). The system shows better results than the current state of the art and a naive approach in terms of accuracy, precision, recall, and F-score.
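The idea of building training examples from a knowledge base rather than from manual annotation can be illustrated with a minimal sketch. This is not the system described above: the relation table, the entity lexicon, and the function names `find_entities` and `label_sentences` are all hypothetical stand-ins, and a real pipeline would draw pairs from UMLS and use a proper concept recognizer instead of token matching.

```python
from itertools import combinations

# Hypothetical stand-in for related concept pairs taken from a knowledge base.
KNOWN_RELATIONS = {
    ("aspirin", "headache"): "treats",
    ("metformin", "diabetes"): "treats",
}

# Hypothetical entity lexicon; an assumption made only for this illustration.
ENTITIES = {"aspirin", "headache", "metformin", "diabetes", "insulin"}


def find_entities(sentence):
    """Return known entities mentioned in the sentence, in order of appearance."""
    tokens = [t.strip(".,;:()").lower() for t in sentence.split()]
    seen = []
    for tok in tokens:
        if tok in ENTITIES and tok not in seen:
            seen.append(tok)
    return seen


def label_sentences(sentences):
    """Build (sentence, e1, e2, label) examples without manual annotation."""
    examples = []
    for sent in sentences:
        # Every pair of co-occurring entities becomes one candidate example.
        for e1, e2 in combinations(find_entities(sent), 2):
            label = (
                KNOWN_RELATIONS.get((e1, e2))
                or KNOWN_RELATIONS.get((e2, e1))
                or "no_relation"
            )
            examples.append((sent, e1, e2, label))
    return examples


if __name__ == "__main__":
    corpus = [
        "Aspirin relieves headache.",
        "Insulin and metformin are used in diabetes.",
    ]
    for example in label_sentences(corpus):
        print(example)
```

Pairs found in the relation table become positive examples and all other co-occurring pairs become negatives. The labels are noisy by construction, since a sentence can mention two related concepts without expressing the relation between them.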
