Ontology Learning Based on Word Embeddings for Text Big Data Extraction

The term big data describes data that exist everywhere in huge volumes, in raw form, and in heterogeneous types. Unstructured and uncategorized data make up 95% of big data. Domain-relevant data cannot be extracted efficiently from text big data within a reasonable time. Text big data therefore remains a barrier to big data integration and, in turn, to big data analytics, because big data integration cannot take text big data into account when it prepares data for analytics. An ontology, on the other hand, represents information and knowledge as a graph schema that provides shareable, reusable, domain-specific data. This makes ontologies a good fit for extracting domain-relevant data from text big data. This paper therefore proposes an ontology learning (OL) methodology for text big data extraction. OL provides algorithms, techniques, and tools for constructing ontologies automatically from text. The proposed OL method combines a deep learning approach, word embeddings, with an advanced hierarchical clustering algorithm, BIRCH. Using word embeddings together with advanced hierarchical clustering improves the quality of OL for text big data extraction and reduces processing time. In addition, deep learning learns without supervision from massive amounts of unlabeled, uncategorized raw data, which addresses the analytical challenge posed by text big data. The evaluation measures precision, recall, and F-measure to assess the quality of the work, and running time to assess performance. Quality is assessed by comparing the results against gold standard datasets. Experimental results and evaluation demonstrate that the proposed OL methodology is efficient and suitable for text big data extraction.
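The abstract does not say how the clustering or the evaluation is implemented, so the Python sketch below is only a rough illustration under stated assumptions, not the paper's method. It shows two ingredients. The first is the clustering feature (CF) that BIRCH keeps for each subcluster, CF = (N, LS, SS): the count, the linear sum, and the sum of squared norms of its vectors. Here it is used in a simplified, flat leaf-level insertion with a radius threshold, with no CF tree, node splitting, or refinement phase. The second is a set-based precision, recall, and F-measure computed over co-clustered term pairs against a gold standard. The toy 2-D vectors (standing in for trained word embeddings), the threshold of 0.2, the gold groups, the pair-based scoring, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Iterable, List, Sequence, Set, Tuple


@dataclass
class ClusteringFeature:
    """BIRCH clustering feature CF = (N, LS, SS) summarizing a subcluster."""
    n: int           # number of vectors summarized
    ls: List[float]  # linear sum of the vectors
    ss: float        # sum of squared Euclidean norms of the vectors

    @classmethod
    def from_vector(cls, v: Sequence[float]) -> "ClusteringFeature":
        return cls(1, list(v), sum(x * x for x in v))

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CF additivity: two subclusters combine by adding their components,
        # so the original vectors never need to be revisited.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # RMS distance of members from the centroid: R^2 = SS/N - ||LS/N||^2
        r2 = self.ss / self.n - sum(c * c for c in self.centroid())
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding error


def insert(cfs: List[ClusteringFeature], v: Sequence[float], threshold: float) -> int:
    """Hypothetical helper: absorb v into the nearest subcluster if the merged
    radius stays within `threshold`, otherwise open a new subcluster.
    Simplification: a flat list of leaf entries, no CF tree or node splits.
    Returns the index of the subcluster that received v."""
    new = ClusteringFeature.from_vector(v)
    if cfs:
        best = min(range(len(cfs)), key=lambda i: math.dist(cfs[i].centroid(), v))
        merged = cfs[best].merge(new)
        if merged.radius() <= threshold:
            cfs[best] = merged
            return best
    cfs.append(new)
    return len(cfs) - 1


def co_clustered_pairs(groups: Iterable[Iterable[str]]) -> Set[Tuple[str, str]]:
    """All unordered term pairs that share a group (sorted for a canonical order)."""
    return {p for g in groups for p in combinations(sorted(g), 2)}


def precision_recall_f(learned: Set, gold: Set) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F-measure of `learned` against `gold`."""
    tp = len(learned & gold)
    p = tp / len(learned) if learned else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical 2-D vectors; trained word embeddings usually have far more dimensions.
vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}
cfs: List[ClusteringFeature] = []
groups: List[List[str]] = []
for term, vec in vectors.items():
    idx = insert(cfs, vec, threshold=0.2)
    if idx == len(groups):  # a new subcluster was opened
        groups.append([])
    groups[idx].append(term)

gold_groups = [{"cat", "dog"}, {"car", "bus", "truck"}]  # hypothetical gold standard
p, r, f = precision_recall_f(co_clustered_pairs(groups), co_clustered_pairs(gold_groups))
print(groups)             # [['cat', 'dog'], ['car', 'bus']]
print(p, r, round(f, 3))  # 1.0 0.5 0.667
```

Because a CF is additive, BIRCH can summarize a large collection of vectors in a single pass while keeping only these compact summaries in memory. This is the property that makes it attractive for large text collections. The full algorithm organizes the CFs in a height-balanced tree and splits nodes as it grows, which this sketch omits.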