The Model of Semantic Similarity Estimation for the Problems of Big Data Search and Structuring

The main problem in Big Data search and processing is the constantly growing complexity of identifying and structuring data so that it can be represented in a form suitable for understanding and further use. To solve this problem, the authors propose building a multilevel semantic net that defines connections between data meta-descriptions in large distributed information arrays. The semantic model developed on the basis of this method gives a clear and compact presentation of the structure of semantic relations between elements of mass data arrays. Semantic meta-descriptions are treated as sets of “subject-predicate-object” triples expressed in terms of the subject-area ontology of the distributed operational databases and of the query. The authors propose a model for finding and estimating semantically similar elements of distributed databases, based on clustering semantic nets represented as graph models at three levels: the subject-area level, the search-profile level, and the document meta-description level. The relevance (semantic similarity) estimation method is based on assessing the closeness between the semantic nets of the documents and of the query in distributed information arrays. To analyze the developed method, the authors carried out a set of computational experiments. The results obtained support the theoretical significance and practical prospects of this approach.
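
As an illustration of treating meta-descriptions as sets of “subject-predicate-object” triples and scoring a document against a query, the following minimal sketch may help. It assumes a plain Jaccard overlap between triple sets as the closeness measure; the abstract does not specify the authors' actual formula, and the function name `jaccard_similarity` and the sample triples are hypothetical.

```python
from typing import FrozenSet, Tuple

# A meta-description is modelled as a set of (subject, predicate, object) triples.
Triple = Tuple[str, str, str]


def jaccard_similarity(doc: FrozenSet[Triple], query: FrozenSet[Triple]) -> float:
    """Share of triples common to both sets (assumed stand-in measure,
    not the measure defined in the paper)."""
    union = doc | query
    if not union:
        return 0.0  # two empty descriptions: treat as not similar
    return len(doc & query) / len(union)


# Hypothetical query and document meta-descriptions.
query = frozenset({
    ("paper", "hasTopic", "semantic search"),
    ("paper", "usesMethod", "clustering"),
})
doc_a = frozenset({
    ("paper", "hasTopic", "semantic search"),
    ("paper", "usesMethod", "clustering"),
    ("paper", "publishedIn", "2016"),
})
doc_b = frozenset({
    ("paper", "hasTopic", "VLSI placement"),
})

# Rank documents by estimated relevance to the query, highest first.
scores = {"doc_a": jaccard_similarity(doc_a, query),
          "doc_b": jaccard_similarity(doc_b, query)}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Exact triple matching is the simplest possible choice here. A measure closer to the described model would presumably also account for graph structure and for the three levels of the semantic net, which this sketch leaves out.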
