Enhancing statistical semantic networks with concept hierarchies

With the emergence of the semantic web, effective knowledge representation has gained importance. Statistically generated semantic networks are simple representations whose semantic power has yet to be fully explored. Although these networks are created with simple statistical measures and little overhead, they have the potential to express the semantic relationships between concepts. In this paper, we explore the capability of such networks and enhance them with concept hierarchies so that they serve as better knowledge representations. The concept hierarchies are built from the level of importance of concepts: the importance, or coverage, of a concept within the given set of documents has to be taken into account to build an effective knowledge representation. We provide a domain-independent, graph-based approach for identifying the level of importance of each concept from the statistically generated semantic network that represents the entire document set. Insights about the depth of every concept are obtained by analysing the graph-theoretical properties of this network. A generic concept hierarchy is created using a greedy strategy, and the original semantic network is reinforced with this concept hierarchy. Experiments over different data sets demonstrate that our approach works effectively in classifying concepts and generating taxonomies from that classification, thereby enhancing the semantic network.
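The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical measure, the graph-theoretical properties, or the exact greedy rule, so everything below is an assumption made for illustration: edges are weighted by document-level co-occurrence counts, importance is approximated by weighted degree, and each concept is greedily attached to the more important neighbour it co-occurs with most often. The function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(documents):
    """Build a weighted undirected graph of terms.

    Assumption: an edge weight is the number of documents in which
    both terms appear (one simple statistical measure among many).
    """
    weights = defaultdict(int)
    for doc in documents:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            weights[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), w in weights.items():
        graph[a][b] = w
        graph[b][a] = w
    return graph


def importance(graph):
    """Assumed importance score: weighted degree of each concept."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def greedy_hierarchy(graph):
    """Greedily assign each concept a parent.

    Concepts are ranked by importance (ties broken alphabetically).
    Each concept picks, among its strictly higher-ranked neighbours,
    the one with the largest co-occurrence weight; a concept with no
    higher-ranked neighbour becomes a root (parent None).
    """
    score = importance(graph)
    order = sorted(graph, key=lambda n: (-score[n], n))
    rank = {n: i for i, n in enumerate(order)}
    parent = {}
    for node in order:
        candidates = [n for n in graph[node] if rank[n] < rank[node]]
        if candidates:
            parent[node] = max(
                candidates, key=lambda n: (graph[node][n], -rank[n])
            )
        else:
            parent[node] = None
    return parent


def depth(parent, node):
    """Number of ancestors of a concept in the induced hierarchy."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


# Toy document set, invented purely for illustration.
documents = ["graph node edge", "graph node", "graph tree", "tree leaf"]
network = build_cooccurrence_network(documents)
hierarchy = greedy_hierarchy(network)
```

On this toy input, `graph` has the highest weighted degree and becomes the root, while `leaf` ends up two levels down under `tree`. Because a parent always ranks strictly higher than its child, the result is guaranteed to be a forest; a different centrality measure could be swapped into `importance` without changing the rest of the sketch.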
