Mining Generalized Associations of Semantic Relations from Textual Web Content

Traditional text mining techniques transform free text into a flat bag-of-words representation, which does not preserve sufficient semantics for knowledge discovery. In this paper, we present a two-step procedure for mining generalized associations of semantic relations conveyed by the textual content of Web documents. First, RDF (Resource Description Framework) metadata representing semantic relations are extracted from raw text using a variety of natural language processing techniques. The relation extraction process also creates a term taxonomy in the form of a sense hierarchy inferred from WordNet. Then, a novel generalized association pattern mining algorithm (GP-Close) is applied to discover the underlying relation association patterns in the RDF metadata. To prune the large number of redundant overgeneralized patterns in the relation pattern search space, the GP-Close algorithm adopts the notion of generalization closure for systematic overgeneralization reduction. The efficacy of our approach is demonstrated through empirical experiments conducted on an online database of terrorist activities.
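
As a rough illustration of what a generalized association over semantic relations means, the sketch below counts document-level support for relations after generalizing their terms along a taxonomy. It is not the GP-Close algorithm and does not implement generalization-closure pruning; the taxonomy, the triples, and the names `PARENT`, `generalizations`, and `relation_support` are all hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import product

# Hypothetical toy taxonomy (child term -> parent term), standing in for
# a sense hierarchy such as one inferred from WordNet.
PARENT = {
    "bomb": "weapon",
    "gun": "weapon",
    "embassy": "building",
    "hotel": "building",
}


def ancestors_or_self(term):
    """Return the term followed by all of its ancestors in the toy taxonomy."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalizations(triple):
    """Enumerate the triples obtained by generalizing subject and/or object."""
    subj, pred, obj = triple
    return {
        (s, pred, o)
        for s, o in product(ancestors_or_self(subj), ancestors_or_self(obj))
    }


def relation_support(documents):
    """Count, per generalized relation, how many documents contain it."""
    counts = Counter()
    for triples in documents:
        covered = set()
        for triple in triples:
            covered |= generalizations(triple)
        counts.update(covered)  # each document contributes at most once
    return counts


# Hypothetical documents, each a list of (subject, predicate, object) triples
# of the kind an RDF extraction step might produce.
DOCUMENTS = [
    [("group_a", "use", "bomb"), ("group_a", "attack", "embassy")],
    [("group_a", "use", "gun"), ("group_a", "attack", "hotel")],
    [("group_b", "use", "bomb")],
]

support = relation_support(DOCUMENTS)
print(support[("group_a", "use", "bomb")])    # specific relation
print(support[("group_a", "use", "weapon")])  # generalized relation
```

In this toy data the generalized relation is supported by more documents than either specific relation, which is the kind of regularity a bag-of-words view cannot express. Enumerating every generalization also shows why the pattern space grows quickly and why pruning redundant overgeneralized patterns matters.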
