Ontology generation for large email collections

This paper presents a new approach to identifying the concepts expressed in a collection of email messages and organizing them into an ontology or taxonomy for browsing. The approach combines techniques from text mining, information retrieval, natural language processing, and machine learning to generate a concept ontology. Nominal n-gram mining identifies candidate concepts. WordNet and surface text pattern matching identify relationships among those concepts. A supervised clustering algorithm then groups the concepts further. Experimental results show that the approach is effective.
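As a rough illustration of the first two stages, the Python sketch below mines candidate concepts as frequent n-grams and links them with a single surface text pattern. It is not the paper's implementation, and several pieces are assumptions. The stopword filter is a crude stand-in for the part-of-speech test that "nominal" implies. `STOPWORDS`, `EMAILS`, `segments`, `candidate_concepts`, `such_as_relations`, and the `max_n`/`min_df` thresholds are hypothetical names and values chosen for illustration. The "X such as Y" pattern (in the style of Hearst patterns) is one example of a surface pattern, not necessarily one the authors use. The WordNet lookup and the supervised clustering step are omitted to keep the sketch dependency-free.

```python
import re
from collections import Counter

# Hypothetical stopword list. A real "nominal" filter would use a part-of-speech
# tagger to keep noun phrases; rejecting n-grams that start or end with a
# stopword is only a crude stand-in for that step.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "our", "please", "should", "such", "that",
    "the", "this", "to", "we", "will", "with", "you", "your",
}

# Toy input messages (illustrative only, not data from the paper).
EMAILS = [
    "Power plants emit pollutants such as mercury and lead. Please act.",
    "Mercury from power plants harms children.",
    "Limit pollutants such as mercury, lead and arsenic.",
    "Drinking water contains arsenic and lead.",
]


def segments(text):
    """Yield lowercase word lists, split at punctuation so that n-grams
    never span a clause boundary."""
    for chunk in re.split(r"[.,;:!?()\n]+", text.lower()):
        words = re.findall(r"[a-z]+", chunk)
        if words:
            yield words


def candidate_concepts(emails, max_n=3, min_df=2):
    """Return {n-gram: number of messages containing it} for n-grams of
    length 1..max_n that do not start or end with a stopword and occur in
    at least min_df messages."""
    df = Counter()
    for email in emails:
        seen = set()  # count each n-gram at most once per message
        for words in segments(email):
            for n in range(1, max_n + 1):
                for i in range(len(words) - n + 1):
                    gram = words[i:i + n]
                    if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                        continue
                    seen.add(" ".join(gram))
        df.update(seen)
    return {gram: count for gram, count in df.items() if count >= min_df}


def such_as_relations(emails, concepts, max_n=3):
    """Apply the surface pattern "X such as Y1, Y2 and Y3" and return
    (hyponym, hypernym) pairs where both sides are candidate concepts."""
    relations = set()
    for email in emails:
        for sentence in re.split(r"[.;:!?\n]+", email.lower()):
            if " such as " not in sentence:
                continue
            left, right = sentence.split(" such as ", 1)
            left_words = re.findall(r"[a-z]+", left)
            # Hypernym: the longest candidate concept ending just before "such as".
            hypernym = None
            for n in range(min(max_n, len(left_words)), 0, -1):
                gram = " ".join(left_words[-n:])
                if gram in concepts:
                    hypernym = gram
                    break
            if hypernym is None:
                continue
            # Hyponyms: list items separated by commas, "and", or "or".
            for item in re.split(r",|\band\b|\bor\b", right):
                item = " ".join(re.findall(r"[a-z]+", item))
                if item in concepts and item != hypernym:
                    relations.add((item, hypernym))
    return relations


if __name__ == "__main__":
    concepts = candidate_concepts(EMAILS)
    print(sorted(concepts.items()))
    print(sorted(such_as_relations(EMAILS, concepts)))
```

On the toy messages, the sketch keeps seven candidates. These include fragments such as `power` alongside `power plants`, which a real nominal filter or the later clustering step would need to handle. It also extracts `mercury`, `lead`, and `arsenic` as kinds of `pollutants`. Restricting both sides of the pattern to already-mined candidates is a design choice that keeps relation extraction anchored to the concept vocabulary.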