Maximal Sequential Patterns: A Tool for Quantitative Semantic in Text Analysis

This chapter introduces maximal sequential patterns, how to extract them, and some of their applications in document processing and web content mining. Its main objective is to show that maximal sequential patterns preserve document semantics and could therefore be a good alternative to the word and n-gram models. First, the chapter introduces the problem of maximal sequential pattern mining when the data are sequential chains of words. Next, it defines several basic concepts and states the problem of maximal sequential pattern mining in text documents. It then presents two algorithms, proposed by the authors of this chapter, for efficiently finding maximal sequential patterns in text documents. Additionally, it describes the use of maximal sequential patterns as a quantitative semantic tool for solving different problems related to document processing and web content mining. Finally, it outlines future research directions and presents conclusions. DOI: 10.4018/978-1-60960-881-1.ch010
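To make the notion of a maximal sequential pattern concrete, here is a minimal Python sketch. It assumes a common textbook definition that may differ from the chapter's exact one: a pattern is an ordered sequence of words, it occurs in a document if its words appear there in the same order (gaps allowed), its support is the number of documents it occurs in, it is frequent if its support reaches a threshold `min_support`, and it is maximal if no other frequent pattern contains it as a subsequence. All function names are hypothetical, and the naive level-wise search below is not either of the two algorithms the chapter presents.

```python
from typing import List, Sequence, Set, Tuple

# A pattern is an ordered tuple of words, e.g. ("cat", "sat", "on").
Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], words: Sequence[str]) -> bool:
    """True if the words of `pattern` appear in `words` in the same order (gaps allowed)."""
    it = iter(words)
    # `w in it` advances the iterator past the first match, so order is enforced.
    return all(w in it for w in pattern)


def support(pattern: Pattern, docs: List[List[str]]) -> int:
    """Number of documents that contain `pattern` as a subsequence."""
    return sum(1 for doc in docs if is_subsequence(pattern, doc))


def frequent_sequences(docs: List[List[str]], min_support: int) -> Set[Pattern]:
    """Naive level-wise search: grow frequent patterns one word at a time."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    vocab = sorted({w for doc in docs for w in doc})
    frequent_words = [w for w in vocab if support((w,), docs) >= min_support]
    level: List[Pattern] = [(w,) for w in frequent_words]
    found: Set[Pattern] = set(level)
    while level:
        next_level: List[Pattern] = []
        for p in level:
            for w in frequent_words:
                candidate = p + (w,)
                # Every prefix of a frequent pattern is frequent, so appending
                # words to frequent patterns reaches all frequent patterns.
                if support(candidate, docs) >= min_support:
                    next_level.append(candidate)
        found.update(next_level)
        level = next_level
    return found


def maximal_sequences(docs: List[List[str]], min_support: int) -> List[Pattern]:
    """Frequent patterns that are not a proper subsequence of another frequent pattern."""
    freq = frequent_sequences(docs, min_support)
    maximal = [
        p for p in freq
        if not any(len(q) > len(p) and is_subsequence(p, q) for q in freq)
    ]
    return sorted(maximal)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat".split(),
        "the cat lay on the rug".split(),
        "a cat sat on a mat".split(),
    ]
    for pattern in maximal_sequences(docs, min_support=2):
        print(" ".join(pattern))
```

With `min_support=2`, the three toy documents yield two maximal patterns, "cat sat on mat" and "the cat on the"; shorter frequent patterns such as "cat on" are discarded because they are contained in these. The search tests every one-word extension of every frequent pattern against every document, so its cost grows quickly with vocabulary and document length; efficient mining algorithms aim to avoid this kind of exhaustive candidate testing. Definitions that bound or forbid gaps between words would produce different patterns.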