Text mining approach for knowledge extraction in Sahîh Al-Bukhari

Information retrieval (IR) and information extraction (IE) have been subjects of active research for several years in the Artificial Intelligence and Text Mining communities. With the appearance of large textual corpora in recent years, the need has arisen to integrate information extraction modules into existing information retrieval systems. Processing large textual corpora creates needs that lie at the border between information extraction and information retrieval. This paper focuses on the extraction of surface information, i.e. information that does not require complex linguistic processing to be categorized. The goal is to detect and extract passages or sequences of words containing relevant information from the texts of prophetic narrations. We propose a system based on finite-state transducers that addresses the problem of text comprehension in successive stages. Experimental evaluation shows that the approach is feasible: the system achieved an encouraging overall precision of 71% and an overall recall of 39%.
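To make the idea of successive finite-state stages concrete, here is a minimal sketch in Python. It is not the system described in this paper: the trigger words, the tag names, the English-style placeholder input, and the functions `tag_triggers`, `tag_narrators`, `extract_narrators`, and `precision_recall` are all assumptions made for illustration. Each stage is a plain regular expression (a finite-state pattern) that rewrites the output of the previous stage, and precision and recall are computed in the usual set-based way.

```python
import re

# Stage 1 (hypothetical): mark trigger words that often introduce a narrator or a quote.
TRIGGER = re.compile(r"\b(Narrated|said)\b")

# Stage 2 (hypothetical): a run of capitalized words after a "Narrated" trigger,
# terminated by a colon, is tagged as a narrator name.
NARRATOR = re.compile(r"<TRIG>Narrated</TRIG> ((?:[A-Z][a-z]+ ?)+):")


def tag_triggers(text):
    """First transducer: wrap each trigger word in <TRIG> tags."""
    return TRIGGER.sub(lambda m: f"<TRIG>{m.group(1)}</TRIG>", text)


def tag_narrators(text):
    """Second transducer: works on stage-1 output and tags narrator names."""
    return NARRATOR.sub(
        lambda m: f"<TRIG>Narrated</TRIG> <NARRATOR>{m.group(1)}</NARRATOR>:",
        text,
    )


def extract_narrators(text):
    """Run the two stages in sequence and collect the tagged names."""
    tagged = tag_narrators(tag_triggers(text))
    return re.findall(r"<NARRATOR>(.*?)</NARRATOR>", tagged)


def precision_recall(predicted, gold):
    """Set-based precision and recall of predicted items against a gold list."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Placeholder input, not a quotation from the corpus.
sample = "Narrated Abu Huraira: The Prophet said, [text of the narration]"
names = extract_narrators(sample)
```

A high-precision, low-recall profile such as the one reported above is typical of hand-written patterns of this kind: what the patterns match is usually correct, but many valid mentions fall outside them.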
