Mining Association Rules from Unstructured Documents

This paper presents a system for extracting association rules from collections of unstructured documents, called EART (Extract Association Rules from Text). EART processes only text; it does not handle images or figures. It discovers association rules among the keywords that label a collection of textual documents. The main characteristic of EART is that it integrates XML technology (to transform unstructured documents into structured ones) with an Information Retrieval scheme (TF-IDF) and a Data Mining technique for association rule extraction. EART relies on word features to extract association rules. It consists of four phases: a structure phase, an index phase, a text mining phase and a visualization phase. Our work analyzes the keywords in the extracted association rules by distinguishing two cases: keywords that co-occur in the same sentence of the original text, and keywords that occur in the text without co-occurring in a single sentence. Experiments were conducted on a collection of scientific documents selected from MEDLINE that relate to the outbreak of the H5N1 avian influenza virus.

I. INTRODUCTION

The information age is characterized by rapid growth in the information available in electronic media such as databases, data warehouses, intranet documents, business emails and the World Wide Web (WWW). This growth has created a demanding task called Knowledge Discovery in Databases (KDD) and in Texts (KDT). Consequently, researchers and companies have focused on this task in recent years [7, 13], and significant progress has been made. Text Mining (TM) and Knowledge Discovery in Text (KDT) are new research areas that try to solve the problem of information overload by using techniques from … The main goal of text mining is to enable users to extract information from large textual resources. The final output of the mining process varies and can only be defined with respect to a specific application. Most Text Mining objectives fall under the following categories of operations: Feature