Information Extraction Tasks: A Survey

An Information Extraction activity is a complex process that can be decomposed into several tasks. This decomposition has three advantages. First, the best technique can be chosen for each task independently of the other tasks. Second, an Information Extraction program can be developed as a set of independent modules, one per task, which makes local debugging easy. Third, the Information Extraction activity becomes easy to customize by reordering, selecting, or composing the tasks. This paper presents a commonly used decomposition of the Information Extraction activity and describes the most widely used machine learning and rule-based techniques for each task.