Information extraction on novel text using machine learning and rule-based system

A novel typically consists of around 30,000 to 50,000 words. It usually tells a story about entities, such as Person, Location, or Organization, and the relations among them. To grasp this information, a reader must read the whole novel, which is time-consuming. This research proposes a solution: automatic extraction of entity relations by means of an Information Extraction (IE) technique. The technique is divided into two steps. First, all entities are retrieved from the input text using Named Entity Recognition (NER). Afterward, the relations between those entities are extracted by a Relation Extraction (RE) process. This research implements an IE system covering both NER and RE, employing a supervised machine learning approach combined with a rule-based system. The main purpose of this research is to determine which features and machine learning algorithm yield the best results, and which rules are most suitable for the characteristics of novels.
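The two-step pipeline described above (NER first, then RE over the recognized entities) can be illustrated with a minimal Python sketch. This is not the system implemented in this research: it uses hand-written rules only and omits the supervised machine learning component entirely. The gazetteer entries, trigger phrases, relation labels, and function names (`recognize_entities`, `extract_relations`, `extract`) are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical gazetteers (assumption): a real system would assign entity
# types with a trained supervised model rather than a fixed word list.
GAZETTEER = {
    "Person": {"Alice", "Bob"},
    "Location": {"London"},
    "Organization": {"Acme"},
}

# Hypothetical rules (assumption): (head type, trigger phrase, tail type, label).
RELATION_RULES = [
    ("Person", "lives in", "Location", "lives_in"),
    ("Person", "works for", "Organization", "works_for"),
]


def recognize_entities(sentence):
    """Step 1 (NER): return (text, type, start, end) for each gazetteer hit."""
    entities = []
    # Only capitalized single words are considered as candidates.
    for match in re.finditer(r"\b[A-Z][a-z]+\b", sentence):
        for entity_type, names in GAZETTEER.items():
            if match.group() in names:
                entities.append(
                    (match.group(), entity_type, match.start(), match.end())
                )
    return entities


def extract_relations(sentence, entities):
    """Step 2 (RE): match the text between two entities against the rules."""
    relations = []
    for i, (head, head_type, _, head_end) in enumerate(entities):
        for tail, tail_type, tail_start, _ in entities[i + 1:]:
            between = sentence[head_end:tail_start].strip()
            for rule_head, trigger, rule_tail, label in RELATION_RULES:
                if (head_type, between, tail_type) == (rule_head, trigger, rule_tail):
                    relations.append((head, label, tail))
    return relations


def extract(text):
    """Run NER and then RE on each sentence; return (head, relation, tail) triples."""
    triples = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        triples.extend(extract_relations(sentence, recognize_entities(sentence)))
    return triples


sample = "Alice lives in London. Bob works for Acme."
print(extract(sample))
# [('Alice', 'lives_in', 'London'), ('Bob', 'works_for', 'Acme')]
```

Keeping the two steps as separate functions mirrors the division described in the abstract: either step could in principle be replaced by a learned classifier without changing the other.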
