Paper information - A Case Study in Partial Parsing Unstructured Text

This paper presents a parsing method for extracting entities from open source documents. A Web page of interest is first downloaded to a text file. The method then applies a set of patterns to the text file to extract entity fragments of interest. The patterns are currently designed specifically for obituary announcements. The extracted entities must then be identified before they are populated into a database, and an entity resolution process is presented to determine their actual identities. A case study illustrates the method and presents its results. Although the results show that the method is not technically effective or promising, they do help in understanding how well or how poorly a quick parsing technique extracts entities of interest from obituaries on the Web. More effective techniques should be considered to improve the extraction results.
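The pattern-application step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the paper's actual patterns, so the three regular expressions, the labels `name_age`, `death_date`, and `residence`, and the function name `extract_fragments` are all hypothetical, chosen only to show how partial parsing keeps the fragments that match and ignores everything else.

```python
import re

# Hypothetical patterns for obituary text; the paper's real pattern set
# is not given in the abstract, so these are illustrative assumptions.
PATTERNS = {
    # "John A. Smith, 82," or "Jane Doe, age 75,"
    "name_age": re.compile(
        r"(?P<name>[A-Z][a-z]+(?: [A-Z]\.?)?(?: [A-Z][a-z]+)+), "
        r"(?:age )?(?P<age>\d{1,3}),"
    ),
    # "died March 3, 2007" or "passed away on May 1, 2006"
    "death_date": re.compile(
        r"(?:died|passed away)(?: on)? (?P<date>[A-Z][a-z]+ \d{1,2}, \d{4})"
    ),
    # "of Little Rock"
    "residence": re.compile(r"\bof (?P<city>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
}


def extract_fragments(text):
    """Return (label, fields) pairs for every pattern match in the text.

    Text that no pattern matches is skipped, which is what makes the
    parse partial rather than full.
    """
    fragments = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            fragments.append((label, match.groupdict()))
    return fragments


sample = "John A. Smith, 82, of Little Rock, died March 3, 2007, at his home."
fragments = extract_fragments(sample)
for label, fields in fragments:
    print(label, fields)
```

Each extracted fragment is still only a candidate: before a fragment is stored in a database, a separate resolution step would have to decide which real person it refers to. Hand-written patterns of this kind are brittle, which is consistent with the abstract's conclusion that more effective techniques should be considered.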
