Automatic abstract generation based on document structure analysis and its evaluation as a document retrieval presentation function

This paper describes an automatic abstract generation system that includes a document structure analyzer. From a document, the system extracts a text structure representing the rhetorical relations among sentences and sentence chunks. It evaluates the importance of each sentence based on the analyzed structure and decides which sentences to omit from the abstract. It also attempts to keep the abstract consistent with the original text by replacing connective expressions. The generated abstracts were evaluated from two points of view: the cover rate of key sentences, and their quality as a document presentation medium. Both experiments showed the generated abstracts to be valid.
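
The selection step and the cover-rate measure can be illustrated with a minimal sketch. The abstract does not specify the scoring scheme, so everything here is an assumption made for illustration: the rhetorical structure is modeled as a binary tree whose nodes mark which side is the nucleus (the more central part), each step into a non-nucleus branch adds one penalty point, and sentences above a penalty threshold are dropped. The names `Leaf`, `Relation`, `penalties`, `select_sentences`, and `cover_rate`, as well as the relation labels, are hypothetical. Cover rate is assumed to mean the fraction of key sentences that survive in the abstract. Replacement of connective expressions is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Leaf:
    sentence_id: int
    text: str


@dataclass
class Relation:
    name: str      # hypothetical relation label, e.g. "example" or "reason"
    left: "Node"
    right: "Node"
    nucleus: str   # "left", "right", or "both"


Node = Union[Leaf, Relation]


def penalties(node: Node, penalty: int = 0,
              out: Optional[Dict[int, int]] = None) -> Dict[int, int]:
    """Assign each sentence a penalty: +1 for every non-nucleus branch
    on the path from the root to that sentence (assumed scheme)."""
    if out is None:
        out = {}
    if isinstance(node, Leaf):
        out[node.sentence_id] = penalty
        return out
    left_cost = 0 if node.nucleus in ("left", "both") else 1
    right_cost = 0 if node.nucleus in ("right", "both") else 1
    penalties(node.left, penalty + left_cost, out)
    penalties(node.right, penalty + right_cost, out)
    return out


def select_sentences(tree: Node, max_penalty: int) -> List[int]:
    """Keep sentences whose penalty does not exceed the threshold."""
    scores = penalties(tree)
    return sorted(sid for sid, p in scores.items() if p <= max_penalty)


def cover_rate(selected: List[int], key_sentences: List[int]) -> float:
    """Fraction of key sentences that appear in the abstract."""
    key = set(key_sentences)
    if not key:
        return 0.0
    return len(key & set(selected)) / len(key)


# Toy document with four sentences (invented for illustration only).
tree = Relation(
    "reason",
    Relation("example", Leaf(1, "Main claim."), Leaf(2, "An example."), "left"),
    Relation("background", Leaf(3, "Background."), Leaf(4, "A reason."), "right"),
    "left",
)

abstract_ids = select_sentences(tree, max_penalty=1)
print(abstract_ids)                      # [1, 2, 4]
print(cover_rate(abstract_ids, [1, 3]))  # 0.5
```

In this toy run, sentence 3 sits in a non-nucleus branch at two levels, so it is the first to be discarded. Lowering `max_penalty` shortens the abstract further.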