COSY-MATS: An Intelligent and Scalable Summarisation Shell

[Figure 4: The Architecture of COSY-MATS. Generic and application-specific symbolic analysers (lexical, morphological, syntactic, semantic, pragmatic) feed a cascade of ANN encoders over a blackboard; the pragmatic content selection ANN outputs a list of important propositions, which ANN decoders and symbolic synthesisers turn into the summary.]

3 A Scalable Architecture for Intelligent Summarisation

Having identified 'universal' content selection features, as well as some of the ways these interact with each other, the following architecture was designed for a full-scale implementation of the COSY-MATS summarisation shell (Fig. 4) (Aretoulaki, 1996). Every sentence in the text to be summarised (which is assumed to be integral and coherent, rather than a random collection of propositions) is first processed by a cluster of standard symbolic analysers: morphological, syntactic, semantic and pragmatic. The result of this processing is the evaluation of a set of basic linguistic and extralinguistic features that provide the input for a cascade of low- and higher-level Artificial Neural Networks (ANNs), each responsible for specific subtasks. The low-level ANNs map linguistic features (surface and intermediary) into extralinguistic features (intermediary and pragmatic). The pragmatic features, in turn, provide the input to the highest-level content selection ANN, which ultimately determines the relative degree of importance of each sentence. This latter ANN is also the only component of COSY-MATS that has been implemented to date.

Finally, the sentences selected as important during the content selection phase will be used as the basis for generating either a comprehensive summary or a more concise abstract (Aretoulaki, 1996). This processing will take place in another cluster of symbolic processors, almost symmetric to the one used for text analysis and interpretation; it is here that the planning and the actual synthesis of the summary or abstract will be realised. It is important to note, however, that the output list of the best-scoring sentences produced by the content selection ANN can also be used to provide a draft summary, i.e. a concatenation of already-existing sentences instead of an original text (cf. Kupiec et al., 1995). This is also the only type of generation currently pursued in this research (cf. Section 4.1).

Despite the dominance of the generic modules therein, COSY-MATS does provide for the incorporation of application-specific information. First of all, the architecture is highly modular, so that new specialised processors can, in principle, simply be plugged in. The simplicity of the interface between the various modules means that new modules, whether symbolic or connectionist, can be accommodated equally well. For example, in addition to the existing lower-level ANNs, further ANNs can easily be incorporated that have been trained to recognise the specific keywords and structural phrases that differentiate one domain or text type from another in expressing the same rhetorical and pragmatic functions. Hence, COSY-MATS can function as a shell for building specialised summarisers.
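To make this data flow concrete, a minimal sketch is given below. It is an illustration under stated assumptions rather than the actual implementation: all function names, features and weightings are hypothetical, and the analysers and low-level ANNs are reduced to stubs (of these components, only the content selection ANN has been built to date).

    # Minimal sketch of the COSY-MATS data flow; all names, features and
    # weightings are illustrative assumptions, not the actual system.
    from typing import Dict, List, Tuple

    FeatureVector = Dict[str, float]

    def symbolic_analysers(sentence: str) -> FeatureVector:
        # Stand-in for the morphological, syntactic, semantic and
        # pragmatic analysers: evaluate basic linguistic features.
        words = sentence.lower().split()
        return {
            "contains_cue_phrase": 1.0 if "in conclusion" in sentence.lower() else 0.0,
            "relative_length": min(len(words) / 40.0, 1.0),
        }

    def low_level_anns(surface: FeatureVector) -> FeatureVector:
        # Stand-in for the cascade of low-level ANNs that map surface
        # and intermediary features into pragmatic ones.
        return {
            "summarising_function": surface["contains_cue_phrase"],
            "informational_weight": surface["relative_length"],
        }

    def content_selection_ann(pragmatic: FeatureVector) -> float:
        # Stand-in for the highest-level ANN: a degree of importance
        # in [0, 1] for one sentence.
        return sum(pragmatic.values()) / len(pragmatic)

    def rank_sentences(text: List[str]) -> List[Tuple[float, str]]:
        # Most important sentences first.
        scored = [(content_selection_ann(low_level_anns(symbolic_analysers(s))), s)
                  for s in text]
        return sorted(scored, reverse=True)

The point of the sketch is the decomposition itself: each stage consumes and produces named feature vectors, so any stub can later be replaced by a trained symbolic or connectionist module without disturbing the rest of the pipeline.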
As regards the front-end symbolic analysers, the processing that will take place therein will be dictated by the type of data that needs to be computed in the ANNs. The latter computation, in turn, will be based on the identified generic and application-specific mappings across the three levels of description: the pragmatic, the intermediary and the surface (Section 2). In addition, it is the implementation of the content selection ANN that will determine the eventual type and number of pragmatic features required for the whole process of summarisation (Section 4). As a result, only a partial analysis and interpretation of the input text need be performed in COSY-MATS. The problem of combinatorial explosion and inefficient computation in the search for a solution, common in NLU-based systems, will thus be largely avoided. At the same time, this pragmatism in the analysis and interpretation processes does not decrease the amount of deep processing (semantic, discourse and pragmatic) carried out in the system. High-level processing is salient in the pragmatic features identified. These are, nonetheless, 'grounded' by means of the generic lower-level features, as well as by other surface and semantic characteristics of texts pertaining to the specific application of interest.

In summary, the proposed architecture is both modular and hybrid. The complex task of content selection is systematically decomposed into much more manageable computations. In addition, the strong points of symbolic and connectionist processing are combined in a complementary way (cf. Aretoulaki, 1996). The symbolic analysers can work with structured data of arbitrary length laden with variables, and they have powerful symbol-matching facilities (as is appropriate for lower-level text analysis). In contrast, the ANNs are able to deal with fuzzy and inexact processing (as is involved in importance determination and inter-level feature mappings) (McClelland and Rumelhart, 1986; Rumelhart and McClelland, 1986).

4 Empirical Evidence

As the first and most crucial step in implementing COSY-MATS, a prototype of its content selection ANN was developed. This is a standard feed-forward back-propagation network (Rumelhart et al., 1986). The ANN receives individual sentences from the text to be summarised, hand-coded by means of the identified pragmatic features (given that the remaining components of COSY-MATS have not yet been implemented), and assigns to them degrees of importance. It has been a major assumption behind this work that it is feature combinations, rather than individual features, that characterise sentence importance (Sections 1 & 2). An ANN learns such interactions naturally, which is why the connectionist paradigm was adopted for the content selection task.

The training corpus consisted of 1,880 sentences in total, taken from the real-world text collection described in Section 2. 1,100 of them are sentences largely out of their context, while the remaining 780 sentences make up 29 full texts. In contrast to the diversity of the former subcorpus, each of the latter texts is approximately 23 sentences long and was encoded in full. The encoding was carried out by 5 individuals on the basis of the above-mentioned manual, which exemplifies the correlations between the surface and the more abstract features in the proposed scheme. The manual was used in order to standardise the encoding process as much as possible, as well as to validate the proposed ways in which the evaluation of the abstract pragmatic features can be objectified and fully automated later on in the completed system.
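As an illustration of the kind of network involved, the sketch below implements a one-hidden-layer feed-forward network trained by back-propagation on 24-dimensional pragmatic feature vectors (the feature count mentioned in Section 4.1). The hidden-layer size, learning rate and target encoding are assumptions made for the sketch; the text does not fix these details.

    # Sketch of a feed-forward back-propagation network for content
    # selection; topology (one hidden layer of 8 units), learning rate
    # and target encoding are assumptions, not the published setup.
    import numpy as np

    rng = np.random.default_rng(0)
    N_FEATURES, N_HIDDEN = 24, 8      # 24 pragmatic features (Section 4.1)

    W1 = rng.normal(0.0, 0.1, (N_FEATURES, N_HIDDEN))
    W2 = rng.normal(0.0, 0.1, (N_HIDDEN, 1))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(x):
        h = sigmoid(x @ W1)           # hidden activations
        return h, sigmoid(h @ W2)     # importance score in (0, 1)

    def train_step(x, target, lr=0.5):
        # One back-propagation update on a single hand-coded sentence:
        # x is its 24-dim pragmatic feature vector, target the human
        # importance judgement rescaled to [0, 1].
        global W1, W2
        h, y = forward(x)
        err = y - target
        d_out = err * y * (1.0 - y)             # output delta (sigmoid)
        d_hid = (d_out @ W2.T) * h * (1.0 - h)  # hidden delta
        W2 -= lr * np.outer(h, d_out)
        W1 -= lr * np.outer(x, d_hid)
        return float((err ** 2).item())         # squared error, for monitoring

Training then amounts to repeated calls of train_step over the 1,880 hand-coded sentences, with the squared error monitored for convergence.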
Experiments to date (cf. Aretoulaki, 1996) have demonstrated the superiority of the pragmatic features over input to the ANN drawn from across all three levels of abstraction (58.1% vs 56.1% success on average, where 'success' coincides with agreement with the judgement made by the human encoder regarding the level of importance of the corresponding sentence). The simultaneous use of control experiments with noisy data has ensured the validity of these results (50.1% success); the noisy data consisted of characteristics of the text that should be irrelevant to the content selection task, such as 'The second word in the sentence ends in a vowel'. In addition, testing on whole texts has provided results comparable to those acquired with isolated sentences, namely 56.8% success on average; this suggests that the pragmatic features are sufficiently abstract to capture hierarchical and structural aspects of the corresponding discourses.

The diversity of the corpus in terms of subject matter, text type and length provides sufficient evidence for the appropriateness of the pragmatic features for the high-level representation of texts from any domain or text type. Moreover, the portability of these pragmatic content selection features has also been partly demonstrated with experiments on whole texts (Aretoulaki, 1996). These indicated that only a small amount of retraining, involving a limited number of representative texts, is required for the ANN to deal with new text types. Thus, what is predicted to differ between text types is the relative influence of each of the identified features in the final weighting of the corresponding sentence.

4.1 Generating Draft Summaries

The 'draft' summaries that result from concatenating the sentences of the input text that were selected by the ANN as important are, on the whole, adequate for current-awareness purposes (see (Aretoulaki, to appear) for a detailed evaluation of this and other draft output). The ANN receives a single, coherent and largely cohesive, text each time, rather than a collection of unrelated texts. Sentence selection was based on the 24 pragmatic features used for their encoding and the statistical correlations among them, as indicated in the training corpus. Most importantly, by filtering out the sentences for which the ANN did not have a clear decision, i.e. by adapting the corresponding threshold on-line, content selection can be made more fine-grained and the output summaries briefer. An example draft summary for a newspaper article after the application of this type of filtering is shown below; a sketch of such filtering follows the example. In this case, there was 82.6% agreement between the ANN decision and the corresponding human judgement regarding the importance of individual sentences in this article.

(1) Moscow editors feel the old-fashioned grip of the state (Headline)
(3) Intense party pressure for the dismissal of a prominent liberal editor and a new campaign to discredit t
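The sketch below illustrates this kind of on-line threshold filtering. The decision threshold of 0.5 and the width of the 'unclear' band are assumptions made for the illustration; the text does not specify how the threshold is adapted.

    from typing import List

    def draft_summary(sentences: List[str], scores: List[float],
                      threshold: float = 0.5, margin: float = 0.1) -> str:
        # Concatenate the sentences the ANN judged important, skipping
        # those on which it had no clear decision, i.e. whose score falls
        # within `margin` of the decision threshold. Both parameter
        # values here are illustrative assumptions.
        kept = [s for s, sc in zip(sentences, scores) if sc >= threshold + margin]
        return " ".join(kept)

With margin set to zero this reduces to plain concatenation of every sentence scored as important; raising it on-line drops the borderline cases and yields the briefer, more fine-grained output described above.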

References

[1] Roger C. Schank, et al. Scripts, plans, goals and understanding: an inquiry into human knowledge structures, 1978.

[2] Malcolm Coulthard, et al. Advances in Written Text Analysis, 1994.

[3] Lisa F. Rau, et al. SCISOR: extracting information from on-line news, CACM, 1990.

[4] G. Salton, et al. Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts, Science, 1994.

[5] William C. Mann, et al. Rhetorical Structure Theory: A Theory of Text Organization, 1987.

[6] Wendy G. Lehnert, et al. Corpus-Driven Knowledge Acquisition for Discourse Analysis, AAAI, 1994.

[7] Myrna Gopnik. Linguistic structures in scientific texts, 1972.

[8] Candace L. Sidner, et al. Focusing in the comprehension of definite anaphora, 1986.

[9] Jean-Pierre Desclés, et al. Knowledge-Based Automatic Abstracting: Experiments in the Sublanguage of Elementary Geometry, 1994.

[10] Michael P. Jordan. Openings in Very Formal Technical Texts, 1993.

[11] Eduard Hovy, et al. Generating Natural Language Under Pragmatic Constraints, 1988.

[12] Michael Halliday, et al. Cohesion in English, 1976.

[13] Wendy G. Lehnert, et al. Plot Units and Narrative Summarization, Cognitive Science, 1981.

[14] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1: Foundations; vol. 2: Psychological and Biological Models, 1994.

[15] Barbara J. Grosz, et al. The Representation and Use of Focus in a System for Understanding Dialogs, IJCAI, 1977.

[16] Jun'ichi Tsujii, et al. Breaking Down Rhetorical Relations for the Purpose of Analysing Discourse Structures, COLING, 1994.

[17] Brigitte Endres-Niggemeyer, et al. Professional Summarizing: No Cognitive Simulation Without Observation, Journal of the American Society for Information Science, 1998.

[18] Martha W. Evens. Getting Computers to Talk like You and Me: Discourse Context, Focus, and Semantics (An ATN Model), 1987.

[19] Jerry R. Hobbs. Resolving pronoun references, 1986.

[20] Seiji Miike, et al. Abstract Generation Based on Rhetorical Structure Extraction, COLING, 1994.

[21] Yorick Wilks, et al. Multiple Agents and the Heuristic Ascription of Belief, IJCAI, 1987.

[22] Chris D. Paice, et al. Constructing literature abstracts by computer: Techniques and prospects, Information Processing & Management, 1990.

[23] M. P. Jordan. Rhetoric of Everyday English Texts, 1984.
[24] John R. Searle. Speech Acts: An Essay in the Philosophy of Language, 1969.

[25] Francine Chen, et al. A trainable document summarizer, SIGIR '95, 1995.

[26] Walter Kintsch, et al. Cognitive Psychology and Discourse: Recalling and Summarizing Stories, 1978.

[27] Hans Peter Luhn. The Automatic Creation of Literature Abstracts, IBM Journal of Research and Development, 1958.

[28] Chris D. Paice, et al. The automatic generation of literature abstracts: an approach based on the identification of self-indicating phrases, SIGIR '80, 1980.

[29] T. E. R. Singer, et al. Abstracting scientific and technical literature: An introductory guide and text for scientists, abstractors, and management, 1971.

[30] Bonnie Webber, et al. So what can we talk about now, 1986.

[31] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.