Automatic Proposition Extraction from Dependency Trees: Helping Early Prediction of Alzheimer's Disease from Narratives

Idea Density (ID) was originally proposed as a measure of the memory load of narratives: it represents the underlying content of a text as a series of semantic units, called propositions or ideas. From a clinical perspective, ID has been shown to correlate with several cognitive aspects, such as memory, readability, aging, and dementia onset and progression. Traditionally, propositions are extracted manually from texts. An existing tool automates ID extraction [1], but it takes only shallow information as input and does not output the propositions themselves. We propose a novel approach to obtaining the ID of a text automatically. Our method automates Chand et al.'s ID manual [2] and consists of a rule-based system that operates on dependency trees. First, for each sentence in a text, a dependency parser is used to obtain the dependency relations between words. Then, a set of rules is applied recursively to these relations to yield the corresponding propositions. We analyze preliminary results of our system on a well-formed journalistic text and on speech transcriptions of dementia patients.
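
To make the two-step pipeline concrete, the following is a minimal sketch of recursive, rule-based proposition extraction over a dependency tree. It is an illustration under stated assumptions, not the system described here: the `Token` class, the two rule sets (`ARGUMENT_DEPS`, `MODIFIER_DEPS`), and the function names are hypothetical, the dependency tree is built by hand rather than produced by a parser, and the rules are far simpler than those in Chand et al.'s manual [2]. The sketch also assumes the common convention of reporting ID as the number of propositions divided by the number of words.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Proposition = Tuple[str, ...]


@dataclass
class Token:
    """A node of a dependency tree (hypothetical structure for this sketch)."""
    text: str
    dep: str  # label of the relation linking this token to its head
    children: List["Token"] = field(default_factory=list)


# Hypothetical toy rules; the labels follow Universal Dependencies naming.
ARGUMENT_DEPS = {"nsubj", "obj"}      # dependents that fill a predicate's slots
MODIFIER_DEPS = {"amod", "advmod"}    # dependents that form their own proposition


def extract_propositions(node: Token) -> List[Proposition]:
    """Apply the toy rules to `node`, then recurse into its dependents."""
    props: List[Proposition] = []

    # Rule 1: a head with argument dependents yields (predicate, arg1, ...).
    args = [c.text for c in node.children if c.dep in ARGUMENT_DEPS]
    if args:
        props.append((node.text, *args))

    for child in node.children:
        # Rule 2: each modifier yields (modifier, modified word).
        if child.dep in MODIFIER_DEPS:
            props.append((child.text, node.text))
        # Recursive step: process the subtree rooted at this dependent.
        props.extend(extract_propositions(child))
    return props


def idea_density(props: List[Proposition], n_words: int) -> float:
    """Propositions per word (assumed definition of ID for this sketch)."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return len(props) / n_words


# Hand-built tree for "The old man quickly ate soup" (6 words).
sentence = Token("ate", "root", [
    Token("man", "nsubj", [Token("The", "det"), Token("old", "amod")]),
    Token("quickly", "advmod"),
    Token("soup", "obj"),
])

propositions = extract_propositions(sentence)
for p in propositions:
    print(p)
print("ID =", idea_density(propositions, 6))
```

On this example the sketch yields three propositions, (ate, man, soup), (old, man), and (quickly, ate), for six words. In practice the tree would come from a dependency parser, and the rule set would need to cover many more relations than the two toy rules shown here.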