The prediction of probable words for more immediate selection has proven a valuable technique for augmenting the communication of persons with disabilities. Statistical prediction techniques have traditionally been limited to completion of the current word and prediction of the subsequent word. This study quantifies the impact of adopting higher-order prediction techniques that rely upon increased word context. Additionally, it establishes the dependence of performance upon the size of the text used to derive the statistical database. The results suggest that adoption of higher-order techniques and larger databases can increase keystroke savings by more than 7.5 percentage points.

BACKGROUND

For more than 20 years, word prediction has been an important technique for augmentative communication. Traditional systems have used word frequency lists to complete words that the user has already started spelling. In recent years, however, more sophisticated predictive techniques based on the previous word or on syntactic rules have appeared. More advanced prediction methods can provide greater keystroke savings (the percentage of keystrokes eliminated by the prediction method), which may translate to faster communication rates. Although several researchers have noted that the increased cognitive load associated with word prediction may interfere with rapid communication, recent findings suggest that more accurate predictions may more than compensate for this load (1).

The advantages of increased predictive accuracy are not limited to the keystroke savings provided by word prediction lists. By providing orthographic and grammatical cues, effective word prediction can improve the quality (as well as the quantity) of message production for young people, persons with language impairments, and those with learning disabilities (2). Additionally, word prediction techniques can be used to disambiguate sequences from ambiguous keypads, correct spelling errors, and provide more accurate character predictions for scanning interfaces (3).

By exploiting the current sentence context using statistical techniques, a prediction system can provide more appropriate word choices to the user. In ngram word prediction methods, the previous n-1 words are used to predict the current (nth) word. The ngram data are collected by counting the occurrences of each unique n-word sequence in a large corpus called the training text. In augmentative communication applications, ngram techniques have been limited to unigram (n=1) and bigram (n=2) word prediction, although trigram (n=3) and higher ngram orders are commonly used in other language-related fields such as speech recognition and machine translation (4).

For ngram orders higher than unigrams (n>1), there are so many linguistically valid n-word sequences that even in extensive training texts some sequences will not appear, or will occur too infrequently to provide statistically meaningful data. A prediction system must therefore temper its higher-order ngram predictions with lower-order, more reliable ngram predictions. This is generally done through linear interpolation, in which the predictions from each ngram order are weighted by a different factor (4). Even with this compensatory method, however, the effectiveness of an ngram prediction model remains highly dependent upon the size of the training text.
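To make the interpolation scheme concrete, the following is a minimal Python sketch of linearly interpolated trigram prediction. It is an illustration rather than the system evaluated in this study: the fixed weights, function names, and toy corpus are invented for the example, and a practical system would tune the interpolation weights and train on far larger texts.

```python
from collections import Counter

def train_ngrams(tokens):
    """Count unigram, bigram, and trigram occurrences in a training text."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri

def interpolated_prob(word, context, uni, bi, tri, lambdas=(0.1, 0.3, 0.6)):
    """Linearly interpolate unigram, bigram, and trigram estimates.

    The weights here are illustrative placeholders; in practice they are
    optimized so that reliable lower-order estimates compensate for sparse
    higher-order counts.
    """
    l1, l2, l3 = lambdas
    w1, w2 = context  # the two most recent words
    total = sum(uni.values())
    p1 = uni[word] / total if total else 0.0
    p2 = bi[(w2, word)] / uni[w2] if uni[w2] else 0.0
    p3 = tri[(w1, w2, word)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return l1 * p1 + l2 * p2 + l3 * p3

def predict(context, uni, bi, tri, k=10):
    """Return the k most probable next words given the last two words."""
    scored = ((interpolated_prob(w, context, uni, bi, tri), w) for w in uni)
    return [w for _, w in sorted(scored, reverse=True)[:k]]

# Toy usage with a tiny "training text"; a real system would use corpora
# on the order of millions of words, as in this study.
tokens = "the cat sat on the mat and the cat ran".split()
uni, bi, tri = train_ngrams(tokens)
print(predict(("on", "the"), uni, bi, tri, k=3))
```

Ranking every vocabulary word by its interpolated probability and presenting the top k candidates corresponds to the fixed-length prediction lists used in the experiments described below.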
RESEARCH QUESTION

The objective of this study was to establish a set of performance measures for word prediction using ngram orders (bigrams and trigrams) higher than those typically used in augmentative communication applications. Since the accuracy of ngram prediction methods is highly dependent upon their statistical reliability, the effect of training text size on performance was also investigated. Keystroke savings served as the single measure of predictive performance.

METHOD

Training texts ranging in size from 100,000 words to 3 million words were constructed by evenly combining text blocks from the Brown corpus, the LOB corpus, and a collection of Time Magazine articles. All headings and formatting directives were removed from the training texts. Comprehensive ngram statistics were automatically generated and stored for each training text. Twenty-one experimental conditions were established by combining three ngram orders (unigram, bigram, and trigram) with each of the 7 training texts.

For performance measurement, 7 representative testing texts of at least 2,500 words each were employed. These texts, taken from a previous study of word prediction (3), varied widely in genre and linguistic sophistication. The content of the testing texts was independent of that of the training texts. For each experimental condition, the 7 testing texts were independently generated (via simulated typing) using a 54-key QWERTY keyboard supplemented by a 10-word prediction list accessed via the F1 through F10 keys. Keystroke savings were computed for each testing text from the number of keystrokes needed to produce that text with and without prediction enabled, then averaged across testing texts to provide a single performance measure for each condition. Because the texts were deliberately selected (rather than randomly sampled), inferential statistics were not applied. Automation of the entire text generation process made such extensive testing possible.

RESULTS

Figure 1 depicts the average keystroke savings for unigram, bigram, and trigram word prediction as a function of the number of words in the training text. For bigrams and trigrams, prediction components were interpolated to maximize accuracy (3). Performance increases with increasing training text size, irrespective of the ngram order. However, the increase is much more pronounced for trigrams (7.5 percentage points) than for unigrams (4.5 percentage points). For each ngram order, the shape of the performance curve is similar: a rapid increase followed by a gradual decline in the rate of improvement. Note, however, that even at a training text size of 3 million words, performance continues to improve at a non-trivial rate for higher-order ngram prediction.

For a given training text size, keystroke savings also increase steadily with higher ngram orders. A large jump in keystroke savings is realized when moving from unigram to bigram word prediction (6.4 percentage points at 3 million words), reflecting the shift from context-insensitive to context-sensitive prediction. The performance gain in moving from bigram to trigram prediction is considerably less dramatic (0.8 percentage points), although the difference grows with larger training texts.

[Figure 1: Average keystroke savings (%) for unigram, bigram, and trigram word prediction as a function of training text size.]
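The keystroke-savings measure itself is compact: savings = 100 * (baseline keystrokes - keystrokes with prediction) / baseline keystrokes. The sketch below simulates that computation under stated assumptions, not the study's actual test harness: the prediction list is assumed to update after every letter, selecting a listed word is assumed to cost exactly one keystroke (an F-key that also supplies the trailing space), and a simple frequency-plus-prefix predictor stands in for the interpolated ngram model. The names make_predictor and keystroke_savings are hypothetical.

```python
from collections import Counter

def make_predictor(training_text, k=10):
    """A hypothetical unigram prefix-completion predictor for demonstration.

    The study's system interpolates higher-order ngrams instead; this toy
    predictor ignores the sentence context entirely.
    """
    counts = Counter(training_text.split())
    def predict(prefix, context):
        ranked = [w for w, _ in counts.most_common() if w.startswith(prefix)]
        return ranked[:k]
    return predict

def keystrokes_for_word(word, context, predictor, list_size=10):
    """Simulate typing one word with prediction enabled.

    After each letter (including before the first), if the target word
    appears in the prediction list, one more keystroke selects it.
    Assumes the selection key also supplies the trailing space.
    """
    for typed in range(len(word) + 1):
        if word in predictor(word[:typed], context)[:list_size]:
            return typed + 1  # letters typed so far plus the selection key
    return len(word) + 1      # fall back to spelling the word plus a space

def keystroke_savings(text, predictor):
    """Percent of keystrokes eliminated relative to letter-by-letter entry."""
    words = text.split()
    baseline = sum(len(w) + 1 for w in words)  # each word plus a space
    with_pred, context = 0, ("", "")
    for w in words:
        with_pred += keystrokes_for_word(w, context, predictor)
        context = (context[1], w)             # slide the two-word context
    return 100.0 * (baseline - with_pred) / baseline

predictor = make_predictor("the cat sat on the mat the cat ran on the mat")
print(f"{keystroke_savings('the cat sat on the mat', predictor):.1f}% saved")
```

Running every testing text through such a loop, once per experimental condition, is the kind of automation that makes testing across 21 conditions practical; the exact selection costs and list-update policy of the study's simulator are not specified here, so those details above are assumptions.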