Certainty Categorization Model

We present a theoretical framework and preliminary results for manual categorization of explicit certainty information in 32 English newspaper articles. The explicit certainty markers were identified and categorized according to the four hypothesized dimensions – perspective, focus, timeline, and level of certainty. One hundred twenty one sentences from sample news stories contained a significantly lower frequency of markers per sentence (M=0.46, SD =0.04) than 564 sentences from sample editorials (M=0.6, SD =0.23), p= 0.0056, two-tailed heteroscedastic t-test. Within each dimension, editorials had most numerous markers per sentence in high level of certainty, writer’s point of view, and future and present timeline (0.33, 0.43, 0.24, and 0.22, respectively); news stories – in high and moderate levels, directly involved third party’s point of view, and past timeline (0.19, 0.20, 0.24, and 0.20, respectively). These patterns have practical implications for automation. Further analysis of editorials showed that out of 72 combinations possible under the hypothesized model, the high level of certainty from writer’s perspective expressed abstractly in the present and future time, and expressed factually in the future were very common. Twenty two combinations never occurred; and 35 had ≤ 8 occurrences. This narrows the focus for future linguistic analysis of explicit certainty markers.