"Pause Units" and Analysis of Spontaneous Japanese Dialogues: Preliminary Studies

We consider the use of natural pauses to aid analysis of spontaneous speech, studying four Japanese dialogues concerning a simulated direction-finding task. Using new techniques, we added to existing transcripts information concerning the placement and length of significant pauses within turns (breathing intervals of any length or silences longer than approximately 400 milliseconds). We then addressed four questions: (1) Are “pause units” (segments bounded by natural pauses) reliably shorter than utterances? The answer was Yes: pause units in our corpus were on average 5.89 Japanese morphemes long, 60% the length of whole utterances, with much less variation. (2) Would hesitation expressions yield shorter units if used as alternate or additional boundaries? The answer was Not much, apparently because pauses and hesitation expressions often coincide. We found no combination of expressions that gave segments as much as one morpheme shorter than pause units on average. (3) How well-formed are pause units from a syntactic viewpoint? We manually judged that 90% of the pause units in our corpus could be parsed with standard Japanese grammars once hesitation expressions had been filtered from them. (4) Does translation by pause unit deserve further study? The answer was Yes, in that a majority of the pause units in the four dialogues gave understandable translations into English when translated by hand. We are thus encouraged to study further a “divide and conquer” analysis strategy, in which parsing and perhaps translation of pause units is carried out before, or even without, attempts to create coherent analyses of entire utterances.
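
The segmentation criterion stated above (a boundary at any breathing interval, or at a silence longer than approximately 400 milliseconds) can be illustrated with a minimal sketch. It is not the annotation procedure used in the study: the `Pause` record, the function names, the romanized example turn, and the small hesitation list are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Pause:
    """Hypothetical pause annotation inside a transcribed turn."""
    kind: str          # "breath" or "silence"
    duration_ms: int


Token = Union[str, Pause]

# Assumed, illustrative hesitation expressions; not the study's actual list.
HESITATIONS = {"eeto", "ano"}


def is_significant(pause: Pause, min_silence_ms: int = 400) -> bool:
    """Breaths of any length count; silences must exceed the threshold."""
    return pause.kind == "breath" or pause.duration_ms > min_silence_ms


def pause_units(turn: List[Token], min_silence_ms: int = 400) -> List[List[str]]:
    """Split one turn into pause units (lists of morphemes)."""
    units: List[List[str]] = []
    current: List[str] = []
    for tok in turn:
        if isinstance(tok, Pause):
            # Close the current unit only at a significant pause.
            if is_significant(tok, min_silence_ms) and current:
                units.append(current)
                current = []
        else:
            current.append(tok)
    if current:
        units.append(current)
    return units


def strip_hesitations(unit: List[str]) -> List[str]:
    """Filter hesitation expressions from a unit before parsing."""
    return [m for m in unit if m not in HESITATIONS]


def mean_length(units: List[List[str]]) -> float:
    """Average unit length in morphemes."""
    return sum(len(u) for u in units) / len(units) if units else 0.0


# Hypothetical turn: morphemes interleaved with annotated pauses.
turn: List[Token] = [
    "eeto", "eki", "no", Pause("silence", 550),
    "mae", "ni", Pause("silence", 120),
    "arimasu", Pause("breath", 90),
    "hai",
]
units = pause_units(turn)
```

In this toy turn the 120 ms silence is ignored, while the 550 ms silence and the short breath both close a unit. The same `mean_length` helper could be applied to whole utterances to compare average lengths, in the spirit of question (1).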