Quantitative experiments on prosodic and discourse units in the Corpus of Interactional Data

Recent years have seen a growing number of initiatives related to the interface between syntax, prosody and discourse. While in English the computational counterpart of this perspective has been substantially advanced from both formal modeling (Ginzburg, 2012) and machine learning perspectives (?), in French the situation is much less clear. Some automatic tools for analyzing prosody (?; ?; ?) have been developed, but so far they have been tested mostly on monologue data. Determining the relevant units in the different linguistic domains is a crucial issue for this kind of work. In this poster, we will present a series of quantitative evaluations of the output of various automatic tools dealing with prosody, syntax and discourse. The data we use is the Corpus of Interactional Data, a corpus of 8 one-hour conversations, each involving two speakers. The protocol for collecting this data was designed so that the interactions are highly natural, featuring many overlaps and disfluencies.