A multi-source model for story segmentation of radio broadcast news

We present a method for story segmentation of radio broadcast news based on lexical, syntactic and acoustic cues. Starting from an existing statistical topic segmentation model that exploits the notion of lexical cohesion, we extend the formalism to include syntactic and acoustic knowledge sources. Experimental results show that lexical cohesion alone is not effective for the type of documents under study, because segment sizes vary widely and there is no direct relation between topics and stories. Adding syntactic and acoustic cues substantially improves the quality of the segmentation.

Keywords: topic segmentation, spoken corpora, lexical cohesion, acoustic cues, syntactic cues.