Text Profiling: Framework and Experience

The increasing use of "huge corpora" in natural language processing and text analysis requires that their lexical, morphosyntactic and syntactic homogeneity be controlled. This calls for the development of text profiling tools. We have developed such tools, together with an accompanying methodology, within the ELRA benchmark "Contribution to the construction of contemporary French corpora". We present the first results of this approach, applied to the radio and television speeches of De Gaulle and Mitterrand. We then draw conclusions from this experience, in particular regarding the relevance of the features we use for text profiling.