A framework for real-time dictionary updating

The development of communication technologies has contributed to the appearance of new forms of written language, which scientists have to study in light of their peculiarities (typing or viewing constraints, synchronicity, etc.). In the particular case of SMS (Short Message Service), studies are complicated by a lack of data, mainly due to technical constraints and privacy considerations. In this paper, we present a corpus of 30,000 French SMS messages, collected through a project in Belgium named "Faites don de vos SMS à la science" (Give your SMS to Science). This corpus is unique in its quality, its size and the fact that the messages have been manually translated into "standard" French. We will first describe the collection process and discuss the writers' profiles. Then we will explain in detail how the translation was carried out.