Language characteristics of tweets by Indonesian citizens

Indonesia is a large country with thousands of islands and hundreds of languages and dialects. This diversity shapes people's habits and behaviour, including their activity on social media. Twitter and other social media platforms impose no language rules on users, so people write freely, without any constraints, when posting their tweets. Broadly, five types of writing appear in the dataset: tweets written in standard Bahasa, tweets mixing Bahasa with a local language, tweets mixing Bahasa with a foreign language, tweets containing abbreviations, and tweets containing slang words. Moreover, this investigation found sixteen characteristics of Indonesian tweets, some of which are combinations of the five writing styles. Based on an understanding of these writing styles in Twitter messages, we propose an algorithm for the pre-processing step that converts non-standard words into their standard form in Bahasa Indonesia.
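To make the idea of converting non-standard words concrete, the following is a minimal sketch of a dictionary-based normalizer. It is not the algorithm proposed in this work; the lookup table, the function name `normalize_tweet`, and the tokenization rule are all assumptions chosen for illustration, and the entries cover only a few common abbreviations and slang words.

```python
import re

# Hypothetical lookup table: a handful of illustrative entries only,
# not the lexicon used by the authors.
NORMALIZATION_TABLE = {
    "yg": "yang",     # abbreviation
    "dgn": "dengan",  # abbreviation
    "tdk": "tidak",   # abbreviation
    "gak": "tidak",   # slang
    "gue": "saya",    # slang
}

# Split text into runs of word characters, whitespace, or other symbols,
# so that joining the tokens reproduces the original spacing and punctuation.
TOKEN_PATTERN = re.compile(r"\w+|\s+|[^\w\s]+")


def normalize_tweet(text, table=NORMALIZATION_TABLE):
    """Replace each token found in the table (case-insensitively) with its standard form."""
    tokens = TOKEN_PATTERN.findall(text)
    return "".join(table.get(token.lower(), token) for token in tokens)


if __name__ == "__main__":
    print(normalize_tweet("Aku pergi dgn dia, tdk lama."))
```

A plain lookup like this handles only abbreviations and slang words that appear in the table. Tweets that mix Bahasa with a local or foreign language would presumably need additional steps, such as language identification, which this sketch does not attempt.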