Software Internationalization and Localization: An Industrial Experience

Software internationalization and localization are important steps in distributing and deploying software to different regions of the world. Internationalization refers to the process of reengineering a system so that it can support various languages and regions without further modification. Localization refers to the process of adapting internationalized software for a specific language or region. For various reasons, many large legacy systems did not consider internationalization and localization at the early stages of development. In this paper, we present our experience with software internationalization and localization, and propose a process with tool support for both. We reengineer a large legacy commercial financial system called PAM, from State Street Corporation, which is written in C/C++ and contains 30 different modules and more than 5 million lines of source code. We propose a source code ranker that recovers the important source code to be analyzed. Based on this code, we extract general patterns of source code that needs to be reengineered for internationalization. We divide the patterns into two categories: convertible patterns and suspicious patterns. To locate the source code that needs to be modified, we develop an automated tool, I18nLocator, that consumes these patterns and outputs the locations that match them. Source code matching the convertible patterns is converted automatically, while source code matching the suspicious patterns is converted by developers, who consider the context of the corresponding code. For localization, we extract hard-coded strings, translate them, and store them in resource data files. Of the 504 thousand lines of source code modified using our proposed approach, 79.76% are modified automatically, saving a substantial amount of developer time. The quality of the resulting system is also good: the number of bugs found during user acceptance testing and deployment to the production environment is 0.000218 per modified line of code (bugs/LOC).
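
The pattern-based location step described above can be illustrated with a minimal sketch. It is not the I18nLocator implementation: the abstract does not list the actual patterns, so the regular expressions, the pattern names, and the `locate` function below are hypothetical and chosen only to show how matches might be split into a convertible group (rewritten automatically) and a suspicious group (left for developer review).

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; the paper's real pattern set
# is not given in the abstract.
CONVERTIBLE = {
    # Byte-oriented C string calls that might be swapped mechanically.
    "strlen": re.compile(r"\bstrlen\s*\("),
    "strcpy": re.compile(r"\bstrcpy\s*\("),
}
SUSPICIOUS = {
    # Fixed-size char buffers and string literals need human judgment.
    "char_array": re.compile(r"\bchar\s+\w+\s*\[\s*\d+\s*\]"),
    "string_literal": re.compile(r'"(?:[^"\\]|\\.)+"'),
}


@dataclass(frozen=True)
class Match:
    line_no: int   # 1-based line number in the scanned source
    category: str  # "convertible" or "suspicious"
    pattern: str   # name of the pattern that matched


def locate(source: str) -> list[Match]:
    """Return every line/pattern pair that matches, in line order."""
    groups = (("convertible", CONVERTIBLE), ("suspicious", SUSPICIOUS))
    matches = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for category, patterns in groups:
            for name, regex in patterns.items():
                if regex.search(line):
                    matches.append(Match(line_no, category, name))
    return matches


if __name__ == "__main__":
    sample = 'char buf[32];\nsize_t n = strlen(name);\nputs("Total balance");'
    for m in locate(sample):
        print(m.line_no, m.category, m.pattern)
```

A line-by-line regular-expression scan is used here only for brevity; a tool working on millions of lines of C/C++ would more plausibly match against parsed code, and that choice is likewise an assumption rather than something the abstract states.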
