Characterizing the semantics of passwords: The role of Pinyin for Chinese Netizens

Password-based authentication is currently the dominant technology that online service providers use to confirm the (claimed) identities of legitimate users. Semantic patterns reflect how people choose their passwords, and understanding these patterns is useful for developing policies, guidelines, and good practices that secure password-based mechanisms. Semantic patterns are hard to recognize in general, and they may vary across speakers of different languages and across cultural and ethnic groups. However, it is possible to investigate them in a specific context. In this paper, we characterize the Pinyin semantics of passwords chosen by Chinese Netizens (up to 591 million), taking advantage of the well-defined structure of the Pinyin phonetic system. We perform a comprehensive analysis of (publicly available) compromised password datasets from several leading Chinese sites for social networking, (micro)blogging, Internet forums, gaming, dating, and various other online services. In total, the datasets contain over 141 million passwords, more than 30 million of which were leaked by the largest site alone. Our findings show that over 4% of the passwords in our datasets represent Pinyin (including names), nearly 5% more represent concatenations of Pinyin and a date (i.e., Pinyin with a date prefix or suffix), and a further 17% combine Pinyin with a numeric (non-date) prefix or suffix. A majority (over 93%) of pure Pinyin passwords are transcribed from only 24 Chinese characters. We also study the pure numeric pattern and the pattern containing special symbols. Over 76% of the passwords are covered by two patterns: pure numeric, and concatenations of Pinyin and digits. Special symbols appear in only 2.66% of the passwords, and when they do appear, they are most often found in the middle of the password (82.85%).
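The pattern categories described in the abstract (pure Pinyin, Pinyin with a date or non-date numeric prefix/suffix, pure numeric, and passwords containing special symbols) can be illustrated with a small classifier. The following Python sketch is not the authors' method: the abbreviated syllable list, the accepted date formats (YYYYMMDD, YYMMDD, MMDD), the rule that digits may appear on only one side of the Pinyin, and the head/middle/tail definition of symbol position are all assumptions made for illustration. Pinyin detection uses dynamic programming to test whether a run of lowercase letters can be split entirely into known syllables.

```python
import calendar
import re

# Hypothetical, heavily abbreviated syllable list for illustration only; a real
# analysis would load the full inventory of roughly 400 toneless Pinyin syllables.
PINYIN_SYLLABLES = {
    "wo", "ai", "ni", "zhang", "wang", "li", "wei", "ming", "xiao",
    "hua", "lin", "yu", "qing", "mei", "chen", "jia",
}
MAX_SYLLABLE_LEN = max(len(s) for s in PINYIN_SYLLABLES)

SYMBOL = re.compile(r"[^A-Za-z0-9]")  # any non-alphanumeric character
# Optional digit prefix, a letter run, optional digit suffix.
PINYIN_AFFIX = re.compile(r"^([0-9]*)([a-z]+)([0-9]*)$")


def is_pinyin(s: str) -> bool:
    """True if s splits entirely into known syllables (dynamic programming)."""
    n = len(s)
    ok = [True] + [False] * n  # ok[i] is True when s[:i] is segmentable
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_SYLLABLE_LEN), i):
            if ok[j] and s[j:i] in PINYIN_SYLLABLES:
                ok[i] = True
                break
    return n > 0 and ok[n]


def _valid_month_day(month: int, day: int, year: int = 2000) -> bool:
    # Default year 2000 is a leap year, so 0229 is accepted when no year is given.
    return 1 <= month <= 12 and 1 <= day <= calendar.monthrange(year, month)[1]


def is_date(digits: str) -> bool:
    """Assumed date formats: YYYYMMDD (years 1900-2099), YYMMDD, and MMDD."""
    if len(digits) == 8:
        y, m, d = int(digits[:4]), int(digits[4:6]), int(digits[6:])
        return 1900 <= y <= 2099 and _valid_month_day(m, d, y)
    if len(digits) == 6:
        return _valid_month_day(int(digits[2:4]), int(digits[4:]))
    if len(digits) == 4:
        return _valid_month_day(int(digits[:2]), int(digits[2:]))
    return False


def symbol_positions(pw: str) -> set:
    """Positions ('head', 'middle', 'tail') at which special symbols occur."""
    last = len(pw) - 1
    found = set()
    for i, ch in enumerate(pw):
        if SYMBOL.match(ch):
            found.add("head" if i == 0 else "tail" if i == last else "middle")
    return found


def classify(pw: str) -> str:
    """Assign pw to one of the (assumed) pattern categories."""
    if SYMBOL.search(pw):
        return "special symbols"
    if pw.isdigit():
        return "pure numeric"
    m = PINYIN_AFFIX.match(pw.lower())
    if m is None or not is_pinyin(m.group(2)):
        return "other"
    prefix, _, suffix = m.groups()
    if prefix and suffix:
        return "other"  # assumption: digits on at most one side of the Pinyin
    digits = prefix or suffix
    if not digits:
        return "pure Pinyin"
    return "Pinyin + date" if is_date(digits) else "Pinyin + digits"


if __name__ == "__main__":
    for pw in ["woaini", "zhangwei19900101", "woaini520", "123456", "wo@ini"]:
        print(pw, "->", classify(pw), symbol_positions(pw) or "")
```

Here "woaini" segments as wo + ai + ni and is classified as pure Pinyin, while "zhangwei19900101" is Pinyin with a YYYYMMDD date suffix. Segmentation-based matching has a known weakness: with a full syllable inventory, some English words also split into valid syllables (for example, "china" as chi + na), so a careful analysis would need an additional filter against such false positives.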
To the best of our knowledge, this is the first large-scale study of its kind, and it may yield further insights into the semantic role that Pinyin plays, either as good-practice guidance for strengthening password security or for improving password-guessing attacks.

Highlights
- Semantic patterns are hard to recognize in general.
- The work characterizes the Pinyin semantics of passwords from Chinese Netizens (up to 591 million).
- The datasets contain over 141 million passwords in total.
- This is the first large-scale study of its kind, and it may yield further insights into the semantic role Pinyin plays.