Language Identification of Encrypted VoIP Traffic: Alejandra y Roberto or Alice and Bob?

Voice over IP (VoIP) has become a popular protocol for making phone calls over the Internet. Because sensitive conversations may transit untrusted network infrastructure, it is well understood that the contents of a VoIP session should be encrypted. However, we demonstrate that current cryptographic techniques do not provide adequate protection when the underlying audio is encoded using bandwidth-saving Variable Bit Rate (VBR) coders. Specifically, we use the lengths of encrypted VoIP packets to tackle the challenging task of identifying the language of the conversation. Our empirical analysis of speech from 2,066 native speakers of 21 different languages shows that a substantial amount of information can be discerned from encrypted VoIP traffic. For instance, our 21-way classifier achieves 66% accuracy, almost a 14-fold improvement over random guessing. For 14 of the 21 languages, the accuracy is greater than 90%. We achieve an overall binary classification accuracy (e.g., "Is this a Spanish or English conversation?") of 86.6%. Our analysis highlights what we believe to be interesting new privacy issues in VoIP.
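The core observation is that a VBR coder emits frames of different sizes depending on the audio, and a length-preserving cipher (such as the counter-mode AES used by default in SRTP) leaves those sizes visible on the wire. The following is a minimal sketch of how such lengths could feed a classifier, assuming a simple nearest-profile rule over packet-length bigram frequencies compared with a chi-square distance. It is not presented as the authors' exact method. The functions `bigram_counts`, `train`, `classify`, and `improvement_over_chance`, as well as the labels `lang_A`/`lang_B` and all packet sizes, are hypothetical illustrations, not real measurements.

```python
from collections import Counter


def _normalize(counts):
    """Turn raw bigram counts into relative frequencies (empty if no pairs)."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def bigram_counts(lengths):
    """Count consecutive (len_i, len_{i+1}) pairs of encrypted packet lengths."""
    return Counter(zip(lengths, lengths[1:]))


def chi_square_distance(p, q):
    """Symmetric chi-square distance between two frequency profiles.

    Every key in the union has p + q > 0, so the denominator is never zero.
    """
    keys = set(p) | set(q)
    return sum(
        (p.get(k, 0.0) - q.get(k, 0.0)) ** 2 / (p.get(k, 0.0) + q.get(k, 0.0))
        for k in keys
    )


def train(labeled_calls):
    """Build one bigram profile per label.

    labeled_calls maps a (hypothetical) language label to a list of
    packet-length sequences observed for that language.
    """
    models = {}
    for lang, calls in labeled_calls.items():
        merged = Counter()
        for seq in calls:
            merged.update(bigram_counts(seq))  # Counter.update adds counts
        models[lang] = _normalize(merged)
    return models


def classify(models, lengths):
    """Return the label whose profile is closest to the observed call."""
    profile = _normalize(bigram_counts(lengths))
    return min(models, key=lambda lang: chi_square_distance(profile, models[lang]))


def improvement_over_chance(accuracy, n_classes):
    """Ratio of accuracy to uniform random guessing, whose accuracy is 1 / n_classes."""
    return accuracy * n_classes


if __name__ == "__main__":
    # Toy, made-up packet lengths (bytes); not real codec output.
    models = train({
        "lang_A": [[20, 60, 20, 60, 20, 60]],
        "lang_B": [[40, 40, 40, 80, 40, 40]],
    })
    print(classify(models, [20, 60, 20, 60]))
    print(round(improvement_over_chance(0.66, 21), 2))
```

The last line checks the abstract's arithmetic: random guessing among 21 languages is right about 1/21 ≈ 4.8% of the time, so 66% accuracy is 0.66 × 21 ≈ 13.86 times better, i.e. "almost 14-fold". Bigrams are used here only because they capture some of the ordering of frame sizes while staying simple; longer n-grams or a probabilistic model would be natural alternatives.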
