Spot Me if You Can: Uncovering Spoken Phrases in Encrypted VoIP Conversations

Despite the rapid adoption of Voice over IP (VoIP), its security implications are not yet fully understood. Since VoIP calls may traverse untrusted networks, packets should be encrypted to ensure confidentiality. However, we show that when the audio is encoded using variable bit rate codecs, the lengths of encrypted VoIP packets can be used to identify the phrases spoken within a call. Our results indicate that a passive observer can identify phrases from a standard speech corpus within encrypted calls with an average accuracy of 50%, and with accuracy greater than 90% for some phrases. Clearly, such an attack calls into question the efficacy of current VoIP encryption standards. In addition, we examine the impact of various features of the underlying audio on our performance and discuss methods for mitigation.

[1]  Andrew J. Viterbi,et al.  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.

[2]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[3]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[4]  Gustavus J. Simmons,et al.  Forward Search as a Cryptanalytic Tool Against a Public Key , 1982, 1982 IEEE Symposium on Security and Privacy.

[5]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[6]  Manfred R. Schroeder,et al.  Code-excited linear prediction(CELP): High-quality speech at very low bit rates , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[8]  W. Russell,et al.  Continuous hidden Markov modeling for speaker-independent word spotting , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[9]  Richard Rose,et al.  A hidden Markov model based keyword recognition system , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[10]  Chin-Hui Lee,et al.  Automatic recognition of keywords in unconstrained speech using hidden Markov models , 1990, IEEE Trans. Acoust. Speech Signal Process..

[11]  Jonathan G. Fiscus,et al.  Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .

[12]  Herbert Gish,et al.  Phonetic training and language modeling for word spotting , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  Allen Gersho,et al.  Speech and Audio Coding for Wireless and Network Applications , 1993 .

[14]  William R. Gardner,et al.  QCELP: A Variable Rate Speech Coder for CDMA Digital Cellular , 1993 .

[15]  D. Haussler,et al.  Hidden Markov models in computational biology. Applications to protein modeling. , 1993, Journal of molecular biology.

[16]  Jean-Claude Junqua,et al.  A robust algorithm for word boundary detection in the presence of noise , 1994, IEEE Trans. Speech Audio Process..

[17]  Sean R. Eddy,et al.  Multiple Alignment Using Hidden Markov Models , 1995, ISMB.

[18]  Hermann Ney,et al.  HMM-Based Word Alignment in Statistical Translation , 1996, COLING.

[19]  Henning Schulzrinne,et al.  RTP: A Transport Protocol for Real-Time Applications , 1996, RFC.

[20]  Frederick Jelinek,et al.  Statistical methods for speech recognition , 1997 .

[21]  Lei Zhang,et al.  A CELP variable rate speech codec with low average rate , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[22]  Hynek Hermansky,et al.  Sub-band based recognition of noisy speech , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[24]  Alexandros Potamianos,et al.  Multi-band speech recognition in noisy environments , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[25]  M. Handley,et al.  SIP: Session Initiation Protocol , 1999, RFC.

[26]  김삼묘,et al.  “Bioinformatics” 특집을 내면서 , 2000 .

[27]  Carmen Peláez-Moreno,et al.  Recognizing voice over IP: a robust front-end for speech recognition on the world wide web , 2001, IEEE Trans. Multim..

[28]  Dawn Xiaodong Song,et al.  Timing Analysis of Keystrokes and Timing Attacks on SSH , 2001, USENIX Security Symposium.

[29]  Lili Qiu,et al.  Statistical identification of encrypted Web browsing traffic , 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.

[30]  Wai C. Chu,et al.  Speech Coding Algorithms: Foundation and Evolution of Standardized Coders , 2003 .

[31]  Wai C. Chu,et al.  Speech Coding Algorithms , 2003 .

[32]  Baugher,et al.  The Secure Real-Time Transport Protocol , 2003 .

[33]  Nick Mathewson,et al.  Tor: The Second-Generation Onion Router , 2004, USENIX Security Symposium.

[34]  Mats Näslund,et al.  The Secure Real-time Transport Protocol (SRTP) , 2004, RFC.

[35]  Philip S. Yu,et al.  CSR: Speaker Recognition from Compressed VoIP Packet Stream , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[36]  Sushil Jajodia,et al.  Tracking anonymous peer-to-peer VoIP calls on the internet , 2005, CCS '05.

[37]  Tracking Anonymous Peer-to- Peer VoIP Calls on the Internet , 2006 .

[38]  Philip S. Yu,et al.  Finding "Who Is Talking to Whom" in VoIP Networks via Progressive Stream Clustering , 2006, Sixth International Conference on Data Mining (ICDM'06).

[39]  Tadayoshi Kohno,et al.  Devices That Tell on You: Privacy Trends in Consumer Ubiquitous Computing , 2007, USENIX Security Symposium.

[40]  Charles V. Wright,et al.  Language Identification of Encrypted VoIP Traffic: Alejandra y Roberto or Alice and Bob? , 2007, USENIX Security Symposium.

[41]  Mariusz Glabowski,et al.  Global System for Mobile Communications , 2010 .