Phonotactic Reconstruction of Encrypted VoIP Conversations: Hookt on Fon-iks

In this work, we unveil new privacy threats against Voice-over-IP (VoIP) communications. Although prior work has shown that the interaction of variable bit-rate codecs and length-preserving stream ciphers leaks information, we show that the threat is more serious than previously thought. In particular, we derive approximate transcripts of encrypted VoIP conversations by segmenting an observed packet stream into subsequences representing individual phonemes and classifying those subsequences by the phonemes they encode. Drawing on insights from the computational linguistics and speech recognition communities, we apply novel techniques for unmasking parts of the conversation. We believe our ability to do so underscores the importance of designing secure (yet efficient) ways to protect the confidentiality of VoIP conversations.
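The leak described above can be illustrated with a toy sketch. It is not the authors' method: the abstract does not specify their segmentation or classification models, and the boundary rule below is a deliberately naive stand-in. The sketch assumes a hypothetical sequence of variable bit-rate frame sizes, a toy XOR stream cipher built from SHA-256 (illustration only), and a made-up helper `segment_by_length_change`. It shows only that a length-preserving cipher leaves frame sizes visible to an eavesdropper, and that the resulting length sequence can be cut into candidate subsequences.

```python
import hashlib
from typing import List


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 over key || counter (illustration only, not secure)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def encrypt(key: bytes, frame: bytes) -> bytes:
    """XOR the frame with the keystream; ciphertext length equals plaintext length."""
    ks = keystream(key, len(frame))
    return bytes(a ^ b for a, b in zip(frame, ks))


def segment_by_length_change(lengths: List[int], threshold: int) -> List[List[int]]:
    """Naive stand-in for segmentation: start a new segment whenever two
    consecutive packet lengths differ by more than `threshold` bytes."""
    if not lengths:
        return []
    segments = [[lengths[0]]]
    for prev, cur in zip(lengths, lengths[1:]):
        if abs(cur - prev) > threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments


# Hypothetical frame sizes (bytes) emitted by a variable bit-rate codec.
frame_sizes = [20, 22, 21, 38, 40, 39, 12, 11]
frames = [bytes(n) for n in frame_sizes]

# Reusing one keystream per frame is insecure; it is done here only for brevity.
key = b"demo-key"
observed = [len(encrypt(key, f)) for f in frames]

# The eavesdropper sees only ciphertext lengths, which match the frame sizes.
segments = segment_by_length_change(observed, threshold=5)
print(observed)
print(segments)
```

A real attack would replace the fixed threshold with a trained statistical model and would then classify each subsequence by the phoneme it most likely encodes; both steps are outside the scope of this sketch.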
