Privacy Implications of Voice and Speech Analysis - Information Disclosure by Inference

Internet-connected devices, such as smartphones, smartwatches, and laptops, have become ubiquitous in modern life, reaching ever deeper into our private spheres. Microphones are among the sensors most commonly found in such devices. While various privacy concerns related to microphone-equipped devices have been raised and thoroughly discussed, the threat of unexpected inferences from audio data remains largely overlooked. Drawing on literature from diverse disciplines, this paper presents an overview of sensitive information that can, with the help of advanced data analysis methods, be derived from human speech and other acoustic elements in recorded audio. In addition to the linguistic content of speech, a speaker’s voice characteristics and manner of expression may implicitly contain a rich array of personal information, including cues to the speaker’s biometric identity, personality, physical traits, geographical origin, emotions, level of intoxication and sleepiness, age, gender, and health condition. Even a person’s socioeconomic status can be reflected in certain speech patterns. The findings compiled in this paper demonstrate that recent advances in voice and speech processing give rise to a new generation of privacy threats.
