Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks

Footsteps are among the most ubiquitous sound effects in multimedia applications. Substantial research has gone into understanding their acoustic features and developing synthesis models for them. In this paper, we present a first attempt at adopting neural synthesis for this task. We implemented two GAN-based architectures and compared their output with real recordings as well as six traditional sound synthesis methods. Our architectures reached realism scores as high as those of recorded samples, an encouraging result for the task at hand.
