Pattern Matching Techniques for Replacing Missing Sections of Audio Streamed across Wireless Networks

Streaming media on the Internet can be unreliable. Services such as audio-on-demand drastically increase the load on networks; therefore, new, robust, and highly efficient coding algorithms are necessary. One method overlooked to date, which can work alongside existing audio compression schemes, takes into account the semantics and natural repetition of music. Similarity detection within polyphonic audio has proved challenging in the field of music information retrieval. One approach to dealing with bursty errors is to use self-similarity to replace missing segments. Many existing systems handle packet loss and replacement at the network level, but none attempts to repair large dropouts of 5 seconds or more. Music exhibits standard structures that can be used as a forward error correction (FEC) mechanism. FEC addresses packet loss by placing the onus of repair, as far as possible, on the listener's device. We have developed a client-server framework (SoFI) for automatic detection and replacement of large packet losses on wireless networks when receiving time-dependent streamed audio. Whenever dropouts occur, SoFI switches the audio presented to the listener between the live stream and previous sections of the audio stored locally. SoFI was evaluated both objectively and subjectively: subjects were presented with simulations of other approaches to audio repair and with simulated replacements of varying length. The results were positive.
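The idea of repairing a dropout from locally stored, self-similar audio can be illustrated with a minimal Python sketch. It is not SoFI's actual algorithm, which the abstract does not specify. It assumes each audio frame has already been reduced to a single numeric feature, and the function names (`find_best_match`, `repair_dropout`), the `context_len` parameter, and the sum-of-absolute-differences cost are all hypothetical choices made for illustration.

```python
def find_best_match(history, context, gap_len):
    """Return the start index in `history` whose frames best match `context`.

    Uses the sum of absolute differences as the cost (an assumed metric).
    Only positions that leave `gap_len` frames of continuation inside
    `history` are considered, which also excludes the context itself.
    """
    n = len(context)
    best_idx, best_cost = None, float("inf")
    last_start = len(history) - n - gap_len
    for i in range(last_start + 1):
        cost = sum(abs(a - b) for a, b in zip(history[i:i + n], context))
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx


def repair_dropout(frames, gap_start, gap_len, context_len=4):
    """Fill frames[gap_start:gap_start + gap_len] from earlier audio.

    The frames just before the gap are matched against the stored
    history, and the frames that followed the best match are copied in.
    If no usable match exists, the frames are returned unchanged.
    """
    history = frames[:gap_start]
    context = history[-context_len:]
    match = find_best_match(history, context, gap_len)
    repaired = list(frames)
    if match is None:
        return repaired
    src = match + len(context)
    repaired[gap_start:gap_start + gap_len] = history[src:src + gap_len]
    return repaired


# Toy "song": an 8-frame phrase repeated three times, with a 3-frame dropout.
song = [1, 2, 3, 4, 5, 6, 7, 8] * 3
received = list(song)
received[18:21] = [None, None, None]
repaired = repair_dropout(received, gap_start=18, gap_len=3)
```

In this toy case the phrase repeats exactly, so the copied continuation coincides with the lost frames. Real music repeats only approximately, so a practical system would need perceptually meaningful features and a tolerance for inexact matches.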
