Learning Timbre Analogies from Unlabelled Data by Multivariate Tree Regression

Abstract

Applications such as concatenative synthesis (audio mosaicing) and query-by-example require the ability to search a database using a sound which is qualitatively different from the actual desired result, for example when using vocal queries to retrieve nonvocal sound. Standard query techniques such as nearest neighbours do not account for this difference between source and target; they perform retrieval but do not learn to make timbral analogies. This paper addresses this issue by considering timbral query as a multivariate regression problem from one timbre distribution onto another. We develop a novel variant of multivariate tree regression: given only a set of unlabelled and unpaired samples from two distributions on the same space, the regression learns a cross-associative mapping which assumes general similarities in structure of the two distributions, yet can accommodate differences in shape at various scales. We demonstrate the technique with a synthetic example and with a concatenative synthesizer.
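To make the idea of a cross-associative mapping concrete, the following is a minimal sketch of one way such a tree could be built from two unpaired sample sets. It is not taken from the abstract: the splitting rule (first principal direction, median threshold, sign-aligned axes), the leaf-mean prediction, and the names `principal_axis`, `build_tree`, `map_query` and `min_leaf` are all illustrative assumptions.

```python
import numpy as np

def principal_axis(X):
    """Return the unit first principal direction of the rows of X."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

def build_tree(A, B, min_leaf=8):
    """Partition unpaired samples A (source) and B (target) in parallel.

    Assumed rule: split each set at the median of its own first principal
    direction, after flipping B's axis so the two axes point the same way.
    """
    node = {"mean_a": A.mean(axis=0), "mean_b": B.mean(axis=0)}
    if min(len(A), len(B)) < 2 * min_leaf:
        return node  # too few points to split further: this is a leaf
    ua, ub = principal_axis(A), principal_axis(B)
    if ua @ ub < 0:
        ub = -ub  # align orientation so "left" means the same in both sets
    pa, pb = A @ ua, B @ ub
    ta, tb = np.median(pa), np.median(pb)
    left_a, left_b = pa <= ta, pb <= tb
    if left_a.all() or left_b.all():
        return node  # degenerate split (e.g. many tied values)
    node.update(
        axis_a=ua,
        thresh_a=ta,
        left=build_tree(A[left_a], B[left_b], min_leaf),
        right=build_tree(A[~left_a], B[~left_b], min_leaf),
    )
    return node

def map_query(node, x):
    """Descend using the source-side splits; return the paired target mean."""
    while "axis_a" in node:
        go_left = x @ node["axis_a"] <= node["thresh_a"]
        node = node["left"] if go_left else node["right"]
    return node["mean_b"]
```

Because each set is split at its own median, corresponding cells hold comparable fractions of each distribution, so the two sets may differ in location and scale while a query in the source space is still mapped to the analogous region of the target space. A nearest-neighbour search over the target samples alone would not perform this translation.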
