Spectral Characteristics of Popular Commercial Recordings 1950-2010

ABSTRACT

In this work, the long-term spectral contours of a large dataset of popular commercial recordings were analyzed. The aim was to identify overall trends, as well as yearly and genre-specific ones. A novel method for averaging spectral distributions is proposed, which yields results that are readily comparable. With it, we found a consistent leaning towards a target equalization curve that stems from practices in the music industry, but that also, to some extent, mimics the natural acoustic spectra of ensembles.

1. INTRODUCTION

For as long as spectral analysis has been a viable tool in the commercial sector, audio engineers have looked to integrated spectral responses as possible answers to questions of audio quality. Michael Paul Stavrou [1] states that, while at Abbey Road, he lost endless afternoons hopelessly chasing the elusive hit-song characteristic in technical parameters, and Neil Dorfsman [2] acknowledges that, while many sound engineers would not admit to doing it, he feels that most of them use spectral analysis and comparison with previous or other commercial work as a standard tool during mixing. In the mixing context, “achieving frequency balance (also referred to as tonal balance) is a prime challenge in most mixes” [3]. Bob Katz [4] proposes that the tonal balance of a symphony orchestra is the ideal reference for the spectral distribution of music. Yet there is no consistent academic study that tackles the question of how similar the spectral responses of critically acclaimed tracks generally are, nor has anyone analyzed the surrounding factors upon which they depend.

The seminal work in spectrum analysis of musical signals is [5], which pioneered the 1/3-octave filter-bank analysis process that influenced most early studies of the same type; its signals were live recordings of individual instruments and ensembles in live rooms. McKnight [6] took a similar approach in the realm of pre-recorded material, but was looking for technical correction measures in the distribution format and used a small dataset. The early study closest to ours is Bauer’s [7], in which the author looked for the average statistical distribution of a small classical dataset. Moller [8] is the only analysis that tries to track the yearly evolution of spectra. The BBC [9] researched the spectral content of pop music using custom recordings made for the purpose of the test, and [10] focused on the effect of the Compact Disc medium on the spectral contour of recordings. Recently, [11] and [12] returned to the subject with a broader dataset, but their analyses focused more on dynamics and panning than on frequency response, and their dataset does not follow any objective criterion of popularity. No previous study relies on a detailed FFT approach as we do, most choosing instead the coarser and more error-prone Real Time Analysis (RTA) filter-bank approach; nor has any of the aforementioned works tackled a really large, representative dataset that follows the idea of commercial popularity, and thus a ‘best-practices’ approach.

For our analysis to be consistent with general public preference, we must run it on a dataset that includes the most commercially relevant songs of the time period of interest. We chose to select songs that had been number ones in either the US or the UK charts, found primarily through [13, 14] and Wikipedia.
The Anglo-Saxon bias was considered acceptable, as most of the Western world’s music industry has a very strong Anglo-Saxon influence. The list of all the aforementioned singles can be found at [15], a document which also indicates the songs we were able to use. Our dataset comprises about half the singles that reached number one over the last 60 years, with good representation of both genre and year of production (as there were no pilot tests that would allow an estimation of the ideal sample size, we tried, as is customary, to gather the largest possible number of observations). All the songs in our dataset are uncompressed and, while we tried to find un-remastered versions, this was not always possible. This means that we are giving extra prominence to current standards of production, and the true differences should be even greater than what our data suggest. Table 1 shows the number of songs we had available, divided by decade.

Years         Number of Songs
50s           71
60s           156
70s           129
80s           193
90s           96
After 2000    127
Total         772

Table 1: Number of songs per decade in the dataset.

In Section 2 we will look at the overall average of all the songs in our collection. In Sections 3 and 4, the data will be broken down by year and genre, respectively, and some additional low-level features are introduced to better characterize the differences we are unveiling. Section 5 gives an overview of the present research and points to some viable future directions and applications. The aforementioned accompanying website [15] includes more detailed plots, a discussion of remastering, and extended numerical data for the results found in this research.

2. OVERALL AVERAGE SPECTRUM OF COMMERCIAL RECORDINGS

Our main analysis focused on the monaural (left plus right channel over two), average long-term spectrum of the aforementioned dataset. In order for spectra to be comparable, we first make sure that all songs are sampled at the same frequency (44.1 kHz being the obvious candidate for us, as most works stemmed from CD copies), and that we apply the same window length (4096 samples) to all content, so that the frequency resolution is consistent (≈ 10 Hz). Let:

X(k, \tau) = \sum_{n = \tau \cdot w_{len}}^{(\tau + 1)\, w_{len} - 1} x(n)\, e^{-j 2 \pi k \frac{n}{N}}, \qquad
k = \left\{ 0, 1, \ldots, \tfrac{N}{2} - 1 \right\}, \quad
\tau = \left\{ 0, 1, \ldots, \left\lfloor \tfrac{x_{len}}{w_{len}} \right\rfloor \right\},
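For illustration, a minimal Python sketch of this analysis step is given below. It is our own illustration, not the authors’ code: the function name long_term_spectrum and the use of NumPy/SciPy are assumptions, the last partial frame is simply dropped, and the plain mean over frames is only a stand-in for the spectral-averaging method proposed in the paper.

```python
import numpy as np
from scipy.io import wavfile

W_LEN = 4096        # window length N (samples), ~10 Hz resolution at 44.1 kHz
TARGET_SR = 44100   # all songs are assumed to already be at 44.1 kHz

def long_term_spectrum(path):
    """Frame-wise DFT X(k, tau) of one song, reduced to a single spectrum.

    A plain mean over frames is used here as a stand-in; the paper
    proposes its own method for averaging spectral distributions.
    """
    sr, x = wavfile.read(path)
    if sr != TARGET_SR:
        raise ValueError("resample to 44.1 kHz before analysis")
    x = x.astype(np.float64)
    if x.ndim == 2:                         # monaural: (left + right) / 2
        x = x.mean(axis=1)
    n_frames = len(x) // W_LEN              # non-overlapping frames tau (partial frame dropped)
    frames = x[: n_frames * W_LEN].reshape(n_frames, W_LEN)
    X = np.fft.rfft(frames, axis=1)[:, : W_LEN // 2]   # keep bins k = 0 .. N/2 - 1
    return np.abs(X).mean(axis=0)           # long-term magnitude spectrum

# Bin k corresponds to frequency k * 44100 / 4096 (about 10.8 Hz spacing).
freqs = np.arange(W_LEN // 2) * TARGET_SR / W_LEN
```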